How a TikTok moves from raw sync data into format, relevance, content label, production owner, and recreation score for Oncourse.
The current code runs enrichment in a shared helper used by automated sync, manual classification, single-account sync, and backfill scripts.
Raw TikTok data arrives
Apify/TikTok sync stores caption, hashtags, duration, sound, cover, web video URL, stats, carousel flag, subtitles, slideshow image links, and raw metadata.
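To make the stored fields concrete, here is a minimal sketch of one synced row as a Python dataclass. The class name `RawTikTok` and every field name are hypothetical; the real schema and column names are not shown in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RawTikTok:
    """Hypothetical shape of one synced TikTok row (names are illustrative)."""
    caption: str
    hashtags: list[str]
    duration_seconds: float
    sound: Optional[str]
    cover_url: Optional[str]
    web_video_url: Optional[str]
    stats: dict[str, int]            # e.g. plays, likes, shares, comments
    is_carousel: bool                # the carousel / photo-mode flag
    subtitles: Optional[str]         # subtitle text when TikTok provides it
    slideshow_image_urls: list[str] = field(default_factory=list)
    raw_metadata: dict[str, Any] = field(default_factory=dict)  # untouched sync payload
```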
When possible, video and cover assets are copied to storage. If a hosted copy of the video exists, it is sent to Gemini; otherwise the cover thumbnail is used as visual context.
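The media fallback can be sketched as a small selection function. This assumes the copy-to-storage step yields `None` for any asset it could not copy; `choose_visual_context`, `MediaChoice`, and the preference for the stored cover over the original cover URL are illustrative assumptions, not the actual helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaChoice:
    kind: str             # "video", "thumbnail", or "none"
    url: Optional[str]


def choose_visual_context(hosted_video_url: Optional[str],
                          hosted_cover_url: Optional[str],
                          original_cover_url: Optional[str]) -> MediaChoice:
    """Pick what Gemini sees: hosted video first, then the cover thumbnail."""
    if hosted_video_url:
        return MediaChoice("video", hosted_video_url)
    # No hosted video: fall back to the cover, preferring the stored copy (assumption).
    cover = hosted_cover_url or original_cover_url
    if cover:
        return MediaChoice("thumbnail", cover)
    # Nothing visual available; Gemini works from metadata alone.
    return MediaChoice("none", None)
```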
Deterministic signals run first: slideshow/photo mode maps to CAROUSEL_SLIDESHOW, and subtitles map to UGC_VOICEOVER. If the format is still unknown, Gemini runs the legacy format analysis.
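A minimal sketch of the format step, assuming the legacy Gemini analysis is passed in as a callable. `detect_format`, its parameters, and the rule that a slideshow wins over subtitles when both are present are assumptions for illustration; `force_analysis` models the forced-analysis path mentioned for the legacy format prompt.

```python
from typing import Callable, Optional


def detect_format(is_slideshow: bool,
                  subtitles: Optional[str],
                  gemini_fallback: Callable[[], Optional[str]],
                  force_analysis: bool = False) -> Optional[str]:
    """Deterministic format rules first, then the legacy Gemini format analysis."""
    if not force_analysis:
        if is_slideshow:
            return "CAROUSEL_SLIDESHOW"
        # Treat whitespace-only subtitles as missing.
        if subtitles and subtitles.strip():
            return "UGC_VOICEOVER"
    # Still unknown, or analysis was forced: ask Gemini to classify the format.
    return gemini_fallback()
```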
Gemini receives the Oncourse app context, video metadata, the format hint, and the media. It returns relevance, content label, production owner, score, reasons, hook, script, CTA, and remake plan.
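Because the model's reply drives everything downstream, it helps to validate it before saving. The sketch below assumes Gemini is asked for a JSON object; the key names (`relevant`, `relevanceReason`, `contentLabel`, `productionLabel`) and `parse_enrichment` are hypothetical, while the label sets come from the label definitions in this document.

```python
import json
from typing import Any

# Hypothetical key names; the real response schema may differ.
REQUIRED_KEYS = {"relevant", "relevanceReason"}
CONTENT_LABELS = {"REACTION", "RESOURCES", "DAY_IN_LIFE", "EXPLAINER", "SKIT", "OTHER"}
PRODUCTION_LABELS = {"TEAM_RECREATE", "CREATOR_RECREATE"}


def parse_enrichment(raw_text: str) -> dict[str, Any]:
    """Parse Gemini's JSON reply and reject labels outside the known sets."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Labels only matter for relevant videos; irrelevant ones get them cleared later.
    if data["relevant"]:
        if data.get("contentLabel") not in CONTENT_LABELS:
            raise ValueError(f"unknown content label: {data.get('contentLabel')!r}")
        if data.get("productionLabel") not in PRODUCTION_LABELS:
            raise ValueError(f"unknown production label: {data.get('productionLabel')!r}")
    return data
```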
Relevant videos keep their labels, score, reasons, and proposal. Irrelevant videos keep only a relevance reason: their content and production labels are cleared and their score is set to 0.
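The save rule for relevant versus irrelevant videos can be sketched as a pure function over the parsed result. `finalize_enrichment` and the key names are hypothetical, and this version also drops the hook, script, CTA, and remake plan for irrelevant videos, which is one reading of the rule rather than confirmed behavior.

```python
from typing import Any


def finalize_enrichment(result: dict[str, Any]) -> dict[str, Any]:
    """Keep everything for relevant videos; clear labels and zero the score otherwise."""
    if result.get("relevant"):
        return dict(result)  # copy so the caller's dict is not mutated
    # Irrelevant: keep only the relevance decision and its reason.
    return {
        "relevant": False,
        "relevanceReason": result.get("relevanceReason", ""),
        "contentLabel": None,
        "productionLabel": None,
        "score": 0,
    }
```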
Relevance is the heart of the system. The model may mark a video relevant when the idea can credibly become Oncourse content, but it must not invent a med-prep bridge.
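The recreation score runs from 0 to 100. If Gemini returns an explicit score, that score is used; otherwise the code computes it from five rubric values, each 0-20, with the two risk dimensions inverted so that lower risk raises the score:
score = audienceFit + featureFit + adaptationEase + (20 - executionRisk) + (20 - creatorDependence)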
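Here is the score rule as a function, assuming the rubric arrives as a mapping keyed by the five names in the formula. `recreation_score` is a hypothetical name, and clamping inputs into 0-20 (and an explicit score into 0-100) is an added safeguard that the source does not state.

```python
from typing import Mapping, Optional

RUBRIC_MAX = 20
RUBRIC_KEYS = ("audienceFit", "featureFit", "adaptationEase",
               "executionRisk", "creatorDependence")


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def recreation_score(explicit_score: Optional[float],
                     rubric: Mapping[str, float]) -> int:
    """Use Gemini's explicit score when present, else sum the five 0-20 rubric values."""
    if explicit_score is not None:
        return round(_clamp(explicit_score, 0, 100))
    # A missing rubric key raises KeyError rather than silently defaulting.
    r = {k: _clamp(rubric[k], 0, RUBRIC_MAX) for k in RUBRIC_KEYS}
    total = (r["audienceFit"] + r["featureFit"] + r["adaptationEase"]
             + (RUBRIC_MAX - r["executionRisk"])        # lower risk scores higher
             + (RUBRIC_MAX - r["creatorDependence"]))   # less creator reliance scores higher
    return round(total)
```

With `audienceFit=15`, `featureFit=12`, `adaptationEase=10`, `executionRisk=5`, and `creatorDependence=8`, the rubric score is 15 + 12 + 10 + 15 + 12 = 64.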
These labels help the team decide how to remake or learn from a post.
Format: CAROUSEL_SLIDESHOW, UGC_VOICEOVER, UGC_REACTION, or OTHER. The current heuristic only guarantees CAROUSEL_SLIDESHOW (from slideshow/photo mode) and UGC_VOICEOVER (from subtitles); UGC_REACTION and OTHER may come from the Gemini fallback.
Content label: REACTION, RESOURCES, DAY_IN_LIFE, EXPLAINER, SKIT, or OTHER. Saved only when the post is relevant.
Production owner: TEAM_RECREATE for resource roundups, study-system explainers, slideshows, and feature-led ideas; CREATOR_RECREATE for day-in-life posts, skits, face-led stories, and personality-led reactions.
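The three label families map naturally onto enums. The `default_owner` helper is a hypothetical fallback that mirrors the production-owner guidance; in the described system Gemini returns the owner, and this coarse version treats every reaction as creator-led and every slideshow as team-led, which is simpler than the guidance.

```python
from enum import Enum
from typing import Optional


class VideoFormat(str, Enum):
    CAROUSEL_SLIDESHOW = "CAROUSEL_SLIDESHOW"
    UGC_VOICEOVER = "UGC_VOICEOVER"
    UGC_REACTION = "UGC_REACTION"
    OTHER = "OTHER"


class ContentLabel(str, Enum):
    REACTION = "REACTION"
    RESOURCES = "RESOURCES"
    DAY_IN_LIFE = "DAY_IN_LIFE"
    EXPLAINER = "EXPLAINER"
    SKIT = "SKIT"
    OTHER = "OTHER"


class ProductionLabel(str, Enum):
    TEAM_RECREATE = "TEAM_RECREATE"
    CREATOR_RECREATE = "CREATOR_RECREATE"


# Hypothetical groupings based on the production-owner guidance.
TEAM_LED = {ContentLabel.RESOURCES, ContentLabel.EXPLAINER}
CREATOR_LED = {ContentLabel.DAY_IN_LIFE, ContentLabel.SKIT, ContentLabel.REACTION}


def default_owner(content: ContentLabel, fmt: VideoFormat) -> Optional[ProductionLabel]:
    """Coarse owner guess; returns None when the call should be left to Gemini."""
    if fmt == VideoFormat.CAROUSEL_SLIDESHOW or content in TEAM_LED:
        return ProductionLabel.TEAM_RECREATE
    if content in CREATOR_LED:
        return ProductionLabel.CREATOR_RECREATE
    return None  # e.g. OTHER content in a non-slideshow format
```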
These are the highest-leverage prompt knobs. Changing these shifts what gets marked relevant and how recreate ideas are ranked.
Add examples of content you want the system to keep: specific student pains, creator formats, resource angles, or study-system hooks.
Add examples of false positives: vague motivation, med identity without payoff, lifestyle posts, or trends that do not help content strategy.
Push toward easier team-made posts by penalizing creator dependence and execution risk more strongly in the prompt language.
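One way to keep these knobs editable without touching prompt code is to hold them as data and render them into the prompt text. `PromptKnobs`, `render_knobs`, and the rendered wording are assumptions for illustration, not the system's actual prompt.

```python
from dataclasses import dataclass, field


@dataclass
class PromptKnobs:
    """Hypothetical editable inputs for the main enrichment prompt."""
    keep_examples: list[str] = field(default_factory=list)
    reject_examples: list[str] = field(default_factory=list)
    penalize_creator_dependence: bool = False
    penalize_execution_risk: bool = False


def render_knobs(knobs: PromptKnobs) -> str:
    """Turn the knobs into prompt lines; an empty config renders as an empty string."""
    lines: list[str] = []
    if knobs.keep_examples:
        lines.append("Mark content like this as relevant:")
        lines += [f"- {ex}" for ex in knobs.keep_examples]
    if knobs.reject_examples:
        lines.append("Treat content like this as irrelevant (false positives):")
        lines += [f"- {ex}" for ex in knobs.reject_examples]
    if knobs.penalize_creator_dependence:
        lines.append("Score creatorDependence strictly: ideas that need a specific "
                     "on-camera creator should rank lower.")
    if knobs.penalize_execution_risk:
        lines.append("Score executionRisk strictly: prefer ideas the team can "
                     "produce reliably in-house.")
    return "\n".join(lines)
```

Keeping the examples as plain lists lets the content team edit them like any other content, while the scoring toggles only change wording, not the score formula itself.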
The two prompts contain the exact decision language your content team can review, edit, or comment on.
Enrichment prompt: used for relevance, content label, production label, scoring, hook/script/CTA, and the recreate proposal.
Legacy format-analysis prompt: used only when deterministic format signals do not produce a format, or when a forced analysis asks Gemini to classify the format.