AINA Motion Content · Founder Review

Video factory review room.

A one-page visual dossier for the Level 1 HyperFrames plus Gemini TTS pipeline: the three repo-local skills, the exact TTS prompt wrappers, the HyperFrames render contract, the process ladder, and the curriculum batch that feeds it.

01 · Pipeline

The process is a gated factory.

The repo is organized around source artifacts first: curriculum lesson markdown, briefs, script manifests, TTS sidecars, deterministic HTML compositions, validation reports, then generated media. Dry visual renders prove motion shape; final media waits for narrated audio plus manual QC.

01. Authored lesson markdown stays imported/non-canonical until promoted.
02. Brief JSON defines capability, role context, objective, and target duration.
03. aina-lesson-script-writer emits six-beat script JSON.
04. aina-tts-voice sends one Gemini request per beat.
05. aina-hyperframes-composer emits deterministic HTML and a manifest SHA.
06. Validation runs: RenderEvaluator, HyperFrames lint/inspect, captions, gallery, and platform fit.
07. Ali watches the final narrated videos and records learner, platform, and marketing QC.
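Read as code, the ladder is a sequence of gates: each stage must pass before anything downstream runs. The TypeScript sketch below shows only that control flow. The `Gate` type, `runPipeline`, and the stub checks are hypothetical and are not taken from render.sh or run-level1-final-production.mjs.

```typescript
// Hypothetical sketch: stage names echo the ladder above, but Gate,
// runPipeline, and the stub checks are illustrative, not repo code.
type GateResult = { ok: true } | { ok: false; reason: string };

interface Gate {
  name: string;
  check: () => GateResult;
}

interface PipelineOutcome {
  passed: string[];
  blockedAt?: string;
  reason?: string;
}

// Runs gates in order and stops at the first failure, so later stages
// (for example, the final render) never consume an unapproved artifact.
function runPipeline(gates: Gate[]): PipelineOutcome {
  const passed: string[] = [];
  for (const gate of gates) {
    const result = gate.check();
    if (result.ok === false) {
      return { passed, blockedAt: gate.name, reason: result.reason };
    }
    passed.push(gate.name);
  }
  return { passed };
}

const outcome = runPipeline([
  { name: "brief", check: () => ({ ok: true }) },
  { name: "script", check: () => ({ ok: true }) },
  { name: "tts", check: () => ({ ok: false, reason: "duration over target" }) },
  { name: "compose", check: () => ({ ok: true }) },
]);
console.log(outcome.blockedAt); // tts
```

The first failing gate is reported and nothing after it runs, which matches the "final media waits" rule above.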
02 · Trio Skills

Three skills, one contract.

The plugin scaffold bundles the same trio that lives canonically under .agents/skills/. Each skill has a narrow job and explicit things it does not do, which keeps prompt writing, audio generation, and composition rendering from blurring into each other.

Script Writer

Brief to script manifest

Converts a capability brief into JSON that validates against script-manifest.schema.json. The output carries narration, captions, visual intent, voice pinning, and renderer targets.

  • Six-beat shape: hook, context, example, counter-example, principle, takeaway.
  • Brand voice: calm, direct, slightly aphoristic; no learner-facing tool names.
  • Voice defaults: L1 uses Charon at roughly 1.03x.
TTS Voice

Script to Gemini audio

Wraps tools/lesson-video-renderer/tts/gemini-tts.ts. It strips AINA bracket markup, calls Gemini once per beat, writes WAVs, and reports actual durations.

  • One WAV per beat plus tts_main.wav and word_timings.json.
  • Default model is gemini-2.5-flash-tts through Vertex ADC.
  • It does not write captions, transcribe, normalize loudness, or change voices.
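Once the audio bytes are in hand, reporting actual durations is simple arithmetic. This sketch assumes the model returns raw 16-bit mono PCM at 24 kHz, which is the format Google's Gemini TTS documentation describes. Confirm that against gemini-tts.ts. The sketch shows the duration calculation and the 44-byte WAV header needed to write a playable beat file. `pcmDurationSeconds` and `wavHeader` are hypothetical names.

```typescript
// Assumption: 16-bit little-endian mono PCM at 24 kHz. These constants are
// illustrative defaults, not values read from the repo.
const SAMPLE_RATE_HZ = 24_000;
const CHANNELS = 1;
const BYTES_PER_SAMPLE = 2;

// Duration in seconds of a raw PCM payload (no header).
function pcmDurationSeconds(pcmByteLength: number): number {
  const bytesPerSecond = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE;
  return pcmByteLength / bytesPerSecond;
}

// Canonical 44-byte RIFF/WAVE header for the same PCM format.
function wavHeader(pcmByteLength: number): Uint8Array {
  const buffer = new ArrayBuffer(44);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcmByteLength, true); // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, CHANNELS, true);
  view.setUint32(24, SAMPLE_RATE_HZ, true);
  view.setUint32(28, SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE, true); // byte rate
  view.setUint16(32, CHANNELS * BYTES_PER_SAMPLE, true); // block align
  view.setUint16(34, BYTES_PER_SAMPLE * 8, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, pcmByteLength, true); // PCM payload size
  return new Uint8Array(buffer);
}

console.log(pcmDurationSeconds(480_000)); // 10
```

Measuring the bytes actually returned, rather than estimating from word count, is what lets the composer time clips to the real narration.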
Composer

Audio timing to HyperFrames HTML

Consumes the validated script and TTS outputs, picks v1 visual primitives, and emits index.html plus composition-manifest.json.

  • Components: ConceptCard, SplitPanel, TimelineBar, ResponseBubble, Annotation.
  • Derives clip starts and durations from the actual TTS timing sidecars.
  • Embeds assets, render plan, content index, Anime.js bridge, and SHA proofs.
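The timing rule can be sketched as a running sum: each clip starts where the previous beat's measured audio ended. `buildClipTimeline`, `clipSectionHtml`, the `id` attribute, and the input shape are hypothetical. Only the `section.clip`, `data-start`, and `data-duration` names come from this page.

```typescript
// Hypothetical input: one entry per beat with its measured TTS duration.
interface BeatTiming {
  beatId: string;
  durationSec: number;
}

interface ClipTiming extends BeatTiming {
  startSec: number;
}

// Each clip starts where the previous beat's real audio ended, so visuals
// follow narration rather than planned durations.
function buildClipTimeline(beats: BeatTiming[]): ClipTiming[] {
  let cursor = 0;
  return beats.map((beat) => {
    const clip: ClipTiming = { ...beat, startSec: cursor };
    cursor += beat.durationSec;
    return clip;
  });
}

// Renders one clip as a section element carrying its timing attributes.
function clipSectionHtml(clip: ClipTiming): string {
  return `<section class="clip" id="${clip.beatId}" data-start="${clip.startSec}" data-duration="${clip.durationSec}"></section>`;
}

const timeline = buildClipTimeline([
  { beatId: "B1", durationSec: 12.5 },
  { beatId: "B2", durationSec: 14.25 },
  { beatId: "B3", durationSec: 10 },
]);
// starts: 0, 12.5, 26.75
```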
03 · Gemini

The actual TTS prompts are tiny.

The model-facing payload is intentionally narrow. The script writer does the pedagogical work before TTS; Gemini receives the post-markup beat transcript wrapped in a concise direction to synthesize audio only.

tools/lesson-video-renderer/tts/gemini-tts.ts · primary prompt
function directedTtsPrompt(text, paceFactor) {
  const pace =
    paceFactor > 1.04 ? "slightly brisk" :
    paceFactor < 0.98 ? "slightly slower than conversational" :
    "measured";

  return `Synthesize speech only. Do not read these directions aloud. Use an informative, calm voice at a ${pace} pace.

TRANSCRIPT:
${text}`;
}
tools/lesson-video-renderer/tts/gemini-tts.ts · retry prompt
function retryDirectedTtsPrompt(text, paceFactor) {
  const pace =
    paceFactor > 1.04 ? "slightly brisk" :
    paceFactor < 0.98 ? "slightly slower than conversational" :
    "measured";

  return `Create audio only for the transcript below. Do not add commentary, do not describe the task, and do not read the labels. Voice style: informative, calm, ${pace}.

BEGIN SPOKEN TRANSCRIPT
${text}
END SPOKEN TRANSCRIPT`;
}
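Both wrappers share one three-way pace mapping. The helper below pulls it out on its own. The name `paceLabel` is hypothetical, but the thresholds are copied from the functions above. Isolating the mapping makes one detail easy to check. If the Level 1 default of roughly 1.03x is what reaches `paceFactor`, Charon is directed at a "measured" pace, because only factors above 1.04 become "slightly brisk".

```typescript
// Same thresholds as directedTtsPrompt / retryDirectedTtsPrompt above.
type PaceLabel = "slightly brisk" | "slightly slower than conversational" | "measured";

function paceLabel(paceFactor: number): PaceLabel {
  if (paceFactor > 1.04) return "slightly brisk";
  if (paceFactor < 0.98) return "slightly slower than conversational";
  return "measured"; // covers the inclusive band 0.98 to 1.04
}

for (const factor of [0.95, 1.0, 1.03, 1.05]) {
  console.log(factor, paceLabel(factor));
}
// 0.95 slightly slower than conversational
// 1 measured
// 1.03 measured
// 1.05 slightly brisk
```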
Request shape

Vertex ADC generateContent payload

The endpoint is https://aiplatform.googleapis.com/v1beta1/projects/<project>/locations/<location>/publishers/google/models/<model>:generateContent. Auth uses an ADC bearer token plus the x-goog-user-project header.

{
  "contents": {
    "role": "user",
    "parts": { "text": "<directed TTS prompt>" }
  },
  "generation_config": {
    "response_modalities": ["AUDIO"],
    "speech_config": {
      "voice_config": {
        "prebuilt_voice_config": {
          "voice_name": "Charon"
        }
      }
    }
  }
}
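Here is a minimal sketch of assembling the documented endpoint, headers, and payload. It makes three assumptions. The access token has already been obtained from ADC. The quota project sent in `x-goog-user-project` is the same as the target project. `buildTtsRequest` and its option names are hypothetical. The network call itself is left commented out.

```typescript
// Hypothetical helper mirroring the URL, headers, and body shown above.
interface TtsRequest {
  url: string;
  headers: Record<string, string>;
  body: unknown;
}

interface TtsRequestOptions {
  project: string;
  location: string;
  model: string;
  voiceName: string;
  prompt: string; // output of directedTtsPrompt(...)
  accessToken: string; // assumed to come from Application Default Credentials
}

function buildTtsRequest(opts: TtsRequestOptions): TtsRequest {
  const url =
    `https://aiplatform.googleapis.com/v1beta1/projects/${opts.project}` +
    `/locations/${opts.location}/publishers/google/models/${opts.model}:generateContent`;
  return {
    url,
    headers: {
      Authorization: `Bearer ${opts.accessToken}`,
      "x-goog-user-project": opts.project, // assumption: quota project = target project
      "Content-Type": "application/json",
    },
    body: {
      contents: { role: "user", parts: { text: opts.prompt } },
      generation_config: {
        response_modalities: ["AUDIO"],
        speech_config: {
          voice_config: { prebuilt_voice_config: { voice_name: opts.voiceName } },
        },
      },
    },
  };
}

const req = buildTtsRequest({
  project: "my-project", // placeholder
  location: "global", // placeholder
  model: "gemini-2.5-flash-tts",
  voiceName: "Charon",
  prompt: "<directed TTS prompt>",
  accessToken: "TOKEN", // placeholder
});
// await fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) });
```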
Example transcript

L1.1 beat B1 after markup expansion

The text below is the payload inserted into the prompt body for one beat. Pauses and emphasis markers are stripped or converted before the model call.

Most weak outputs begin before the answer. The person asking has not named the work.
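Stripping can be sketched if we assume that AINA markup uses square-bracket tags such as `[pause]` or `[emph]text[/emph]`. The real grammar is defined by the script-writer skill. This version only strips tags, while the real skill may convert some markers (for example, pauses) instead. `stripBracketMarkup` is a hypothetical name.

```typescript
// Assumption: markup tokens are square-bracket tags; every tag is dropped
// and the spoken words are kept.
function stripBracketMarkup(scriptText: string): string {
  return scriptText
    .replace(/\[[^\]]*\]/g, " ") // remove each [tag] or [/tag]
    .replace(/\s+([.,;:!?])/g, "$1") // no stray space before punctuation
    .replace(/\s+/g, " ") // collapse whitespace runs
    .trim();
}

const marked =
  "Most weak outputs begin [pause] before the answer. [emph]The person asking[/emph] has not named the work.";
console.log(stripBracketMarkup(marked));
// Most weak outputs begin before the answer. The person asking has not named the work.
```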

The batch freezes 60 requests (10 videos × 6 beats), all voiced by Charon for Level 1.
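The 60-request freeze is a cross product of videos and beats. In the sketch below, the field names are hypothetical, and the real l1-tts-request-manifest may be shaped differently. The `B1` to `B6` beat ids are inferred from the "beat B1" label above.

```typescript
// Hypothetical shape for one frozen TTS request.
interface TtsRequestEntry {
  videoId: string;
  beatId: string;
  voice: string;
}

// Every beat of every video gets exactly one request, all pinned to one voice.
function expandRequests(videoIds: string[], beatsPerVideo: number, voice: string): TtsRequestEntry[] {
  const entries: TtsRequestEntry[] = [];
  for (const videoId of videoIds) {
    for (let beat = 1; beat <= beatsPerVideo; beat++) {
      entries.push({ videoId, beatId: `B${beat}`, voice });
    }
  }
  return entries;
}

const videoIds = Array.from({ length: 10 }, (_, i) => `L1.${i + 1}`);
const requests = expandRequests(videoIds, 6, "Charon");
console.log(requests.length); // 60
```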

04 · HyperFrames

The render prompt is a composition contract.

HyperFrames is not asked to invent a lesson. The composer emits a deterministic HTML timeline from the script and real audio timing. The CLI renders that folder frame-by-frame into MP4 only after the evaluator gate passes.

Renderer path

The orchestrator wires brief to MP4.

tools/lesson-video-renderer/render.sh runs these stages in order: script generation, Gemini TTS, HyperFrames composition, RenderEvaluator, then an optional HyperFrames render.

tools/lesson-video-renderer/render.sh <brief.json> \
  --script-manifest content/scripts/L1-cap-1-1-primary-v3.script.json \
  --aspect 16:9 \
  --reuse-tts
Script prompt path

Default script-generation command

When a reviewed manifest is not supplied, Stage 1 invokes the skill runner with the brief on stdin.

claude --skill aina-lesson-script-writer --output-format json \
  < brief.json > renders/<video_id>/script.json
Composition internals
  • Root: <main id="timeline" data-fps="30">.
  • One section.clip per beat.
  • data-start is the running total of the earlier beats' TTS durations; data-duration is each beat's own TTS duration.
  • manifest_sha is SHA-256 over render plan, assets, and clip index.
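One way to make `manifest_sha` reproducible is to hash a key-sorted JSON serialization. That way, the same logical content always yields the same digest, whatever order the keys were written in. This canonicalization is an assumption, and compose.ts may serialize differently. `canonicalJson` and `manifestSha` are hypothetical names. The hashing uses Node's `crypto.createHash`.

```typescript
import { createHash } from "node:crypto";

// Assumption: object keys are sorted recursively before hashing.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, v]) => `${JSON.stringify(key)}:${canonicalJson(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the render plan, assets, and clip index, as hex.
function manifestSha(renderPlan: unknown, assets: unknown, clipIndex: unknown): string {
  const payload = canonicalJson({ renderPlan, assets, clipIndex });
  return createHash("sha256").update(payload).digest("hex");
}
```

Sorting keys matters because two composers that build the same plan with different key order would otherwise produce different proofs.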
Render plan specimen
{
  "preset": "horizontal-1080p",
  "resolution": { "w": 1920, "h": 1080 },
  "fps": 30,
  "aspect": "16:9",
  "caption": {
    "font": "Inter",
    "size": 42,
    "lineHeight": 1.2,
    "safeMargin": 48,
    "maxWidthPct": 0.8,
    "position": "bottom-edge"
  },
  "animejs": {
    "runtime": "injected",
    "selector": ".clip",
    "driver": "data-keyframes",
    "deterministic": true
  },
  "audio": {
    "tts_provider": "gemini",
    "tts_model": "gemini-2.5-flash-preview-tts",
    "tts_voice": "Charon",
    "tts_track": "./audio/tts_main.wav"
  }
}
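Typing the render plan lets obvious inconsistencies fail before any frames render. The check below is a hypothetical sketch covering a subset of fields. It tests aspect against resolution, requires a positive integer fps, and keeps the caption width inside the horizontal safe margins. For the specimen above, 1920 × 9 equals 1080 × 16, and the 1536 px caption width fits inside 1824 px, so the check finds no problems.

```typescript
// Hypothetical subset of the render plan specimen.
interface RenderPlan {
  preset: string;
  resolution: { w: number; h: number };
  fps: number;
  aspect: string; // e.g. "16:9"
  caption: { size: number; lineHeight: number; safeMargin: number; maxWidthPct: number };
}

// Returns a list of problems; an empty list means the plan is self-consistent.
function checkRenderPlan(plan: RenderPlan): string[] {
  const problems: string[] = [];
  const [aw, ah] = plan.aspect.split(":").map(Number);
  if (plan.resolution.w * ah !== plan.resolution.h * aw) {
    problems.push(`resolution ${plan.resolution.w}x${plan.resolution.h} does not match aspect ${plan.aspect}`);
  }
  if (!Number.isInteger(plan.fps) || plan.fps <= 0) {
    problems.push("fps must be a positive integer");
  }
  const captionMaxPx = plan.resolution.w * plan.caption.maxWidthPct;
  if (captionMaxPx > plan.resolution.w - 2 * plan.caption.safeMargin) {
    problems.push("caption width exceeds the horizontal safe area");
  }
  return problems;
}

const specimen: RenderPlan = {
  preset: "horizontal-1080p",
  resolution: { w: 1920, h: 1080 },
  fps: 30,
  aspect: "16:9",
  caption: { size: 42, lineHeight: 1.2, safeMargin: 48, maxWidthPct: 0.8 },
};
console.log(checkRenderPlan(specimen)); // []
```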
05 · Curriculum

Level 1 is mapped end to end.

Curriculum alignment passed against the authored Level 1 lesson markdown: titles, objectives, source status, common mistakes, six-beat structure, counter-example, final learner rule, embed packet linkage, and marketing-proof note.

Capability | Source Lesson | Video Script | Final Learner Rule | Motion Motif | Primary Visual Move
1.1 | Structure a Better AI Request | Brief Before You Ask | Before real work, write the brief first. | Brief builder | Type a vague prompt, then reveal role/context/task/constraints/output as locked blocks.
1.2 | Break the Monolith | Break The Monolith | Split large work before you prompt. | Ordered breakdown | Split one large ask into 3-6 steps with dependency arrows and verification gates.
1.3 | What It Must Not Do | Ban What Breaks It | Write one explicit ban first. | Boundary/fence | Show unwanted output patterns crossing out as explicit constraints appear.
1.4 | Name the Hidden Things | Name Hidden Things | Name the hidden things before analysis. | Hidden variable reveal | Surface unstated definitions, data limits, and context assumptions beneath a prompt.
1.5 | Read AI Like a Senior Reviewer | Read Like A Reviewer | Scan before anyone relies on it. | Reviewer scan | Sweep three lenses over an AI output: alignment, consistency, nuance.
1.6 | Edit Forward, Don't Restart | Edit Forward | Edit forward before restarting. | Edit-forward loop | Animate draft, targeted follow-up, preserved part, and changed part as a diff.
1.7 | Direct the AI's Stance | Direct The Stance | Name the stance before judgment. | Stance switch | Same input produces two side-by-side professional stances with visible contrast.
1.8 | Know What the Model Can't Know | Flag The Unknowns | Map the gap before you ask. | Known/unknown boundary | Mark claims as known, stale, proprietary, or verify-needed.
1.9 | The Second Time You Do It | Make It Reusable | Template it before the third use. | Template extraction | Convert repeated one-off prompts into a reusable template with slots.
1.10 | Single Prompt, Conversation, or Workflow? | Choose The Scope | Classify the task before prompting. | Scope classifier | Classify tasks into single prompt, conversation, or workflow using a three-way switchboard.
06 · Current Gate

The trio has audio, but not timing approval.

Real narration for the validation trio was generated, but preflight blocks reuse because the spoken tracks run roughly 30 percent (29.9 to 33.2 percent) longer than the 60-second target. The next safe move is to shorten the scripts or adjust pace, then deliberately regenerate the trio TTS.

L1.1 · Brief Before You Ask · blocked, 30.2% over target

Actual TTS duration: 78.106s against a 60s target. Evidence: renders/L1-cap-1-1-primary-v3/audio/tts_summary.json.

L1.3 · Ban What Breaks It · blocked, 33.2% over target

Actual TTS duration: 79.946s against a 60s target. Evidence: renders/L1-cap-1-3-primary-v2/audio/tts_summary.json.

L1.5 · Read Like A Reviewer · blocked, 29.9% over target

Actual TTS duration: 77.946s against a 60s target. Evidence: renders/L1-cap-1-5-primary-v2/audio/tts_summary.json.
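The overrun figures follow from (actual - target) / target. The sketch below reproduces them from the tts_summary durations. It also computes the speed-up that pace adjustment alone would need, assuming duration scales inversely with speaking rate. For comparison, the prompt wrappers only switch to "slightly brisk" above a 1.04 pace factor. `overrunPct` and `requiredSpeedup` are hypothetical names.

```typescript
const TARGET_SEC = 60;

// Percent by which the actual narration exceeds the target.
function overrunPct(actualSec: number, targetSec = TARGET_SEC): number {
  return ((actualSec - targetSec) / targetSec) * 100;
}

// Speed multiplier needed to land on target if only pace changed
// (assumes duration scales inversely with speaking rate).
function requiredSpeedup(actualSec: number, targetSec = TARGET_SEC): number {
  return actualSec / targetSec;
}

const trio: Record<string, number> = { "L1.1": 78.106, "L1.3": 79.946, "L1.5": 77.946 };
for (const [id, actual] of Object.entries(trio)) {
  console.log(id, `${overrunPct(actual).toFixed(1)}%`, `${requiredSpeedup(actual).toFixed(2)}x`);
}
// L1.1 30.2% 1.30x
// L1.3 33.2% 1.33x
// L1.5 29.9% 1.30x
```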

Next command ladder
  • Shorten or pace-adjust validation trio scripts.
  • Run pnpm video:l1:vertex-adc-check.
  • Run pnpm video:l1:produce-final --phase trio --audio-only --regenerate-tts.
  • Run pnpm video:l1:preflight --reuse-tts --validation-trio.
  • Render trio with pnpm video:l1:produce-final --phase trio --reuse-tts.
Manual QC still required
  • Learner QC: can a learner state and apply the capability after one watch?
  • Platform-fit QC: does the video sit naturally in the Explainer Card slot?
  • Marketing-proof QC: does it demonstrate AINA's thinking practice?
  • Approval requires reviewer and reviewed timestamp via pnpm video:l1:qc-review:record.
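Approval can be expressed as a predicate over the review record. The field names below are hypothetical, and the real record written by pnpm video:l1:qc-review:record may differ. The sketch encodes only the rule stated above. A video is approved when all three QC questions pass and the record has a named reviewer and a parseable reviewed timestamp.

```typescript
// Hypothetical record shape for one video's manual QC.
interface QcReview {
  videoId: string;
  learnerQc: boolean;
  platformFitQc: boolean;
  marketingProofQc: boolean;
  reviewer?: string;
  reviewedAt?: string; // ISO 8601 timestamp
}

// Any failed check or missing reviewer/timestamp keeps the video unapproved.
function isApproved(review: QcReview): boolean {
  const hasReviewer = typeof review.reviewer === "string" && review.reviewer.trim().length > 0;
  const hasTimestamp =
    typeof review.reviewedAt === "string" && !Number.isNaN(Date.parse(review.reviewedAt));
  return (
    review.learnerQc && review.platformFitQc && review.marketingProofQc && hasReviewer && hasTimestamp
  );
}

const review: QcReview = {
  videoId: "L1.1",
  learnerQc: true,
  platformFitQc: true,
  marketingProofQc: true,
  reviewer: "Ali",
  reviewedAt: "2026-05-18T12:00:00Z",
};
console.log(isApproved(review)); // true
```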
07 · Evidence

Source paths for review and handoff.

This page is an internal visual explainer, not a canonical source. The paths below are the repository artifacts used as ground truth to build it.

Primary source paths
.agents/skills/aina-lesson-script-writer/SKILL.md
.agents/skills/aina-tts-voice/SKILL.md
.agents/skills/aina-hyperframes-composer/SKILL.md
plugins/aina-video-pipeline/README.md
tools/lesson-video-renderer/tts/gemini-tts.ts
tools/lesson-video-renderer/composer/compose.ts
tools/lesson-video-renderer/render.sh
tools/run-level1-final-production.mjs
content/video-batches/l1-authored-hyperframes-2026-05-18.json
content/platform-manifests/l1-tts-request-manifest-2026-05-18.json
content/platform-manifests/l1-curriculum-alignment-validation-2026-05-18.json
content/platform-manifests/l1-final-render-operator-packet-2026-05-18.json
Review prompt: what Ali should inspect first
  • Does the three-skill split match how you want future agents to think about the video factory?
  • Are the Gemini TTS prompts too minimal, or should they carry stricter pronunciation and duration-control language?
  • Should the validation trio remain L1.1, L1.3, L1.5, or should one slot change before manual QC?
  • Does the curriculum matrix feel like credible AINA pedagogy, not generic prompt advice?
  • Should final review prioritize shortening narration or changing the render target duration?