AINA Motion Content Review Room

01 · Pipeline

The process is a gated factory.

The repo is organized around source artifacts first: curriculum lesson markdown, briefs, script manifests, TTS sidecars, deterministic HTML compositions, validation reports, then generated media. Dry visual renders prove motion shape; final media waits for narrated audio plus manual QC.

01 Authored lesson markdown stays imported/non-canonical until promoted.

02 Brief JSON defines capability, role context, objective, and target duration.

03 aina-lesson-script-writer emits six-beat script JSON.

04 aina-tts-voice sends one Gemini request per beat.

05 aina-hyperframes-composer emits deterministic HTML and manifest SHA.

06 RenderEvaluator, HyperFrames lint/inspect, captions, gallery, and platform fit run.

07 Ali watches final narrated videos and records learner/platform/marketing QC.
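The gating idea in the steps above can be sketched as a runner that stops at the first stage whose check fails. This is only an illustration, not the repo's orchestrator: the `Stage` type, `runGated`, and the short stage names are hypothetical.

```typescript
// Hypothetical sketch of a gated stage runner; not the repo's orchestrator.
type StageResult = { ok: boolean; note?: string };
type Stage = { name: string; run: () => StageResult };

// Runs stages in order and stops at the first failed gate.
function runGated(stages: Stage[]): { completed: string[]; blockedAt: string | null } {
  const completed: string[] = [];
  for (const stage of stages) {
    const result = stage.run();
    if (!result.ok) {
      return { completed, blockedAt: stage.name };
    }
    completed.push(stage.name);
  }
  return { completed, blockedAt: null };
}

// Example: the evaluator gate fails, so the render stage never runs.
const outcome = runGated([
  { name: "script", run: () => ({ ok: true }) },
  { name: "tts", run: () => ({ ok: true }) },
  { name: "compose", run: () => ({ ok: true }) },
  { name: "evaluate", run: () => ({ ok: false, note: "audio over target" }) },
  { name: "render", run: () => ({ ok: true }) },
]);
console.log(outcome.blockedAt); // "evaluate"
```

Returning the name of the blocking stage, rather than throwing, keeps the result easy to record in a validation report.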

02 · Trio Skills

Three skills, one contract.

The plugin scaffold bundles the same trio that lives canonically under .agents/skills/. Each skill has a narrow job and explicit things it does not do, which keeps prompt writing, audio generation, and composition rendering from blurring into each other.

Script Writer

Brief to script manifest

Converts a capability brief into JSON that validates against script-manifest.schema.json. The output is narration, captions, visual intent, voice pinning, and renderer targets.

Six-beat shape: hook, context, example, counter-example, principle, takeaway.
Brand voice: calm, direct, slightly aphoristic; no learner-facing tool names.
Voice defaults: L1 uses Charon at roughly 1.03x.
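The six-beat shape lends itself to a small structural check. The sketch below is not the contents of script-manifest.schema.json; the `Beat` interface and its field names (`kind`, `narration`) are assumptions made for illustration, and only the beat order comes from the list above.

```typescript
// Beat order as listed above; the field names below are assumptions.
const BEAT_ORDER = ["hook", "context", "example", "counter-example", "principle", "takeaway"] as const;
type BeatKind = (typeof BEAT_ORDER)[number];

interface Beat {
  kind: BeatKind;    // hypothetical field name
  narration: string; // hypothetical field name
}

// Returns a list of problems; an empty list means the shape is valid.
function checkSixBeatShape(beats: Beat[]): string[] {
  const problems: string[] = [];
  if (beats.length !== BEAT_ORDER.length) {
    problems.push(`expected ${BEAT_ORDER.length} beats, got ${beats.length}`);
  }
  beats.forEach((beat, i) => {
    if (i < BEAT_ORDER.length && beat.kind !== BEAT_ORDER[i]) {
      problems.push(`beat ${i + 1} should be "${BEAT_ORDER[i]}", got "${beat.kind}"`);
    }
    if (beat.narration.trim() === "") {
      problems.push(`beat ${i + 1} has empty narration`);
    }
  });
  return problems;
}
```

A check like this would sit alongside schema validation, not replace it: the schema can confirm field types, while the order of beats is easier to express in code.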

TTS Voice

Script to Gemini audio

Wraps tools/lesson-video-renderer/tts/gemini-tts.ts. It strips AINA bracket markup, calls Gemini once per beat, writes WAVs, and reports actual durations.

One WAV per beat plus tts_main.wav and word_timings.json.
Default model is gemini-2.5-flash-tts through Vertex ADC.
It does not write captions, transcribe, normalize loudness, or change voices.
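Reporting actual durations comes down to arithmetic on the audio payload size. The sketch below assumes raw PCM at 24 kHz, 16-bit, mono, and assumes beats are joined without gaps; both should be checked against the real response MIME type and the wrapper's behavior. The function names are hypothetical.

```typescript
// Assumption: raw PCM at 24 kHz, 16-bit, mono. Verify against the real response.
function pcmDurationSeconds(
  byteLength: number,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): number {
  const bytesPerFrame = channels * (bitsPerSample / 8);
  return byteLength / (sampleRate * bytesPerFrame);
}

// Sums per-beat durations. This equals the combined track length only if
// beats are concatenated without inserted silence (an assumption).
function totalDurationSeconds(beatByteLengths: number[]): number {
  return beatByteLengths.reduce((sum, bytes) => sum + pcmDurationSeconds(bytes), 0);
}

console.log(pcmDurationSeconds(48000)); // 1
```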

Composer

Audio timing to HyperFrames HTML

Consumes the validated script and TTS outputs, picks v1 visual primitives, and emits index.html plus composition-manifest.json.

Components: ConceptCard, SplitPanel, TimelineBar, ResponseBubble, Annotation.
Creates clip starts/durations from actual TTS timing sidecars.
Embeds assets, render plan, content index, Anime.js bridge, and SHA proofs.
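Deriving clip starts from measured beat durations is a running sum. A minimal sketch follows, with hypothetical names (`ClipTiming`, `cumulativeClips`) and durations held as integer milliseconds to avoid floating-point drift; the real sidecar format and units are not shown in this page.

```typescript
interface ClipTiming {
  beatId: string;
  startMs: number;
  durationMs: number;
}

// Each clip starts where the previous one ended.
function cumulativeClips(beats: { beatId: string; durationMs: number }[]): ClipTiming[] {
  let cursorMs = 0;
  return beats.map(({ beatId, durationMs }) => {
    const clip: ClipTiming = { beatId, startMs: cursorMs, durationMs };
    cursorMs += durationMs;
    return clip;
  });
}

const clips = cumulativeClips([
  { beatId: "B1", durationMs: 12000 },
  { beatId: "B2", durationMs: 13500 },
  { beatId: "B3", durationMs: 9000 },
]);
console.log(clips.map((c) => c.startMs)); // [0, 12000, 25500]
```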

03 · Gemini

The actual TTS prompts are tiny.

The model-facing payload is intentionally narrow. The script writer does the pedagogical work before TTS; Gemini receives the post-markup beat transcript wrapped in a concise direction to synthesize audio only.

tools/lesson-video-renderer/tts/gemini-tts.ts · primary prompt

function directedTtsPrompt(text, paceFactor) {
  const pace =
    paceFactor > 1.04 ? "slightly brisk" :
    paceFactor < 0.98 ? "slightly slower than conversational" :
    "measured";

  return `Synthesize speech only. Do not read these directions aloud. Use an informative, calm voice at a ${pace} pace.

TRANSCRIPT:
${text}`;
}

tools/lesson-video-renderer/tts/gemini-tts.ts · retry prompt

function retryDirectedTtsPrompt(text, paceFactor) {
  const pace =
    paceFactor > 1.04 ? "slightly brisk" :
    paceFactor < 0.98 ? "slightly slower than conversational" :
    "measured";

  return `Create audio only for the transcript below. Do not add commentary, do not describe the task, and do not read the labels. Voice style: informative, calm, ${pace}.

BEGIN SPOKEN TRANSCRIPT
${text}
END SPOKEN TRANSCRIPT`;
}
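Both prompt builders share the same pace ternary, so its behavior at the boundaries is worth seeing in isolation. The helper name `paceLabel` is hypothetical; the thresholds are copied from the functions above.

```typescript
// Mirrors the ternary used in both prompt builders above.
type PaceLabel = "slightly brisk" | "slightly slower than conversational" | "measured";

function paceLabel(paceFactor: number): PaceLabel {
  if (paceFactor > 1.04) return "slightly brisk";
  if (paceFactor < 0.98) return "slightly slower than conversational";
  return "measured";
}

console.log(paceLabel(1.03)); // "measured"
console.log(paceLabel(1.1));  // "slightly brisk"
console.log(paceLabel(0.9));  // "slightly slower than conversational"
```

Note that the L1 default of roughly 1.03x falls inside the 0.98 to 1.04 band, so the prompt asks for a "measured" pace rather than a brisk one.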

Request shape

Vertex ADC generateContent payload

The endpoint is https://aiplatform.googleapis.com/v1beta1/projects/<project>/locations/<location>/publishers/google/models/<model>:generateContent. Auth uses an ADC bearer token and x-goog-user-project.

{
  "contents": {
    "role": "user",
    "parts": { "text": "<directed TTS prompt>" }
  },
  "generation_config": {
    "response_modalities": ["AUDIO"],
    "speech_config": {
      "voice_config": {
        "prebuilt_voice_config": {
          "voice_name": "Charon"
        }
      }
    }
  }
}
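The endpoint, headers, and payload above can be assembled in one place without making a network call. This is a sketch, not the code in gemini-tts.ts: `buildTtsRequest` and `TtsRequestOptions` are hypothetical names, the access token is assumed to be obtained elsewhere through ADC, and reusing the same project ID for x-goog-user-project is an assumption.

```typescript
interface TtsRequestOptions {
  project: string;
  location: string;
  model: string;
  voiceName: string;
  prompt: string;
  accessToken: string; // ADC bearer token, obtained elsewhere
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch options; it does not send anything.
function buildTtsRequest(opts: TtsRequestOptions): BuiltRequest {
  const url =
    `https://aiplatform.googleapis.com/v1beta1/projects/${opts.project}` +
    `/locations/${opts.location}/publishers/google/models/${opts.model}:generateContent`;
  const body = {
    contents: { role: "user", parts: { text: opts.prompt } },
    generation_config: {
      response_modalities: ["AUDIO"],
      speech_config: {
        voice_config: { prebuilt_voice_config: { voice_name: opts.voiceName } },
      },
    },
  };
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${opts.accessToken}`,
        // Assumption: the quota project is the same project used in the URL.
        "x-goog-user-project": opts.project,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed to `fetch(url, init)`. Keeping construction separate from sending makes the frozen request manifest easy to compare against what would actually go over the wire.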

Example transcript

L1.1 beat B1 after markup expansion

The text below is the payload inserted into the prompt body for one beat. Pauses and emphasis markers are stripped or converted before the model call.

Most weak outputs begin before the answer. The person asking has not named the work.

The batch freezes 60 requests: 10 videos times 6 beats. The voice is Charon for Level 1.

04 · HyperFrames

The render prompt is a composition contract.

HyperFrames is not asked to invent a lesson. The composer emits a deterministic HTML timeline from the script and real audio timing. The CLI renders that folder frame by frame into an MP4 only after the evaluator gate passes.

Renderer path

The orchestrator wires brief to MP4.

tools/lesson-video-renderer/render.sh runs its stages in order: script generation, Gemini TTS, HyperFrames composition, RenderEvaluator, then an optional HyperFrames render.

tools/lesson-video-renderer/render.sh <brief.json> \
  --script-manifest content/scripts/L1-cap-1-1-primary-v3.script.json \
  --aspect 16:9 \
  --reuse-tts

Script prompt path

Default script-generation command

When a reviewed manifest is not supplied, Stage 1 invokes the skill runner with the brief on stdin.

claude --skill aina-lesson-script-writer --output-format json \
  < brief.json > renders/<video_id>/script.json

Composition internals

Root: <main id="timeline" data-fps="30">.
One section.clip per beat.
data-start and data-duration are cumulative TTS timing values.
manifest_sha is SHA-256 over render plan, assets, and clip index.
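A SHA-256 over several structures is only reproducible if the bytes being hashed are stable. The sketch below shows one way to get that, by serializing with sorted object keys before hashing. It is not the composer's actual procedure: `canonicalJson`, `manifestSha`, and the way the three inputs are combined are assumptions.

```typescript
import { createHash } from "node:crypto";

// Serializes with sorted object keys so equal data always yields equal text.
function canonicalJson(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .filter((key) => record[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical combination of the three inputs named above.
function manifestSha(renderPlan: unknown, assets: unknown, clipIndex: unknown): string {
  const payload = canonicalJson({ renderPlan, assets, clipIndex });
  return createHash("sha256").update(payload, "utf8").digest("hex");
}

const sha = manifestSha({ fps: 30, aspect: "16:9" }, ["audio/tts_main.wav"], [{ start: 0, duration: 12 }]);
console.log(sha.length); // 64
```

With this approach, reordering keys in the render plan leaves the hash unchanged, while any change to a clip start or duration produces a different one.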

Render plan specimen

{
  "preset": "horizontal-1080p",
  "resolution": { "w": 1920, "h": 1080 },
  "fps": 30,
  "aspect": "16:9",
  "caption": {
    "font": "Inter",
    "size": 42,
    "lineHeight": 1.2,
    "safeMargin": 48,
    "maxWidthPct": 0.8,
    "position": "bottom-edge"
  },
  "animejs": {
    "runtime": "injected",
    "selector": ".clip",
    "driver": "data-keyframes",
    "deterministic": true
  },
  "audio": {
    "tts_provider": "gemini",
    "tts_model": "gemini-2.5-flash-preview-tts",
    "tts_voice": "Charon",
    "tts_track": "./audio/tts_main.wav"
  }
}

05 · Curriculum

Level 1 is mapped end to end.

Curriculum alignment passed against the authored Level 1 lesson markdown: titles, objectives, source status, common mistakes, six-beat structure, counter-example, final learner rule, embed packet linkage, and marketing-proof note.

Capability	Source Lesson	Video Script	Final Learner Rule	Motion Motif	Primary Visual Move
1.1	Structure a Better AI Request	Brief Before You Ask	Before real work, write the brief first.	Brief builder	Type a vague prompt, then reveal role/context/task/constraints/output as locked blocks.
1.2	Break the Monolith	Break The Monolith	Split large work before you prompt.	Ordered breakdown	Split one large ask into 3-6 steps with dependency arrows and verification gates.
1.3	What It Must Not Do	Ban What Breaks It	Write one explicit ban first.	Boundary/fence	Show unwanted output patterns crossing out as explicit constraints appear.
1.4	Name the Hidden Things	Name Hidden Things	Name the hidden things before analysis.	Hidden variable reveal	Surface unstated definitions, data limits, and context assumptions beneath a prompt.
1.5	Read AI Like a Senior Reviewer	Read Like A Reviewer	Scan before anyone relies on it.	Reviewer scan	Sweep three lenses over an AI output: alignment, consistency, nuance.
1.6	Edit Forward, Don't Restart	Edit Forward	Edit forward before restarting.	Edit-forward loop	Animate draft, targeted follow-up, preserved part, and changed part as a diff.
1.7	Direct the AI's Stance	Direct The Stance	Name the stance before judgment.	Stance switch	Same input produces two side-by-side professional stances with visible contrast.
1.8	Know What the Model Can't Know	Flag The Unknowns	Map the gap before you ask.	Known/unknown boundary	Mark claims as known, stale, proprietary, or verify-needed.
1.9	The Second Time You Do It	Make It Reusable	Template it before the third use.	Template extraction	Convert repeated one-off prompts into a reusable template with slots.
1.10	Single Prompt, Conversation, or Workflow?	Choose The Scope	Classify the task before prompting.	Scope classifier	Classify tasks into single prompt, conversation, or workflow using a three-way switchboard.

06 · Current Gate

The trio has audio, but not timing approval.

Real validation-trio narration was generated, but preflight blocks reuse because the spoken tracks run roughly 30 percent longer than the 60-second target. The next safe move is script shortening or pace adjustment, then regenerate trio TTS intentionally.

L1.1 · Brief Before You Ask

Status: blocked, 30.2% over target.

Actual TTS duration: 78.106s against a 60s target. Evidence: renders/L1-cap-1-1-primary-v3/audio/tts_summary.json.

L1.3 · Ban What Breaks It

Status: blocked, 33.2% over target.

Actual TTS duration: 79.946s against a 60s target. Evidence: renders/L1-cap-1-3-primary-v2/audio/tts_summary.json.

L1.5 · Read Like A Reviewer

Status: blocked, 29.9% over target.

Actual TTS duration: 77.946s against a 60s target. Evidence: renders/L1-cap-1-5-primary-v2/audio/tts_summary.json.
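The percentages for the trio follow from the reported durations as overage relative to the 60-second target. A small sketch that reproduces them is below; `overagePercent` is a hypothetical helper, and the threshold at which preflight blocks is not stated here, so none is encoded.

```typescript
// Overage relative to target, as a percentage.
function overagePercent(actualSeconds: number, targetSeconds: number): number {
  return ((actualSeconds - targetSeconds) / targetSeconds) * 100;
}

const trio = [
  { id: "L1.1", actualSeconds: 78.106 },
  { id: "L1.3", actualSeconds: 79.946 },
  { id: "L1.5", actualSeconds: 77.946 },
];

for (const { id, actualSeconds } of trio) {
  console.log(`${id}: ${overagePercent(actualSeconds, 60).toFixed(1)}%`);
}
// L1.1: 30.2%
// L1.3: 33.2%
// L1.5: 29.9%
```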

Next command ladder

Shorten or pace-adjust validation trio scripts.
Run pnpm video:l1:vertex-adc-check.
Run pnpm video:l1:produce-final --phase trio --audio-only --regenerate-tts.
Run pnpm video:l1:preflight --reuse-tts --validation-trio.
Render trio with pnpm video:l1:produce-final --phase trio --reuse-tts.

Manual QC still required

Learner QC: can a learner state and apply the capability after one watch?
Platform-fit QC: does the video sit naturally in the Explainer Card slot?
Marketing-proof QC: does it demonstrate AINA's thinking practice?
Approval requires reviewer and reviewed timestamp via pnpm video:l1:qc-review:record.

07 · Evidence

Source paths for review and handoff.

This page is an internal visual explainer, not a canonical source. The paths below are the repo-ground-truth artifacts used to build it.

Primary source paths

.agents/skills/aina-lesson-script-writer/SKILL.md

.agents/skills/aina-tts-voice/SKILL.md

.agents/skills/aina-hyperframes-composer/SKILL.md

plugins/aina-video-pipeline/README.md

tools/lesson-video-renderer/tts/gemini-tts.ts

tools/lesson-video-renderer/composer/compose.ts

tools/lesson-video-renderer/render.sh

tools/run-level1-final-production.mjs

content/video-batches/l1-authored-hyperframes-2026-05-18.json

content/platform-manifests/l1-tts-request-manifest-2026-05-18.json

content/platform-manifests/l1-curriculum-alignment-validation-2026-05-18.json

content/platform-manifests/l1-final-render-operator-packet-2026-05-18.json

Review prompt: what Ali should inspect first

Does the three-skill split match how you want future agents to think about the video factory?
Are the Gemini TTS prompts too minimal, or should they carry stricter pronunciation and duration control language?
Should the validation trio remain L1.1, L1.3, L1.5, or should one slot change before manual QC?
Does the curriculum matrix feel like credible AINA pedagogy, not generic prompt advice?
Should final review prioritize shortening narration or changing the render target duration?