▶ Watch first: Zero-Barrier Animation — First Steps in AI Filmmaking
▶ 先看视频:2026 AI 动画大师课——从白纸到 4K 故事
▶ ดูก่อน: แอนิเมชันไร้กำแพง — ก้าวแรกสู่การสร้างหนังด้วย AI
🧩 The big idea: a hybrid workflow
🧩 核心理念:混合工作流
🧩 แนวคิดหลัก: เวิร์กโฟลว์แบบผสม
No single button makes a finished cartoon. Professional-grade AI animation is a hybrid of three things: automated generative models (for images and motion), manual planning (scripts, shot lists, character sheets), and traditional editing (trimming, audio, upscaling). The planning is what keeps resolution high, motion fluid, and your characters looking like themselves from shot to shot.
没有哪个按钮能一键做出成品动画。专业级的 AI 动画是三者的混合:自动化生成模型(生成图像与动作)、人工策划(剧本、镜头清单、角色设定表)和传统剪辑(裁剪、配音、放大)。正是策划工作让分辨率保持清晰、动作流畅,并让角色在每个镜头中都不走样。
ไม่มีปุ่มไหนกดแล้วได้การ์ตูนสำเร็จรูป แอนิเมชัน AI ระดับมืออาชีพคือการผสมสามอย่าง: โมเดลสร้างภาพอัตโนมัติ (สร้างภาพและการเคลื่อนไหว) การวางแผนด้วยมือ (บท รายการช็อต ชีตตัวละคร) และการตัดต่อแบบดั้งเดิม (ตัด เสียง อัปสเกล) การวางแผนนี่แหละที่ทำให้ความละเอียดคมชัด การเคลื่อนไหวลื่นไหล และตัวละครหน้าตาเหมือนเดิมในทุกช็อต
1 Concept, Scripting & Layout构思、剧本与布局แนวคิด บท และการจัดวาง
Scripting
编写剧本
เขียนบท
Define your story, then use a Large Language Model (LLM) to craft a detailed script. Describe lighting, camera angles, and character emotions — not just dialogue.
先确定你的故事,然后用大语言模型(LLM)写出详细的剧本。要描述灯光、镜头角度和角色情绪——不仅仅是台词。
กำหนดเรื่องราวของคุณ แล้วใช้โมเดลภาษาขนาดใหญ่ (LLM) ช่วยร่างบทอย่างละเอียด อธิบายแสง มุมกล้อง และอารมณ์ของตัวละคร — ไม่ใช่แค่บทพูด
Shot planning
镜头规划
วางแผนช็อต
Break the story into a beat list of very short shots, ~2–6 seconds each.
把故事拆成一个个很短的镜头,每个约 2–6 秒的镜头清单。
แบ่งเรื่องเป็นรายการช็อตสั้น ๆ ช็อตละประมาณ 2–6 วินาที
⚠️ Why short shots matter
⚠️ 为什么镜头要短
⚠️ ทำไมช็อตต้องสั้น
AI video models have poor "memory." Long clips let backgrounds drift and characters slowly stop looking like themselves. Short segments keep every shot locked and on-model. Think in shots — one frame, one motion idea — never "a whole scene at once."
AI 视频模型的"记忆"很差。长片段会让背景漂移、角色逐渐变样。短片段能让每个镜头都锁定、不跑偏。要"按镜头思考"——一帧、一个动作想法——绝不要"一次生成整场戏"。
โมเดลวิดีโอ AI มี "ความจำ" ไม่ดี คลิปยาวจะทำให้ฉากหลังเพี้ยนและตัวละครค่อย ๆ เปลี่ยนหน้าตา ช็อตสั้นช่วยล็อกทุกช็อตให้คงที่ คิดเป็นช็อต — หนึ่งเฟรม หนึ่งไอเดียการเคลื่อนไหว — อย่าสร้าง "ทั้งฉากในครั้งเดียว"
2 Character & Environment Design角色与场景设计ออกแบบตัวละครและฉาก
Token locking
锁定提示词(token)
ล็อกคำสั่ง (token)
Write a precise text description for each character (e.g. "sharp emerald-green eyes, almond shape") and reuse those exact prompt tokens verbatim in every generation. Never switch "emerald eyes" to "green eyes" — that causes visual drift.
为每个角色写精确的文字描述(例如"锐利的祖母绿色眼睛,杏仁形"),并在每一次生成中逐字重复使用同样的提示词 token。绝不要把"emerald eyes"换成"green eyes"——那会导致视觉漂移。
เขียนคำบรรยายตัวละครแต่ละตัวให้แม่นยำ (เช่น "ตาสีเขียวมรกตคมชัด รูปทรงอัลมอนด์") และใช้ token คำสั่งเดิมซ้ำคำต่อคำ ในทุกครั้งที่สร้าง อย่าเปลี่ยนจาก "emerald eyes" เป็น "green eyes" เพราะจะทำให้ภาพเพี้ยน
Character sheets
角色设定表
ชีตตัวละคร
Generate a reference sheet per character with ~6 poses (front, side, 3/4 view, full body, action, expression).
为每个角色生成一张参考设定表,包含约 6 个姿势(正面、侧面、3/4 视角、全身、动作、表情)。
สร้างชีตอ้างอิงสำหรับตัวละครแต่ละตัว ประมาณ 6 ท่า (หน้าตรง ด้านข้าง มุม 3/4 เต็มตัว ท่าแอ็กชัน และสีหน้า)
- For models like Nano Banana Pro, the sweet spot is 4–6 high-quality 1024×1024 reference images (front, 3/4 left, 3/4 right) — enough for the AI to "understand" the character's 3D facial structure and anchor it.
- 对于 Nano Banana Pro 这类模型,最佳数量是 4–6 张高质量的 1024×1024 参考图(正面、左 3/4、右 3/4)——足以让 AI "理解"角色头部的三维结构并锁定它。
- สำหรับโมเดลอย่าง Nano Banana Pro จำนวนที่เหมาะที่สุดคือภาพอ้างอิงคุณภาพสูง 1024×1024 จำนวน 4–6 รูป (หน้าตรง 3/4 ซ้าย 3/4 ขวา) — เพียงพอให้ AI "เข้าใจ" โครงสร้างใบหน้าสามมิติและล็อกไว้
- Lock the character once, then reuse that sheet as the reference image in every later shot.
- 先锁定角色一次,之后在每个镜头中都复用这张设定表作为参考图。
- ล็อกตัวละครครั้งเดียว แล้วใช้ชีตนี้เป็นภาพอ้างอิงในทุกช็อตถัด ๆ ไป
World building
场景构建
สร้างฉาก
List every location and generate empty environment reference images (no characters). Use the exact same art-style prompt as your characters so the worlds match perfectly.
列出每一个地点,并生成空场景参考图(不含角色)。使用与角色完全相同的美术风格提示词,让场景与角色完美匹配。
ระบุทุกสถานที่ แล้วสร้างภาพอ้างอิงฉากเปล่า (ไม่มีตัวละคร) ใช้คำสั่งสไตล์ศิลป์เดียวกันกับตัวละคร เพื่อให้ฉากเข้ากันสนิท
💡 One style clause, everywhere
💡 同一句风格提示,处处使用
💡 คำสั่งสไตล์เดียว ใช้ทุกที่
Decide your art style up front (e.g. "clean cel anime" or "painterly watercolor") and paste the same style sentence into every character, environment, and shot prompt. Consistency comes from repetition.
一开始就定好美术风格(例如"干净的赛璐璐动画"或"水彩画风"),并把同一句风格描述粘贴到每个角色、场景和镜头的提示词里。一致性来自重复。
ตัดสินใจเรื่องสไตล์ศิลป์ตั้งแต่ต้น (เช่น "อนิเมะเซลล์สะอาดตา" หรือ "สีน้ำ") แล้ววางประโยคสไตล์เดียวกันลงในคำสั่งของทุกตัวละคร ฉาก และช็อต ความคงเส้นคงวามาจากการทำซ้ำ
3 Stills & Motion Prompts静帧与动作提示ภาพนิ่งและคำสั่งการเคลื่อนไหว
Render the starting still
生成起始帧
สร้างเฟรมเริ่มต้น
For each shot, combine your style clause + exact character tokens + shot description + reference images to generate a single starting frame.
为每个镜头,把你的风格提示 + 精确的角色 token + 镜头描述 + 参考图组合起来,生成一张起始帧。
สำหรับแต่ละช็อต รวมคำสั่งสไตล์ + token ตัวละครที่แม่นยำ + คำบรรยายช็อต + ภาพอ้างอิง เพื่อสร้างเฟรมเริ่มต้นหนึ่งเฟรม
Write the motion prompt
编写动作提示
เขียนคำสั่งการเคลื่อนไหว
Separately, write what should move inside the frame for the animation step.
另外单独写出在动画阶段,画面内应该有什么在动。
เขียนแยกต่างหากว่าอะไรควรเคลื่อนไหวภายในเฟรมสำหรับขั้นตอนแอนิเมชัน
⚠️ Animate the scene, not the camera
⚠️ 让画面动起来,而不是让镜头动
⚠️ ทำให้ฉากเคลื่อนไหว ไม่ใช่กล้อง
Describe motion in the frame — "drifting smoke," "hair blowing," "leaves falling," "she turns her head." If you only write camera moves like "zoom in" or "push in," the model often just pans over a static picture — the fake "Ken Burns" effect, not real animation.
描述画面内的运动——"飘动的烟""被风吹动的头发""落下的叶子""她转头"。如果你只写"放大""推近"这类镜头运动,模型往往只是在静止画面上平移——也就是假的"肯·伯恩斯(Ken Burns)"效果,而不是真正的动画。
อธิบายการเคลื่อนไหวในเฟรม — "ควันลอย" "ผมปลิว" "ใบไม้ร่วง" "เธอหันหัว" ถ้าคุณเขียนแต่การเคลื่อนกล้องอย่าง "ซูมเข้า" หรือ "ดันเข้า" โมเดลมักจะแค่แพนผ่านภาพนิ่ง — เป็นเอฟเฟกต์ "Ken Burns" ปลอม ๆ ไม่ใช่แอนิเมชันจริง
4 Video Generation (Image-to-Video)视频生成(图生视频 i2v)สร้างวิดีโอ (ภาพเป็นวิดีโอ i2v)
Feed your stills + motion prompts into an image-to-video (i2v) model. The still becomes the first frame and the model animates it.
把你的静帧 + 动作提示送入图生视频(i2v)模型。静帧成为第一帧,模型让它动起来。
ป้อนภาพนิ่ง + คำสั่งการเคลื่อนไหวเข้าโมเดลภาพเป็นวิดีโอ (i2v) ภาพนิ่งกลายเป็นเฟรมแรก แล้วโมเดลทำให้มันเคลื่อนไหว
For hyper-realistic cinematic 4K, OpenAI's Sora is the gold standard; specialized 2D tools suit flat, explainer-style cartoons.
追求超写实的电影级 4K,OpenAI 的 Sora 是标杆;专门的 2D 工具更适合扁平、讲解风格的动画。
สำหรับ 4K ระดับภาพยนตร์สมจริงสุด ๆ Sora ของ OpenAI คือมาตรฐาน ส่วนเครื่องมือ 2D เฉพาะทางเหมาะกับการ์ตูนแบนสไตล์อธิบาย
Run Wan 2.2 through ComfyUI on your own machine — no per-clip cloud bill, nothing leaves your computer.
在自己的机器上通过 ComfyUI 运行 Wan 2.2——没有按片云费用,数据也不离开你的电脑。
รัน Wan 2.2 ผ่าน ComfyUI บนเครื่องของคุณเอง — ไม่มีค่าคลาวด์ต่อคลิป ข้อมูลไม่ออกจากเครื่อง
ComfyUI getting started →ComfyUI 入门 →เริ่มต้นใช้ ComfyUI →Video generation is resource-hungry. The DGX Spark's 128 GB unified memory lets large video models fit without running out of VRAM. Expect ~5–20 min per clip.
视频生成非常吃资源。DGX Spark 的 128 GB 统一内存能容纳大型视频模型而不会爆显存。每个片段大约需要 5–20 分钟。
การสร้างวิดีโอกินทรัพยากรมาก หน่วยความจำรวม 128 GB ของ DGX Spark ทำให้โมเดลวิดีโอใหญ่ ๆ ใส่ได้โดยไม่ล้น VRAM ราว 5–20 นาทีต่อคลิป
ComfyUI on DGX Spark →ComfyUI 在 DGX Spark 上 →ComfyUI บน DGX Spark →5 Refinement, Upscaling & Post-Production精修、放大与后期制作ปรับแต่ง อัปสเกล และโพสต์โปรดักชัน
Upscale & smooth
放大与平滑
อัปสเกลและทำให้เนียน
Raw AI output almost always needs a second pass. Apply 4K upscalers and temporal smoothing (features like "Flow Control" or "Motion Buckets") to fix flickering and bring motion up to streaming quality.
AI 的原始输出几乎总需要再处理一遍。用 4K 放大器和时间平滑工具(如"Flow Control"或"Motion Buckets")来消除闪烁,把动作提升到流媒体级别。
ผลลัพธ์ดิบจาก AI มักต้องผ่านอีกรอบ ใช้ตัวอัปสเกล 4Kและเครื่องมือทำให้เนียนตามเวลา (เช่น "Flow Control" หรือ "Motion Buckets") เพื่อแก้การกะพริบและยกระดับการเคลื่อนไหวให้ถึงมาตรฐานสตรีมมิง
Assemble
合成
ประกอบรวม
- Import your clips into a normal video editor.
- 把片段导入普通的视频剪辑软件。
- นำคลิปเข้าโปรแกรมตัดต่อวิดีโอทั่วไป
- Generate narration audio, and trim each clip to match the exact length of the voiceover.
- 生成旁白音频,并把每个片段裁剪到与配音完全相同的长度。
- สร้างเสียงบรรยาย และตัดแต่ละคลิปให้ยาวเท่ากับเสียงพากย์พอดี
- Add sound effects and music, concatenate the shots, and export your finished 4K video.
- 加入音效和音乐,拼接镜头,导出成品 4K 视频。
- ใส่เอฟเฟกต์เสียงและดนตรี ต่อช็อตเข้าด้วยกัน แล้วส่งออกเป็นวิดีโอ 4K
🔧 Under the Hood: The Real Pipeline, A-to-Z幕后:完整的真实流水线(A 到 Z)เบื้องหลัง: ไปป์ไลน์จริงทั้งหมด A ถึง Z
Everything above is the workflow. Below is the real pipeline that produced an animated chapter of the "Quiet Mountain Sutra" project — every script, in the order it runs, from a story idea to a finished chapter MP4. Here's the map, then each stage with its real code.
上面讲的是工作流程。下面是真正跑通《静山经》项目某一动画章节的实际流水线——每一个脚本,按运行顺序,从一个故事点子一直到成品章节 MP4。先看全景图,再逐阶段看真实代码。
ข้างบนคือเวิร์กโฟลว์ ข้างล่างคือไปป์ไลน์จริงที่ผลิตบทแอนิเมชันของโปรเจกต์ "Quiet Mountain Sutra" — ทุกสคริปต์ เรียงตามลำดับที่รัน ตั้งแต่ไอเดียเรื่องไปจนถึงไฟล์ MP4 บทที่เสร็จแล้ว นี่คือแผนผัง แล้วตามด้วยโค้ดจริงของแต่ละขั้น
The full pipeline, A-to-Z
完整流水线全景(A 到 Z)
ไปป์ไลน์ทั้งหมด A ถึง Z
- Story → script → shot list. Write the story and break it into ~2–6s shots with an LLM (Step 1).故事 → 剧本 → 镜头清单。 写好故事,用 LLM 拆成约 2–6 秒的镜头(第 1 步)。เรื่อง → บท → รายการช็อต เขียนเรื่องแล้วใช้ LLM แบ่งเป็นช็อต ~2–6 วิ (ขั้นที่ 1)
- Character sheets —
build_character_sheet.py→character_sheets/*.png角色设定表 —build_character_sheet.py→character_sheets/*.pngชีตตัวละคร —build_character_sheet.py→character_sheets/*.png - World shots —
build_world_shot.py→world_shots/*.png(stages 2–3 run together viabuild_all_references.py)场景参考图 —build_world_shot.py→world_shots/*.png(第 2–3 阶段由build_all_references.py一起跑)ภาพฉาก —build_world_shot.py→world_shots/*.png(ขั้น 2–3 รันพร้อมกันผ่านbuild_all_references.py) - Per-scene stills —
render_chapter_from_refs.py→stills/scene_NN.png逐镜头起始帧 —render_chapter_from_refs.py→stills/scene_NN.pngภาพนิ่งต่อฉาก —render_chapter_from_refs.py→stills/scene_NN.png - Narration + motion plan — Edge TTS → audio +
durations_miao.json; motion prompts →i2v_plan_miao.json旁白 + 动作计划 — Edge TTS → 音频 +durations_miao.json;动作提示 →i2v_plan_miao.jsonเสียงบรรยาย + แผนการเคลื่อนไหว — Edge TTS → เสียง +durations_miao.json; คำสั่งการเคลื่อนไหว →i2v_plan_miao.json - i2v clips on the Spark —
qms_anime_funnel.py(HTTPS) orqms_i2v_batch.py(SSH) →clips_anime/scene_NN.mp4在 Spark 上生成 i2v 片段 —qms_anime_funnel.py(HTTPS)或qms_i2v_batch.py(SSH)→clips_anime/scene_NN.mp4คลิป i2v บน Spark —qms_anime_funnel.py(HTTPS) หรือqms_i2v_batch.py(SSH) →clips_anime/scene_NN.mp4 - Assemble —
qms_assemble_anime.py(trim + voiceover + concat) →ch01_anime.mp4合成 —qms_assemble_anime.py(裁剪 + 配音 + 拼接)→ch01_anime.mp4ประกอบรวม —qms_assemble_anime.py(ตัด + เสียงพากย์ + ต่อ) →ch01_anime.mp4
Stage 1 — Story, script & shot list
阶段 1 — 故事、剧本与镜头清单
ขั้นที่ 1 — เรื่อง บท และรายการช็อต
This is Step 1 made concrete. Before a single image exists, the chapter lives as words. Here are the three real source files behind this chapter's opening — the hand-authored planning the rest of the pipeline runs on. (The prose has since been revised to a different setting; the script and shot list are the earlier draft that produced the rendered teaser — the pipeline is identical either way.)
这是第 1 步的具体落地。在任何一张图存在之前,章节先以文字的形式存在。下面是本章开篇背后的三个真实源文件——流水线其余部分所依赖的、人工撰写的策划。(散文后来被改写成了另一种设定;剧本和镜头清单是更早的草稿,正是它产出了已渲染的预告——无论哪个版本,流水线都一样。)
นี่คือขั้นที่ 1 ที่จับต้องได้ ก่อนจะมีภาพสักภาพ บทอยู่ในรูปของคำก่อน นี่คือไฟล์ต้นฉบับจริงสามไฟล์เบื้องหลังบทเปิดนี้ — งานวางแผนที่เขียนด้วยมือซึ่งไปป์ไลน์ส่วนที่เหลือทำงานอยู่บนมัน (ร้อยแก้วถูกแก้ไขเป็นฉากอื่นไปแล้ว; บทและรายการช็อตคือฉบับร่างก่อนหน้าที่ผลิตตัวอย่างที่เรนเดอร์ไว้ — ไม่ว่าฉบับไหน ไปป์ไลน์ก็เหมือนกัน)
- The story — the prose: what happens and how it feels.
- 故事 — 散文:发生了什么、是什么感觉。
- เรื่อง — ร้อยแก้ว: เกิดอะไรขึ้นและให้ความรู้สึกอย่างไร
- The script — per scene: a visual prompt, the narration line, plus music and voice direction (the blueprint the still-renderer reads).
- 剧本 — 每个镜头:一段画面提示、旁白台词,外加音乐与配音指导(起始帧渲染器读取的蓝图)。
- บท — ต่อฉาก: พรอมป์ภาพ ประโยคบรรยาย พร้อมดนตรีและการกำกับเสียง (บลูพรินต์ที่ตัวเรนเดอร์ภาพนิ่งอ่าน)
- The shot list — per scene: frame count + exact narration duration + the motion prompt (what the i2v step reads).
- 镜头清单 — 每个镜头:帧数 + 精确旁白时长 + 动作提示(i2v 步骤读取的内容)。
- รายการช็อต — ต่อฉาก: จำนวนเฟรม + ความยาวเสียงบรรยายที่แม่นยำ + คำสั่งการเคลื่อนไหว (สิ่งที่ขั้น i2v อ่าน)
Show the story — chapter_01_the_quieting.md展开故事 — chapter_01_the_quieting.mdแสดงเรื่อง — chapter_01_the_quieting.md
# Chapter 1 — The Quieting
The village had four old names and one new one and nobody could remember anymore which one was correct, but the road came up out of Kaili in seventeen switchbacks and stopped at it, and that was where they lived. Up on the ridge above the valley. Up where the karst cliffs of Léi Gōng Shān — Thunder Lord Mountain — stood watch like old Daoist masters in the late light. Twenty-six stilt houses, a small **Tǔ-dì-miào** — earth-god shrine — with a broken bronze bell hanging from a weathered frame, a Dong drum tower at the open ground that had not held a proper drum in years but still held its empty cradle, a small dry market that had been a wet market once when there was still anyone to buy from, an earth-god shelf at the head of every path with marigolds and incense and a sealed orange Fanta from before the trucks stopped, a Dong **guǐ-mén** of carved wood and dangling figures at the head of the path that ran down to the cemetery. The Miao families lived at the high end near the springhead. A couple of Han Chinese families lived along the lower terraces. The one Dong family — old Lao-ma, her granddaughter Lin, her grandson Yan — lived in a bamboo shed at the cemetery's edge because that was where Lao-ma had wanted to live, that was where her dead were closest. Nobody minded. Nobody had minded anything for a long time.
The girl's name was Zoe — though the village called her Xiǎo-xīng, *little star,* and her grandmother called her A-Xīng, and the English name her mother had given her ten years ago in a Kaili hospital ward she could no longer picture exactly was a name that had survived the move up the mountain only because Mei had not let it go — and she was ten the year the radios stopped.
It was the small radio in the **Tǔ-dì-miào** that stopped first, in the dry season after the second long drought, and then the bigger one in her father's hunting pack, and then the truck radio that the headman Lao Wén kept under a tarp behind his house in case of an emergency he could never quite picture, and then there were no more radios in the village. Nobody bought new batteries because the man who had brought batteries up the mountain in his little blue pickup had stopped coming after the trouble at the Yunnan border. After that nobody really thought about radios anymore. There was nothing on them anyway. The last broadcast Pa remembered was a woman's voice from somewhere in Guiyang saying that the water in the Wu River had a name now and the name was one she could not pronounce. Then nothing. Then a long hiss like cicadas. Then Pa had reached out without looking and turned the dial all the way down and that was that.
The buses had stopped earlier. The buses had stopped two rains before. There had been three buses a day on the switchback road in Zoe's earliest memory, then two, then one, then one every three days, then one every week if you were lucky, and then the small green **xiǎo-bā** had come up empty one Sunday and the driver had gotten out and walked into the **Tǔ-dì-miào** and lit incense and walked out and gotten back in the bus and gone back down the mountain, and that had been the last bus, and he had not waved goodbye, and Zoe's mother said later he must have known. The salt stopped coming up the road. The postman stopped coming up the road. The man with the chickens stopped coming up the road. The girl with the SIM cards stopped coming up the road, although by then the SIM cards had stopped working anyway. The lowlands were keeping what they had now, what little they had, and the mountain was on its own.
The lowlands burned. Not all of them, but enough. The dry-season fires had begun starting earlier and lasting longer for as long as Zoe had been alive, and the last two years they had not really ended at all, only changed color, only spread, only ate further into the country her mother had come from. The smoke came up the mountain in three layers — the close-blue layer from the slash fires nearby, the middle-grey layer from somewhere in the Wujiang basin or the Sichuan rim or further out, and the far-brown layer that had no smell of vegetation in it at all, the smell of cities, of plastic and diesel and meat, and that one came up at night when the wind shifted and you could taste it on your teeth in the morning. Zoe's mother Mei would close the kitchen shutter against that wind and sing a song in *Guìzhōu-huà* she had not known she knew, and A-Pó would nod, would nod, would nod, *yes yes the song is coming back, the song is coming back to your mouth, daughter,* and Mei would not answer because she was not sure herself how the song had gotten there.
A-Pó was very old. She had been small even when she was tall, and now she was small and bent, sharp little eyes inside the deep wrinkles, white hair pulled back in a tight bun under the indigo embroidered Miao tunic she had worn every day of Zoe's life. Silver coin buttons down the front. Silver ornaments at her ears that swung when she nodded. A betel quid in her cheek almost always. She had been apprenticed to a **guǐ-shī** — a Miao spirit-master — when she was a girl, in another village far to the south near the Guangxi border, and although she had never become a full guǐ-shī herself she still knew the calls and she made them. She called the dead each morning at dawn, walking the dike of the lower paddy, naming her people. The names had no edges. Some of the names were Miao and some were Mandarin and some were the names of people Zoe had never known, people from before the village even had its newest name. Zoe liked the sound of it. So did Wei.
Wei was eight. Wei was Zoe's little brother. The family called him A-Wèi. Wei did not have many words and never had. Even before the quieting, Wei had been the family's silent one — Mei had taken him to the doctor in Kaili when he was four, in the years when there was still a doctor in Kaili you could get to, and the doctor had said *some children just come out this way,* and Mei had cried a little, but then she had stopped crying because Wei was Wei and Wei was already loved. Wei went around the yard with a stick he had cut himself from a young bamboo. He tapped things with it. He tapped the spirit pillar. He tapped the well's stone lip. He tapped the buffalo's flank softly — only softly, never hard — and he tapped Zoe's ankle when he wanted her to come look at something. He was small for his age. He had Pa's deep smile lines already on a round eight-year-old face. He went barefoot because Zoe went barefoot and he did what Zoe did, mostly. He was the only person in the village who had always lived in the wordless mode, because words had never come for him the way they had come for everyone else.
This was the family. Zoe, Wei, Mei who had been a nurse in Kaili before Kaili began to fail, and Pa who had been many things in his life but was now mostly a hunter because the trucks did not come, and A-Pó who was very old and who called the dead at dawn. They lived in the house Pa's father had built, on the high terrace just below the springhead, four houses down from the **Tǔ-dì-miào**. The house had a kitchen and one big room and a small porch and a Miao **jiā-shén-zhù** — household spirit pillar — in the back corner where a small bowl of rice was set each morning. The pillar had been there longer than Pa and would be there longer than any of them. A-Pó put the rice in the bowl.
The mornings were the same as they had always been because mornings on the mountain do not need a city to make them mornings. The cock crowed at the door of the world. The mist sat in the long valley below the rice terraces like milk in a bowl. The dew on the dike grass soaked Zoe's bare feet when she went out to bring the chickens their water, and the cool of the packed earth of the yard under her soles was the first true thing she knew each day, and the second true thing was the *plok plok plok* of Lao Wén's wife pounding cooked glutinous rice into the day's **nuò-mǐ-bā**, and the third true thing was A-Pó down at the edge of the lower paddy naming the dead in that slow singsong, and the fourth true thing was Wei, somewhere — Wei tapping a thing with his stick, Wei crouched beside a beetle, Wei laughing without sound at something only he had seen.
By midday the smoke would settle into the valley and the sky over the ridge would be white-hot and the children would gather under the longan tree by the well, and that was where Zoe first knew that the talking in her head was getting quieter.
She had not known there was a voice in her head, exactly, until it began to fade. It was just *her,* the way a fish does not know there is water. The voice had named things for her since she was small — *jīchǎn,* chicken; *mángguǒ,* mango; *māmā,* mother; *xuéxiào,* school; *bù,* no — and had told her what she felt and what she wanted and what she ought to be afraid of, and had run a kind of low chattering commentary on everything she did, and she had taken that to be the same thing as being alive. Then sometime in the dry season she noticed she could see the longan tree, the whole tree, the green and the dust on the green and the dark place where the trunk forked, without anything in her head saying *guì-yuán shù.* Just the tree. Just the seeing.
She did not tell anyone. She did not have words to tell anyone with, exactly, because the words were the thing that was going. But she felt, in a place that was not her head but somewhere lower and warmer, that the change was good. The way new rain on hot dust was good. She could not have argued for it. It just was.
Lin noticed before anyone else. Lin was Zoe's friend, Zoe's best friend, the Dong girl who lived at Lao-ma's with her brother Yan and who had a slingshot tucked permanently in the waistband of her patched shorts and a missing front tooth that she pushed her tongue through whenever she was thinking. Lin was a thief of green plums. Lin could read animal tracks the way other children read picture books. Lin was usually talking. Then one afternoon she and Zoe were sitting on a fallen log in the bamboo at the back of the village, watching a column of red ants carry small white eggs across the trail, and Lin was not talking and Zoe was not talking, and they had been sitting like that for what might have been an hour, and Lin did not feel like she was waiting for anything to begin and Zoe did not feel like she was waiting for anything to begin, and the ants went on with the eggs, and when at last Lin stood up and pulled Zoe up by the wrist she looked at Zoe for a long second with her tongue in the gap between her teeth and she did not say what she had decided to know.
She had decided to know that the same thing was happening to Zoe that was happening to her.
The buffalo was the next thing. The one buffalo the village still kept, an old half-blind **shuǐ-niú** named A-Bái because his hide was the color of old glutinous rice, lived in the lower paddy under a corrugated lean-to and chewed and chewed and looked at things with his one good eye. Zoe had been afraid of him as a smaller child because he was enormous and because the boys from the next village had told her **shuǐ-niú** sometimes went mad and trampled people, but she was not afraid of him anymore, and one afternoon when she was sitting on the dike eating a green plum with salt and chili, the buffalo had a thought, and the thought arrived in her without words.
The thought was: *I see you. I am tired. The sun is good. We are alive.*
She blinked. She looked at A-Bái. A-Bái was looking at her. He chewed. He went on chewing. He had not said anything. But the thought had been in her, fully formed, the way a stone is fully formed when you pick it up out of the riverbed. She felt it sitting there in her like something she had eaten. She finished the plum. She walked home with the thought still in her, and that night when Mei put her to bed she lay listening for the thought to come back, and another came instead, smaller, from the longan tree outside the window, and the thought was *you are not alone, little star, you are not alone.*
She slept.
Mei saw it first in the way mothers see things. Saw the long looking. Saw the daughter who would stand for an hour in the yard listening to nothing the way a dog listens to nothing, head tilted, mouth slightly open, eyes on no one thing in particular. And Mei — the city woman, the nurse, the one who had grown up among the noodle stalls and karaoke parlors of the side streets behind Kaili's old train station and who still had words in great busy stacks behind her eyes, words like *fā-shāo* and *chuàng-shāng* and *yùhòu* and the brand names of medications she no longer had, words she had once known in English too from a secondhand nursing textbook with a smudge of blue ink on its cover — Mei was afraid.
*Háizi,* she would say, kneeling in front of her daughter, *are you here? Háizi, look at Māmā, tell Mā your name.*
And Zoe would smile and the name would come slow, slow like rice cooking, *Zoe ya,* and Mei would touch her face and say *good, good,* in three different languages — Pǔtōnghuà, Miao, English — and try to laugh, and go on with the cooking. But at night, against the wooden wall, when she thought her husband was asleep, Mei wept. She wept silently, with the back of her hand against her mouth, because she did not want to wake her children. She wept for the loss of a daughter she could not yet name as lost. *Something is wrong with our girl,* she would whisper to Pa when he was awake, which was more often than he let on. *The smoke is in her. The radiation is in her. The world is in her. She is going somewhere I cannot follow.*
Pa listened. Pa listened a long time and said nothing. He had been a man of few words to begin with — Miao men of his line did not waste them — and now he had fewer. He lay on his side with his face toward the wall and he listened to his wife weep and he listened to the night outside their house — the long *hoooo* of the owl, the small dry rustle of bamboo, the river so far below in the valley you could only hear it if you stopped your own breath — and he listened to the voice in his own head, which had also been getting quieter for some months, and which he had not yet told her about. He did not know how to tell her. He was not sure she would forgive him for being unafraid.
So he reached back without turning and put his hand on her hip, and she took it, and they lay like that until she slept.
Pa got up before light. He did not eat. He took the .22, which had three rounds left, and he walked up the spine of the ridge toward the karst in the way he had now of walking, which was a way of moving through the forest like a man who is being read by the forest as he moves. He had not killed anything in nine days. He had stopped trying. He was simply going up to be there. He went up because the karst was up there. He had begun to feel that the karst was paying attention to him, and he wanted to pay attention back.
This was what was happening to him, and it was happening, in different speeds and different shapes, to almost everyone in the village. Lao Wén still wrote in his logbook by hurricane lamp every evening and pretended that the entries meant something, but he had begun to forget what the words meant as he wrote them, and lately the entries had become small drawings — a hen, a hand, a curl of smoke — and he had not noticed when the change had started. The old shī-gōng Yáng-shī-fù had been ahead of all of them for years; he swept the steps of the **Tǔ-dì-miào** with a slowness that had nothing to do with age, and he spoke now only to bless meals and rice, and even those blessings were getting shorter, were getting more like single sounds. Lao-ma, in her bamboo shed under the Dong **guǐ-mén** she had built with her own hands when she came to Xié-yán Zhài, had perhaps never fully made the crossing into the word-world in the first place; she greeted her grandchildren in the morning with a hand on each of their heads and a small low sound in her throat, and that, for her, was the morning.
Only Mei held her words. Only Mei spoke and spoke and spoke, in her head and out of it, against the quieting. She cooked with the radio voice in her own mouth: *now we are putting in the garlic, now we are adding the soy sauce, now we are stirring.* She did this so she would not lose them. A-Pó watched this and A-Pó knew what she was watching. *Her soul is still in Kaili,* A-Pó said once, to Pa, in Miao, in the doorway, while Mei was inside with her words. *Some day we will have to call it up.* Pa nodded and went out to chop wood. He understood. He did not know when she would do it. He trusted that she would know when.
The afternoon A-Pó took Zoe to the dike was the first cool afternoon of what should have been the rains and was not. The clouds had built and built over the western ridge all day in great purple banks and had given nothing. By four o'clock the heat had broken anyway out of sheer exhaustion, and the wind that came up the valley was less the smell of the lowlands than the cleaner smell of the wet karst stone above. A-Pó was sitting in the doorway of the family house chewing a betel quid she had not been able to spit out properly in a year because her teeth were going, and she looked at Zoe for a long while, and then she stood up and held out her hand. Wei was crouched in the yard beside her, tracing a circle in the dust with his stick.
*Come,* she said in Miao. *Come walk with Pó.*
Then to Wei, with her chin: *You too.*
The three of them walked past the **tǔ-dì-shén kān** at the head of the path, where someone had left a fresh marigold and the small can of orange Fanta from before the trucks stopped — the Fanta had been sitting there sealed for a year now and would sit there forever, the village's offering frozen in mid-gesture — and past the bamboo, which clicked dryly in the wind, and out to the high paddy. The terraces dropped away in long brown steps below them into the blue of the valley. The smoke down south had a color like an old bruise. A hawk turned in the air over the next ridge. A-Pó sat down on the dike with her old knees folded under her and pulled Zoe into her lap, and Zoe, who was getting a little big for laps, fit anyway. She had always been small. Wei sat on the dike a little apart, the stick laid across his knees. He was waiting too, in his own way. The hawk turned and turned.
Then A-Pó said, in a low Miao so slow it was almost not language:
*Little star. What is happening in your head is what is supposed to happen.*
*The talking we do in there — that was a fever the world caught. We caught it together a long time ago, and we made it bigger and bigger, and we made buses and bombs and bright screens out of it, and we sent the fever out into the rivers and into the air and into the rice. And now the fever is burning itself out down there in the valley. The world is letting it burn. The spirits are speaking again because we have finally gotten quiet enough to hear them.*
She switched into *Guìzhōu-huà* — the same words, almost, in the language Zoe's mother used, because A-Pó knew which words Zoe would carry home in her head if she carried any.
*The spirits never stopped, háizi. We stopped. Now we are starting again. The mountain is starting again with us.*
She switched back to Miao, very soft.
*You are not broken, little star.*
*You are early.*
Zoe, who had no words to answer with, did not answer. The buffalo down in the lower paddy had a thought, and the thought arrived. The karst above them had a thought, a slow one, a thought a million years long, and the thought arrived. A-Pó had a thought, and the thought was *I will be dead before the next rains and I am not sorry,* and that thought arrived too, and Zoe laid her small head against the old woman's shoulder and did not cry, because the old woman was right, was right, was right — the river below them was singing one long syllable about it, the cicadas were singing it, the smoke down south was singing it in a darker key, the red thread on Zoe's left wrist was warm and tight where A-Pó's hand rested on her arm — and there was no need to cry about a true thing.
Wei reached over without looking and put his small hand on Zoe's foot. He had heard the thought too. He had always been able to hear things like that. It was Zoe who was catching up to him.
They sat. The sun finished. The valley went the color of wet ash. Somewhere down the mountain a dog barked once and then forgot what it had been barking about. The wind shifted toward them and brought up out of the dark south the far-brown smell, and then shifted again and took it away.
And then, far below, where the switchback road climbed up out of the burned country, where it cut the last great loop before it began the long pull to the village, Zoe saw a light.
One light.
Then two.
Then a long broken line of them, moving slow, moving up.
She did not say anything. She did not need to. A-Pó had seen them in the same breath. The old hand tightened on Zoe's wrist over the red thread. Wei's hand tightened on Zoe's foot.
The lights kept coming. They were too high up to be cooking fires. They were too low to be stars. They were headlamps and torches and the cold blue of phone screens being used as torches, and they were following the road. They were following the road because the road was the only way up. They had been climbing all afternoon and only now, in the failing light, had they become visible.
A-Pó's old face wore a word, for the first time in many days.
*They are coming,* she said, in *Guìzhōu-huà* now, so that Mei — who could not yet hear what could not be heard — would have a chance, if Zoe told her later, to understand. *They are coming with all their noise.*
Below them, the buffalo A-Bái raised his great wet half-blind head from the paddy and turned it slow toward the south.
The hawk over the next ridge folded its wings and dropped.
The longan tree by the well shed three small dry leaves at once, although there was no breeze just then to shake them, and they spiraled down through the failing light, and they landed on the cool packed earth of the yard where Zoe was not standing but had been standing every morning of her life, and the leaves came to rest, and the village held its long breath, and the lights kept climbing.
Up on the trail above them, somewhere near the springhead, Yan — Lao-ma's tall quiet grandson — saw the lights too from his own watching place and turned and began to run down toward the cemetery shed to tell his grandmother what was coming. His bare feet on the trail made almost no sound. The Dong **guǐ-mén** at the head of his path watched him pass under it, and was watching what was behind him too.
Show the script — ch01_opening.json (the blueprint)展开剧本 — ch01_opening.json(蓝图)แสดงบท — ch01_opening.json (บลูพรินต์)
{
"_comment": "Quiet Mountain Sutra — Chapter 1 opening teaser (refs-based v2). 8 scenes, ~45-60 seconds. Each scene declares which character sheets and which world reference shot to feed to Nano Banana 2 inline. The style is locked because the same Dao watercolor aesthetic anchored every reference build.",
"_source": "stories/quiet_mountain_sutra/chapter_01_the_quieting.md (opening pages)",
"_pipeline_slug": "quiet_mountain_sutra_ch01_opening",
"_renderer": "render_chapter_from_refs.py",
"anchor_prompt": "Cinematic portrait of a 10-year-old Thai-Lisu girl named Dao, small and lean, long dark messy tangled hair, bright curious eyes, faded pink t-shirt and dark shorts, barefoot, a thin red thread bracelet on her left wrist, painterly Ghibli-style watercolor and ink illustration",
"scene_prompts": [
"Wide cinematic establishing shot of a Lisu hill tribe village on a karst ridge above Chiang Dao at golden hour, terraced rice paddies descending into a mist-filled valley below, simple wooden houses on stilts, a small wat with a tiled roof, the great limestone cliffs of Doi Luang Chiang Dao standing watch behind the village, no characters, painterly watercolor anamorphic landscape, soft warm light",
"Intimate close portrait of Dao the 10-year-old Lisu girl standing barefoot in the yard of her family house at midday, the village visible in soft focus behind her, her head tilted slightly to the side as if listening to something only she can hear, contemplative gentle expression, faded pink t-shirt, the red thread bracelet on her left wrist visible, painterly watercolor portrait",
"Quiet interior of a small Lisu mountain house kitchen at twilight, a small dusty transistor radio sitting silent on a wooden shelf, no light on its dial, a kerosene lamp lit nearby casting warm orange shadows across the dark wooden walls, a kettle on the cold hearth, no characters, atmospheric painterly watercolor, the sense of a sound that has stopped",
"Wide cinematic shot of an empty mountain switchback road at dusk, an abandoned orange songthaew pickup truck stopped at the side of the road with its tailgate down, no driver, no passengers, golden-orange smoke haze in the valley below, melancholy painterly watercolor, the road utterly still",
"Wide cinematic panorama looking south from a high mountain ridge over a long valley filled with three distinct horizontal layers of smoke — close blue layer, middle grey layer, far brown horizon layer — at late afternoon, the sun a dim red-orange disc through the haze, no characters, somber painterly watercolor composition, apocalyptic but beautiful",
"Cinematic shot of an elderly Lisu grandmother Yaa Saeng walking slowly along the high earthen dike of a terraced rice paddy at dawn, soft mist rising around her bare feet, dark blue traditional Lisu tunic with silver coin buttons, white hair pulled back in a tight bun, her back to the camera as she names the dead, the karst cliffs of Doi Luang Chiang Dao rising in the soft pink dawn distance, contemplative painterly watercolor, Miyazaki atmosphere",
"Cinematic medium shot beneath a large old longan tree by an antique stone well in the packed-earth yard of a Lisu hill village at midday. The young girl from the reference image stands at the base of the tree with her head tilted slightly to the side as if listening, her hand resting on the rough bark. Dappled gold-and-green light filters through the leaves. A few small dry yellowed leaves spiral slowly down around her. Painterly watercolor, intimate and quiet.",
"Wide cinematic shot of a mountain switchback road at deep dusk seen from the village above, a long broken line of small lights — headlamps and torches and the cold blue of phone screens — slowly climbing the road from the dark valley below, the karst cliffs looming dark above, the southern horizon glowing dim orange-red from distant fires, no faces visible at this distance, painterly watercolor, ominous and beautiful"
],
"scene_refs": [
{ "world": "village_ridge", "characters": [] },
{ "world": null, "characters": ["dao"] },
{ "world": "lisu_kitchen", "characters": [] },
{ "world": "switchback_road", "characters": [] },
{ "world": "valley_smoke_view", "characters": [] },
{ "world": "terraced_paddies", "characters": ["yaa_saeng"] },
{ "world": "longan_tree_well", "characters": ["dao"] },
{ "world": "switchback_road", "characters": [] }
],
"motion_prompts": [
"Slow steady push in toward the village on the ridge",
"Slow gentle push in on Dao's face, her eyes catching the light",
"Slow pan right across the kitchen shelf past the silent radio",
"Slow zoom out, revealing how empty the long mountain road is",
"Slow pan left across the three layered bands of smoke over the valley",
"Slow push in on the grandmother's small dark blue figure on the dike",
"Slow tight zoom in on Dao under the longan tree, leaves falling around her",
"Slow zoom in on the long broken line of climbing lights on the road"
],
"narration": [
"Above Chiang Dao, seventeen switchbacks high, the village holds its place on the karst.",
"The girl's name is Dao. She is ten. The talking in her head has begun to quiet.",
"The small radio in the wat stopped first. Then the truck radio. Then all the rest.",
"One Sunday the orange songthaew came up empty. The driver did not wave goodbye.",
"The smoke came up in three layers. The farthest one smelled of cities, of meat.",
"At dawn the grandmother walked the dike and named the dead, one by one.",
"Under the longan tree the girl tilted her head and listened. To nothing. To everything.",
"Then far below, the first light climbed the road. Then two. Then a long broken line."
],
"music_prompt": "Contemplative northern Thai mountain music — hill tribe pentatonic bamboo flute, slow khim (Thai dulcimer), soft sustained strings, breath-paced ~60 bpm, melancholy with warmth, Ghibli-apocalypse atmosphere, sparse and patient, room for the voiceover to breathe",
"voice_style": "Contemplative, slow, weighted, like an elder telling a true story to a child; pauses between sentences; warmth without sentimentality"
}
Show the shot list — i2v_plan_miao.json展开镜头清单 — i2v_plan_miao.jsonแสดงรายการช็อต — i2v_plan_miao.json
[
{
"scene": 0,
"narration_dur": 6.24,
"frames": 153,
"motion_prompt": "the morning mist rolls and drifts slowly through the valley, treetops sway in the wind, thin smoke curls upward from a chimney, a bird glides across the sky; the village itself still; clean anime, real gentle movement, NO camera move"
},
{
"scene": 1,
"narration_dur": 6.46,
"frames": 157,
"motion_prompt": "the girl slowly tilts and turns her head as if listening, she blinks, her long dark hair lifts and stirs in the breeze, she breathes softly, the faint warmth of the bodhisattva in her steady eyes; subtle living character animation, clean anime, NO camera move"
},
{
"scene": 2,
"narration_dur": 6.67,
"frames": 161,
"motion_prompt": "the kerosene lamp flame flickers and dances, warm shadows shift across the dark wooden walls, faint dust motes drift in the air, the silent radio sits motionless; real ambient motion, clean anime, NO camera move"
},
{
"scene": 3,
"narration_dur": 5.54,
"frames": 137,
"motion_prompt": "wind blows dust and dry leaves skittering across the empty road, golden smoke haze drifts in the valley below, the abandoned minibus door sways slightly; real ambient motion, clean anime, NO camera move"
},
{
"scene": 4,
"narration_dur": 5.74,
"frames": 141,
"motion_prompt": "the three layered bands of smoke billow, churn and drift slowly across the valley, the haze swirls, the dim red sun glimmers through the moving haze; real billowing motion, clean anime, NO camera move"
},
{
"scene": 5,
"narration_dur": 5.14,
"frames": 125,
"motion_prompt": "the old grandmother walks slowly forward along the dike, one bare foot stepping after the other, her steady gait continuing, her indigo tunic and sleeves swaying, mist rising and curling around her feet, the rice stalks rippling in the dawn breeze; real walking motion, clean anime, NO camera move"
},
{
"scene": 6,
"narration_dur": 6.7,
"frames": 161,
"motion_prompt": "dry yellow leaves fall, spiral and scatter slowly down around the girl, her long hair lifts gently in the breeze, she slowly turns her head to listen, dappled light flickers through the swaying leaves; real falling-leaf and character motion, clean anime, NO camera move"
},
{
"scene": 7,
"narration_dur": 6.84,
"frames": 165,
"motion_prompt": "the long broken line of small lights flickers and creeps slowly upward along the dark switchback road one by one, wind stirs the trees, the distant fire-glow on the horizon pulses; real moving lights, clean anime, NO camera move"
}
]
Stage 2 — Character sheets
阶段 2 — 角色设定表
ขั้นที่ 2 — ชีตตัวละคร
This is Step 2's "token locking" + "character sheet" in code. Each character has a precise text description (the locked tokens). The script sends that description plus a style-reference image to Nano Banana 2, which returns one canvas of six poses of the same person. That PNG becomes the character's permanent reference.
这就是第 2 步「锁定 token」+「角色设定表」的代码实现。每个角色都有一段精确的文字描述(被锁定的 token)。脚本把这段描述加上一张风格参考图发给 Nano Banana 2,它返回一张包含同一人物六个姿势的画布。这张 PNG 就成为该角色的永久参考图。
นี่คือ "การล็อก token" + "ชีตตัวละคร" ของขั้นที่ 2 ในรูปแบบโค้ด ตัวละครแต่ละตัวมีคำบรรยายข้อความที่แม่นยำ (token ที่ถูกล็อก) สคริปต์ส่งคำบรรยายนั้นพร้อมภาพอ้างอิงสไตล์ไปยัง Nano Banana 2 ซึ่งคืนผืนผ้าใบที่มีหกท่าของคนคนเดียวกัน ไฟล์ PNG นี้กลายเป็นภาพอ้างอิงถาวรของตัวละคร
--refmatches an existing image of THAT character (identity + style);--style-refborrows only the look of a DIFFERENT image — that's how every character inherits one locked watercolor style.--ref匹配该角色已有的图(身份 + 风格);--style-ref只借用另一张图的画风——这就是每个角色都继承同一种锁定水彩风格的方式。--refจับคู่กับภาพที่มีอยู่ของตัวละครนั้น (เอกลักษณ์ + สไตล์);--style-refยืมแค่ลุคของภาพอื่น — นี่คือวิธีที่ตัวละครทุกตัวสืบทอดสไตล์สีน้ำที่ล็อกไว้แบบเดียวกัน- It downsizes the reference to ≤1024px, then retries with backoff on rate limits — the standard shape of any image-gen API loop.
- 它把参考图缩小到 ≤1024px,遇到限流就带退避重试——这是所有图像生成 API 循环的标准写法。
- มันย่อภาพอ้างอิงให้ ≤1024px แล้วลองใหม่แบบถอยเวลาเมื่อโดนจำกัดอัตรา — รูปแบบมาตรฐานของลูป API สร้างภาพ
Show full script — build_character_sheet.py (258 lines)展开完整脚本 — build_character_sheet.py(258 行)แสดงสคริปต์เต็ม — build_character_sheet.py (258 บรรทัด)
#!/usr/bin/env python3
"""Build a multi-pose character sheet for one character, using an existing
already-canonical still as the visual reference. Saves to character_sheets/.
Phase 02a of the updated pipeline (see HTML/ai_video_pipeline.html). The sheet
exploits Nano Banana 2's internal consistency: a single render produces six poses
of the same character on one canvas, so face, hair, and clothing match. The
resulting PNG becomes the canonical reference for every future chapter.
Usage:
python build_character_sheet.py --character dao --ref imgs/free_pipeline/quiet_mountain_sutra_ch01_opening/stills/scene_01.png
"""
import argparse
import os
import sys
import time
from pathlib import Path
if sys.platform == "win32":
os.environ.setdefault("PYTHONIOENCODING", "utf-8")
try:
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
sys.stderr.reconfigure(encoding="utf-8", errors="replace")
except (AttributeError, OSError):
pass
from PIL import Image
HERE = Path(__file__).resolve().parent
SHEETS_DIR = HERE / "character_sheets"
# Per-character descriptions. Pulled from story_bible.md.
CHARACTERS = {
"dao": {
"name": "Dao",
"desc": (
"10-year-old Thai-Lisu girl, small and lean build, long dark messy "
"tangled hair falling past her shoulders, bright curious dark brown "
"eyes, warm Southeast Asian brown skin, wearing a faded pink t-shirt "
"and dark cotton shorts, barefoot, a thin red cotton thread bracelet "
"on her LEFT wrist (only the left wrist, not the right), gentle "
"contemplative expression"
),
},
"noi": {
"name": "Noi",
"desc": (
"8-year-old Thai-Lisu boy, small for his age, round face, messy "
"short hair sticking up, big bright eyes, gap-toothed grin, "
"oversized hand-me-down t-shirt, torn shorts, barefoot, carries a "
"thin bamboo stick. Younger brother of Dao."
),
},
"ploy": {
"name": "Ploy",
"desc": (
"10-year-old Thai-Akha girl, short choppy hair with thick bangs, "
"mischievous grin showing a missing front tooth, dark brown skin "
"from always being outside, wearing a handwoven Akha geometric-"
"pattern shirt and patched shorts, barefoot, a small old scar on "
"her right knee, a homemade slingshot tucked in her waistband"
),
},
"yaa_saeng": {
"name": "Yaa Saeng",
"desc": (
"Late-70s Lisu grandmother, small and thin but upright, deeply "
"wrinkled face with sharp bright eyes that miss nothing, white "
"hair pulled back in a tight bun, wearing a traditional Lisu dark "
"blue tunic with silver coin buttons and silver ornaments, betel-"
"stained smile"
),
},
"phaw": {
"name": "Phaw (Father)",
"desc": (
"Mid-40s Thai-Lisu man, strong weathered build, sun-darkened skin, "
"quiet kind face with deep smile lines, wearing a worn straw "
"farmer's hat, rolled-up canvas pants, simple faded cotton shirt "
"with sleeves rolled up, calloused hands, a machete in a wooden "
"sheath on his belt"
),
},
"phueng": {
"name": "Phueng (Mother)",
"desc": (
"Around-30 Thai woman from Chiang Mai, slightly paler than the "
"hill-tribe villagers, long dark hair often tied back, kind tired "
"eyes, wearing a faded cotton tunic with hand-stitched repairs "
"over a long skirt, a thin red cotton thread bracelet on each wrist"
),
},
"mi_yeh": {
"name": "Mi-Yeh",
"desc": (
"Late-60s Akha grandmother, very small and thin, iron-grey hair "
"pulled back tight, weather-worn deeply-lined face, wearing a "
"faded handwoven Akha jacket and wide skirt, silver-coin earrings, "
"an Akha headdress with silver ornaments and beads, hands "
"worked-rough from a lifetime of weaving and farming"
),
},
"captain": {
"name": "The Captain",
"desc": (
"Around-50 Thai man, tired weathered face, short military-cut "
"black hair beginning to grey at the temples, wearing a faded "
"camouflage military uniform with insignia partly obscured by "
"dust, no rifle, carries a clipboard and a megaphone with duct "
"tape on its bell, exhausted posture, a man doing his job"
),
},
}
def load_creds() -> dict:
env_file = Path("C:/Users/Admin/claude/env.txt.txt")
creds = {}
if env_file.exists():
for line in env_file.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line or line.startswith("#") or ":" not in line:
continue
k, v = line.split(":", 1)
creds[k.strip()] = v.strip()
return creds
def build_prompt(char: dict, style_only: bool) -> str:
name = char["name"]
desc = char["desc"]
if style_only:
ref_clause = (
"STYLE REFERENCE ONLY — match the painterly watercolor-and-ink "
"aesthetic, color palette, line weight, paper-grain texture, and "
"soft warm lighting of the reference image. DO NOT include any "
"character from the reference image. The new character is a "
"completely different person described below."
)
identity_clause = f"All six poses must be the SAME PERSON: {name}, described below."
else:
ref_clause = (
"Match the character in the reference image exactly. Same face, "
"same hair texture and length, same skin tone, same clothes, same "
"accessories, same age."
)
identity_clause = "All six poses must be the SAME PERSON as in the reference image."
return f"""Character model sheet of {name} on a single canvas, six poses arranged in a 3x2 grid, plain neutral light grey background, clear separation between poses, no labels or text.
Poses, left to right, top to bottom:
1) Front close-up portrait, neutral expression, eye contact with camera
2) Three-quarter view standing, full upper body, looking slightly off-camera
3) Side profile bust, looking left
4) Full body standing, barefoot on plain ground, arms relaxed
5) Action pose running or walking, in motion
6) Emotional pose head tilted, listening, contemplative
Character description (same person in all six poses):
{name} — {desc}
CRITICAL — IDENTITY: {identity_clause}
CRITICAL — STYLE: {ref_clause}
Final style: Painterly Ghibli-style watercolor and ink illustration, soft warm light, gentle linework, matching the painterly aesthetic of the reference image. No anime stylization, no digital sharpness, no realistic photography — watercolor."""
def generate(char_key: str, ref_path: Path, out_path: Path, style_only: bool,
model: str = "gemini-3-pro-image-preview", aspect_ratio: str = "16:9",
retries: int = 3) -> None:
if char_key not in CHARACTERS:
sys.exit(f"Unknown character: {char_key}. Choose from {list(CHARACTERS.keys())}")
char = CHARACTERS[char_key]
prompt = build_prompt(char, style_only)
print(f"Character: {char['name']}")
print(f"Reference: {ref_path} ({'style only' if style_only else 'identity+style'})")
print(f"Output: {out_path}")
print(f"Model: {model}")
print()
creds = load_creds()
api_key = creds.get("google-api", "")
if not api_key:
sys.exit("No google-api key in env.txt.txt")
from google import genai
from google.genai import types
ref_img = Image.open(ref_path).convert("RGB")
if max(ref_img.size) > 1024:
ratio = 1024 / max(ref_img.size)
ref_img = ref_img.resize(
(int(ref_img.size[0] * ratio), int(ref_img.size[1] * ratio)),
Image.LANCZOS,
)
client = genai.Client(api_key=api_key)
for attempt in range(retries):
try:
resp = client.models.generate_content(
model=model,
contents=[prompt, ref_img],
config=types.GenerateContentConfig(
response_modalities=["IMAGE"],
image_config=types.ImageConfig(aspect_ratio=aspect_ratio),
),
)
for part in resp.candidates[0].content.parts:
if getattr(part, "inline_data", None):
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_bytes(part.inline_data.data)
print(f" saved {out_path.name} ({len(part.inline_data.data)//1024} KB)")
return
raise RuntimeError("No image in response")
except Exception as e:
err = str(e)
if "429" in err or "rate" in err.lower() or "quota" in err.lower():
wait = 20 * (attempt + 1)
print(f" rate limited, retry in {wait}s: {err[:120]}")
time.sleep(wait)
continue
if attempt < retries - 1:
wait = 5 * (attempt + 1)
print(f" error, retry in {wait}s: {err[:120]}")
time.sleep(wait)
continue
raise
def main() -> None:
ap = argparse.ArgumentParser()
ap.add_argument("--character", required=True, help=f"Character key from {list(CHARACTERS.keys())}")
ap.add_argument("--ref", help="Identity+style reference (when the ref IS this character)")
ap.add_argument("--style-ref", help="Style-only reference (when the ref is a DIFFERENT character but same aesthetic)")
ap.add_argument("--out", default=None, help="Output path (default: character_sheets/.png)")
ap.add_argument("--aspect", default="16:9", choices=["16:9", "4:3", "3:4", "1:1", "9:16"])
args = ap.parse_args()
if not args.ref and not args.style_ref:
sys.exit("Provide either --ref (identity match) or --style-ref (style only)")
if args.ref and args.style_ref:
sys.exit("Provide --ref OR --style-ref, not both")
ref_path = Path(args.ref or args.style_ref)
if not ref_path.exists():
sys.exit(f"Reference image not found: {ref_path}")
style_only = bool(args.style_ref)
out_path = Path(args.out) if args.out else SHEETS_DIR / f"{args.character}.png"
generate(args.character, ref_path, out_path, style_only, aspect_ratio=args.aspect)
if __name__ == "__main__":
main()
Stage 3 — World shots
阶段 3 — 场景参考图
ขั้นที่ 3 — ภาพฉาก
Same engine as Stage 2, but for places. Each location has a locked text description, and the prompt adds a hard "NO CHARACTERS — empty landscape/interior" rule. Crucially, it passes the locked character sheet as a style-only reference, so every environment shares the exact same watercolor look as the cast.
和阶段 2 同一个引擎,只是对象是地点。每个地点都有一段锁定的文字描述,提示里加了一条硬性规则「不要角色——空的风景/室内」。关键在于它把锁定的角色设定表作为仅风格参考传入,所以每个场景都和角色拥有完全相同的水彩画风。
เอนจินเดียวกับขั้นที่ 2 แต่สำหรับสถานที่ แต่ละสถานที่มีคำบรรยายข้อความที่ล็อกไว้ และพรอมป์ใส่กฎเด็ดขาด "ไม่มีตัวละคร — ภูมิทัศน์/ภายในที่ว่างเปล่า" ที่สำคัญ มันส่งชีตตัวละครที่ล็อกไว้เป็นภาพอ้างอิงเฉพาะสไตล์ ดังนั้นทุกฉากจึงมีลุคสีน้ำเหมือนกับตัวละครเป๊ะ
- Run one location (
--location village_ridge) or the whole set (--location all); it skips any shot that already exists. - 可以只跑一个地点(
--location village_ridge)或整组(--location all);已存在的图会跳过。 - รันทีละสถานที่ (
--location village_ridge) หรือทั้งชุด (--location all); ภาพที่มีอยู่แล้วจะถูกข้าม
Show full script — build_world_shot.py (235 lines)展开完整脚本 — build_world_shot.py(235 行)แสดงสคริปต์เต็ม — build_world_shot.py (235 บรรทัด)
#!/usr/bin/env python3
"""Build a locked world establishing shot for one location, using the Dao
character sheet as a style-only reference.
Phase 02b of the updated pipeline (see HTML/ai_video_pipeline.html). Each
location is rendered once with deterministic style match so the village,
karst, road, etc. look the same way every time we drop them into a scene.
Usage:
python build_world_shot.py --location village_ridge
python build_world_shot.py --location all # build every locked location
"""
import argparse
import os
import sys
import time
from pathlib import Path
if sys.platform == "win32":
os.environ.setdefault("PYTHONIOENCODING", "utf-8")
try:
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
sys.stderr.reconfigure(encoding="utf-8", errors="replace")
except (AttributeError, OSError):
pass
from PIL import Image
HERE = Path(__file__).resolve().parent
SHOTS_DIR = HERE / "world_shots"
STYLE_REF = HERE / "character_sheets" / "dao.png"
LOCATIONS = {
"village_ridge": (
"Wide cinematic establishing shot of a small Lisu hill tribe village "
"on a karst ridge above Chiang Dao in northern Thailand. Twenty-some "
"wooden houses on stilts with thatched and tin roofs, a small Theravada "
"wat with a tiled roof and a small bronze bell hanging in a wooden "
"frame, terraced rice paddies dropping in long brown-and-green steps "
"into a mist-filled valley below. The great grey limestone cliffs of "
"Doi Luang Chiang Dao rising in the background. Golden hour light, "
"soft mist drifting between houses, anamorphic landscape composition."
),
"doi_luang_karst": (
"Wide cinematic shot of the karst limestone cliffs of Doi Luang Chiang "
"Dao mountain, towering grey limestone walls weathered into jagged "
"vertical formations, patches of jungle and bamboo clinging to the "
"lower slopes, mist drifting around the upper peaks, the impression "
"of a great old living mountain that has been watching the valley "
"below for millions of years. Late afternoon light, painterly."
),
"switchback_road": (
"Wide cinematic shot of a narrow mountain road descending in seventeen "
"tight switchbacks down the side of a karst ridge in northern Thailand, "
"the road of red-brown dirt and gravel, lined with bamboo and dry grass, "
"no vehicles, the road empty, golden-orange smoke haze in the valley "
"below, an old abandoned songthaew pickup truck visible at one of the "
"lower switchbacks, melancholy and quiet."
),
"longan_tree_well": (
"Detailed cinematic shot of a large old longan tree growing beside an "
"antique stone well in the packed-earth yard of a Lisu hill village, "
"the tree's branches heavy with green-brown fruit, dappled gold-and-"
"green light through the leaves, the well's stone lip worn smooth from "
"generations of use, a few small dry yellowed leaves spiraling slowly "
"down to the dirt, no characters, quiet and intimate, painterly."
),
"wat_porch": (
"Cinematic shot of the open porch of a small Theravada Buddhist wat in "
"a Lisu hill village, wooden floorboards worn smooth, broom leaning "
"against the doorpost, a small bronze bell hanging in a wooden frame "
"at the corner with a visible thin crack along its rim, the wat's "
"carved wooden door open into shadow behind, no characters, late "
"afternoon golden light, contemplative atmosphere."
),
"lisu_kitchen": (
"Cinematic interior shot of a small Lisu mountain house kitchen, "
"wooden walls of unpainted dark planks, a low cooking fire in a clay "
"hearth in the floor, a kettle on the fire, a wooden shelf with simple "
"dishes and a small dusty transistor radio with no light on it, a "
"kerosene lamp lit nearby casting warm orange shadows, a small bowl "
"of rice on a low wooden table, no characters, atmospheric and "
"painterly, twilight."
),
"spirit_house": (
"Detailed cinematic shot of a Thai-style san phra phum spirit house "
"mounted on a wooden post at the head of a village path in northern "
"Thailand, the small ornate red-and-gold shrine with a tiled roof, "
"fresh yellow marigolds and incense sticks at its base, a sealed can "
"of orange Fanta soda placed on its small altar shelf as an offering "
"that has been there for a long time, gentle late afternoon light, "
"no characters, quiet reverent atmosphere."
),
"akha_spirit_gate": (
"Cinematic shot of an Akha lo-kah-pi spirit gate at the head of a "
"narrow earth path in a forested mountainside, two upright weathered "
"wooden posts with a crossbeam, hung from the crossbeam at intervals "
"small hand-carved wooden figures of a man with a hoe, a woman with "
"a fish, a dog, a sun-disc, and a child with a stick. The figures "
"are unpainted, weathered grey-brown, watching the path. Bamboo "
"growing close on both sides. No characters, mysterious and quiet."
),
"terraced_paddies": (
"Wide cinematic shot of long terraced rice paddies dropping in many "
"narrow earthen steps down the side of a mountain in northern "
"Thailand, the paddies fallow and dry in the late dry season, soft "
"browns and grey-greens, the karst ridge above and the long blue "
"ruin of the valley below, a single wooden tool shed at the edge of "
"the highest terrace, no characters, golden hour, painterly."
),
"valley_smoke_view": (
"Wide cinematic panorama looking south from a high mountain ridge "
"over a long valley filled with three distinct horizontal layers of "
"smoke: a close blue layer of nearby slash-fire smoke, a middle grey "
"layer of valley-fire smoke, and a far brown horizon layer of city "
"smoke. The sun is a dim red-orange disc through the haze. The "
"horizon glows with the dull color of distant fires. No characters, "
"somber painterly composition, apocalyptic but beautiful."
),
}
def load_creds() -> dict:
env_file = Path("C:/Users/Admin/claude/env.txt.txt")
creds = {}
if env_file.exists():
for line in env_file.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line or line.startswith("#") or ":" not in line:
continue
k, v = line.split(":", 1)
creds[k.strip()] = v.strip()
return creds
def build_prompt(location_desc: str) -> str:
return f"""{location_desc}
CRITICAL — NO CHARACTERS: This is a pure environment establishing shot. No people, no animals (unless explicitly part of the description). Empty landscape / interior. If the reference image contains characters, IGNORE the characters — match only the painterly style.
CRITICAL — STYLE: Match the painterly watercolor-and-ink aesthetic, color palette, line weight, paper-grain texture, and soft warm lighting of the reference image. Painterly Ghibli-style watercolor and ink illustration. No anime stylization, no digital sharpness, no realistic photography — watercolor.
Composition: 16:9 cinematic widescreen, anamorphic landscape, soft warm light, room for narrative voiceover."""
def generate(location_key: str, out_path: Path,
model: str = "gemini-3-pro-image-preview",
aspect_ratio: str = "16:9",
retries: int = 3) -> None:
if location_key not in LOCATIONS:
sys.exit(f"Unknown location: {location_key}. Choose from {list(LOCATIONS.keys())}")
prompt = build_prompt(LOCATIONS[location_key])
print(f"Location: {location_key}")
print(f"Style ref: {STYLE_REF.name}")
print(f"Output: {out_path}")
print()
if not STYLE_REF.exists():
sys.exit(f"Style reference not found: {STYLE_REF} — build Dao sheet first")
creds = load_creds()
api_key = creds.get("google-api", "")
if not api_key:
sys.exit("No google-api key in env.txt.txt")
from google import genai
from google.genai import types
ref_img = Image.open(STYLE_REF).convert("RGB")
if max(ref_img.size) > 1024:
ratio = 1024 / max(ref_img.size)
ref_img = ref_img.resize(
(int(ref_img.size[0] * ratio), int(ref_img.size[1] * ratio)),
Image.LANCZOS,
)
client = genai.Client(api_key=api_key)
for attempt in range(retries):
try:
resp = client.models.generate_content(
model=model,
contents=[prompt, ref_img],
config=types.GenerateContentConfig(
response_modalities=["IMAGE"],
image_config=types.ImageConfig(aspect_ratio=aspect_ratio),
),
)
for part in resp.candidates[0].content.parts:
if getattr(part, "inline_data", None):
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_bytes(part.inline_data.data)
print(f" saved {out_path.name} ({len(part.inline_data.data)//1024} KB)")
return
raise RuntimeError("No image in response")
except Exception as e:
err = str(e)
if "429" in err or "rate" in err.lower() or "quota" in err.lower():
wait = 20 * (attempt + 1)
print(f" rate limited, retry in {wait}s: {err[:120]}")
time.sleep(wait)
continue
if attempt < retries - 1:
wait = 5 * (attempt + 1)
print(f" error, retry in {wait}s: {err[:120]}")
time.sleep(wait)
continue
raise
def main() -> None:
ap = argparse.ArgumentParser()
ap.add_argument("--location", required=True,
help=f"Location key from {list(LOCATIONS.keys())} or 'all'")
ap.add_argument("--out", default=None)
ap.add_argument("--aspect", default="16:9", choices=["16:9", "4:3", "1:1"])
args = ap.parse_args()
if args.location == "all":
for key in LOCATIONS:
out_path = SHOTS_DIR / f"{key}.png"
if out_path.exists():
print(f"[skip] {key} already exists at {out_path}")
continue
generate(key, out_path, aspect_ratio=args.aspect)
print()
else:
out_path = Path(args.out) if args.out else SHOTS_DIR / f"{args.location}.png"
generate(args.location, out_path, aspect_ratio=args.aspect)
if __name__ == "__main__":
main()
One command for both: build_all_references.py loops every character and every location, reusing one sheet as the style reference throughout — 7 sheets + 10 world shots ≈ 17 Nano Banana calls, about $0.68 total.
一条命令搞定两阶段: build_all_references.py 遍历每个角色和每个地点,全程复用同一张设定表作为风格参考——7 张设定表 + 10 张场景图 ≈ 17 次 Nano Banana 调用,总计约 $0.68。
คำสั่งเดียวทำทั้งสองขั้น: build_all_references.py วนทุกตัวละครและทุกสถานที่ ใช้ชีตเดียวเป็นภาพอ้างอิงสไตล์ตลอด — 7 ชีต + 10 ภาพฉาก ≈ 17 ครั้งที่เรียก Nano Banana รวมราว $0.68
Show full script — build_all_references.py (79 lines)展开完整脚本 — build_all_references.py(79 行)แสดงสคริปต์เต็ม — build_all_references.py (79 บรรทัด)
#!/usr/bin/env python3
"""Build all remaining character sheets + all world reference shots in one
sequential run. Uses Dao's sheet as the style reference for every other
character (style-only, not identity) and for every world shot.
Total: 7 character sheets + 10 world shots = 17 Nano Banana 2 calls.
Cost: ~$0.04 * 17 = ~$0.68.
"""
import subprocess
import sys
from pathlib import Path
HERE = Path(__file__).resolve().parent
DAO_SHEET = HERE / "character_sheets" / "dao.png"
CHARACTERS = ["noi", "ploy", "yaa_saeng", "phaw", "phueng", "mi_yeh", "captain"]
def run_character(key: str) -> bool:
out = HERE / "character_sheets" / f"{key}.png"
if out.exists():
print(f"[skip] character/{key} already exists")
return True
cmd = [
sys.executable, str(HERE / "build_character_sheet.py"),
"--character", key,
"--style-ref", str(DAO_SHEET),
]
print(f"\n=== Character: {key} ===")
res = subprocess.run(cmd)
return res.returncode == 0
def run_world(location: str) -> bool:
out = HERE / "world_shots" / f"{location}.png"
if out.exists():
print(f"[skip] world/{location} already exists")
return True
cmd = [
sys.executable, str(HERE / "build_world_shot.py"),
"--location", location,
]
print(f"\n=== World: {location} ===")
res = subprocess.run(cmd)
return res.returncode == 0
def main() -> None:
if not DAO_SHEET.exists():
sys.exit(f"Dao sheet missing: {DAO_SHEET}")
# Discover world locations from the world script
sys.path.insert(0, str(HERE))
from build_world_shot import LOCATIONS
world_keys = list(LOCATIONS.keys())
print(f"Plan: {len(CHARACTERS)} characters + {len(world_keys)} world shots")
print(f"Style reference for all: {DAO_SHEET.name}")
print()
failed: list[str] = []
for c in CHARACTERS:
if not run_character(c):
failed.append(f"character:{c}")
for w in world_keys:
if not run_world(w):
failed.append(f"world:{w}")
print("\n" + "=" * 60)
if failed:
print(f"FAILED ({len(failed)}):")
for f in failed:
print(f" - {f}")
else:
print(f"All references built.")
if __name__ == "__main__":
main()
Stage 4 — Per-scene stills
阶段 4 — 逐镜头起始帧
ขั้นที่ 4 — ภาพนิ่งต่อฉาก
This is Step 3 in code. A blueprint JSON lists, per scene, a text prompt plus which references it needs (characters + world). For each scene the script loads those exact reference PNGs from the library and sends them with a prompt that says: match identity from the character sheet, match setting from the world shot, keep the watercolor style. Out comes one on-model still per scene.
这是第 3 步的代码实现。一个蓝图 JSON 为每个镜头列出一段文字提示以及它需要哪些参考图(characters + world)。脚本为每个镜头从素材库加载那几张参考 PNG,连同提示一起发送:身份匹配角色设定表、场景匹配场景参考图、保持水彩风格。于是每个镜头输出一张不走样的起始帧。
นี่คือขั้นที่ 3 ในรูปแบบโค้ด ไฟล์ บลูพรินต์ JSON ระบุพรอมป์ข้อความของแต่ละฉาก พร้อมว่าต้องใช้ภาพอ้างอิงใด (characters + world) สำหรับแต่ละฉาก สคริปต์โหลดไฟล์ PNG อ้างอิงเหล่านั้นจากคลัง แล้วส่งไปกับพรอมป์ที่บอกว่า: จับคู่เอกลักษณ์จากชีตตัวละคร จับคู่ฉากจากภาพฉาก คงสไตล์สีน้ำ ผลคือภาพนิ่งที่คงตัวละครหนึ่งภาพต่อฉาก
- Before a full rebuild it backs up the old stills (
--backup-tag);--only 0 5 6re-renders just the scenes that missed. - 整批重渲染前会备份旧的起始帧(
--backup-tag);--only 0 5 6只重渲染没达标的那几个镜头。 - ก่อนสร้างใหม่ทั้งชุด มันสำรองภาพนิ่งเดิม (
--backup-tag);--only 0 5 6เรนเดอร์ใหม่เฉพาะฉากที่พลาด - Because every reference in the library was built in the same locked style, the stills come out consistent without any extra prompt-wrangling.
- 因为素材库里每张参考图都用同一种锁定风格生成,起始帧无需额外调提示就能保持一致。
- เพราะภาพอ้างอิงทุกภาพในคลังถูกสร้างด้วยสไตล์ที่ล็อกเหมือนกัน ภาพนิ่งจึงออกมาสม่ำเสมอโดยไม่ต้องปรับพรอมป์เพิ่ม
Show full script — render_chapter_from_refs.py (272 lines)展开完整脚本 — render_chapter_from_refs.py(272 行)แสดงสคริปต์เต็ม — render_chapter_from_refs.py (272 บรรทัด)
#!/usr/bin/env python3
"""Render every scene of a chapter from the character-sheet + world-shot
reference library, via Nano Banana 2. This replaces both the Pollinations
first pass AND the chain re-render — it goes straight from a refs-based
blueprint to locked-character, locked-world stills.
Each scene's blueprint entry declares:
scene_refs[i] = {
"world": "" or null,
"characters": ["", ...]
}
The renderer loads those reference PNGs from movie/character_sheets/ and
movie/world_shots/ and passes them as inline ref images alongside the
scene's text prompt. Nano Banana 2 produces a single still that matches
the references for identity (characters) and setting (world), with the
painterly watercolor style locked across the cast because every reference
in the library shares the same style.
Usage:
python render_chapter_from_refs.py --blueprint ch01_opening
python render_chapter_from_refs.py --blueprint ch01_opening --backup-tag v2
python render_chapter_from_refs.py --blueprint ch01_opening --only 0 5 6
"""
import argparse
import json
import os
import shutil
import sys
import time
from pathlib import Path
if sys.platform == "win32":
os.environ.setdefault("PYTHONIOENCODING", "utf-8")
try:
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
sys.stderr.reconfigure(encoding="utf-8", errors="replace")
except (AttributeError, OSError):
pass
from PIL import Image
HERE = Path(__file__).resolve().parent
REPO_ROOT = HERE.parent.parent.parent
PIPELINE_OUT_ROOT = REPO_ROOT / "imgs" / "free_pipeline"
BLUEPRINTS_DIR = HERE / "blueprints"
CHAR_SHEETS_DIR = HERE / "character_sheets"
WORLD_SHOTS_DIR = HERE / "world_shots"
def load_creds() -> dict:
env_file = Path("C:/Users/Admin/claude/env.txt.txt")
creds = {}
if env_file.exists():
for line in env_file.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line or line.startswith("#") or ":" not in line:
continue
k, v = line.split(":", 1)
creds[k.strip()] = v.strip()
return creds
def load_ref(path: Path) -> Image.Image:
img = Image.open(path).convert("RGB")
if max(img.size) > 1024:
ratio = 1024 / max(img.size)
img = img.resize(
(int(img.size[0] * ratio), int(img.size[1] * ratio)),
Image.LANCZOS,
)
return img
def build_prompt(scene_text: str, char_keys: list[str], world_key: str | None) -> str:
has_char = bool(char_keys)
has_world = bool(world_key)
parts: list[str] = [scene_text, ""]
if has_char and has_world:
char_list = ", ".join(char_keys)
parts.append(
f"REFERENCES PROVIDED: 1) character sheet(s) for {char_list}, "
f"2) world reference shot of the location ({world_key})."
)
parts.append(
f"USE THE CHARACTER REFERENCE for identity: the {char_list} in this scene "
"must be the SAME PERSON as in the character sheet — match face, hair, "
"clothing, accessories exactly."
)
parts.append(
f"USE THE WORLD REFERENCE for setting: the location must match the "
f"reference shot — same architecture, same terrain, same atmospheric color palette."
)
elif has_char:
char_list = ", ".join(char_keys)
parts.append(
f"REFERENCES PROVIDED: character sheet(s) for {char_list}."
)
parts.append(
f"USE THE CHARACTER REFERENCE for identity: the {char_list} in this scene "
"must be the SAME PERSON as in the character sheet — match face, hair, "
"clothing, accessories exactly."
)
elif has_world:
parts.append(
f"REFERENCES PROVIDED: world reference shot of the location ({world_key})."
)
parts.append(
f"USE THE WORLD REFERENCE for setting: this scene's location must match "
"the reference shot — same architecture, same terrain, same atmospheric color palette."
)
parts.append(
"NO CHARACTERS: empty landscape / interior, no people."
)
parts.append("")
parts.append(
"STYLE: Painterly Ghibli-style watercolor and ink illustration. "
"Match the watercolor aesthetic, line weight, paper-grain texture, and "
"soft warm lighting of the reference image(s). No anime stylization, "
"no digital sharpness, no realistic photography — watercolor."
)
parts.append(
"FRAME: 16:9 cinematic anamorphic widescreen, room at the bottom for the voiceover."
)
return "\n".join(parts)
def render_scene(scene_idx: int, scene_text: str, refs: dict, out_path: Path,
client, model: str = "gemini-3-pro-image-preview",
aspect_ratio: str = "16:9", retries: int = 3) -> None:
from google.genai import types
char_keys: list[str] = refs.get("characters") or []
world_key: str | None = refs.get("world")
contents: list = [build_prompt(scene_text, char_keys, world_key)]
for ck in char_keys:
sheet = CHAR_SHEETS_DIR / f"{ck}.png"
if not sheet.exists():
raise FileNotFoundError(f"Missing character sheet: {sheet}")
contents.append(load_ref(sheet))
if world_key:
wshot = WORLD_SHOTS_DIR / f"{world_key}.png"
if not wshot.exists():
raise FileNotFoundError(f"Missing world shot: {wshot}")
contents.append(load_ref(wshot))
ref_summary = f"chars=[{','.join(char_keys) or '-'}], world={world_key or '-'}"
print(f"[scene_{scene_idx:02d}] {ref_summary}")
for attempt in range(retries):
try:
resp = client.models.generate_content(
model=model,
contents=contents,
config=types.GenerateContentConfig(
response_modalities=["IMAGE"],
image_config=types.ImageConfig(aspect_ratio=aspect_ratio),
),
)
if not resp.candidates:
feedback = getattr(resp, "prompt_feedback", None)
block_reason = getattr(feedback, "block_reason", None) if feedback else None
msg = f"no candidates (block_reason={block_reason})"
if attempt < retries - 1:
print(f" {msg}, retry in 5s")
time.sleep(5)
continue
raise RuntimeError(msg)
cand = resp.candidates[0]
content = getattr(cand, "content", None)
parts = getattr(content, "parts", None) if content else None
finish_reason = getattr(cand, "finish_reason", None)
if not parts:
msg = f"empty parts (finish_reason={finish_reason})"
if attempt < retries - 1:
print(f" {msg}, retry in 5s")
time.sleep(5)
continue
raise RuntimeError(msg)
for part in parts:
if getattr(part, "inline_data", None):
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_bytes(part.inline_data.data)
print(f" saved {out_path.name} ({len(part.inline_data.data)//1024} KB)")
return
raise RuntimeError(f"no image part (finish_reason={finish_reason})")
except Exception as e:
err = str(e)
if "429" in err or "rate" in err.lower() or "quota" in err.lower():
wait = 20 * (attempt + 1)
print(f" rate limited, retry in {wait}s")
time.sleep(wait)
continue
if attempt < retries - 1:
wait = 5 * (attempt + 1)
print(f" error, retry in {wait}s: {err[:140]}")
time.sleep(wait)
continue
raise
def main() -> None:
ap = argparse.ArgumentParser()
ap.add_argument("--blueprint", required=True, help="Blueprint name (without .json)")
ap.add_argument("--backup-tag", default="prev", help="Suffix for backing up existing stills/")
ap.add_argument("--only", type=int, nargs="*", default=None,
help="Render only specific scene indices (default: all)")
ap.add_argument("--aspect", default="16:9", choices=["16:9", "4:3", "1:1"])
args = ap.parse_args()
bp_path = BLUEPRINTS_DIR / f"{args.blueprint}.json"
if not bp_path.exists():
sys.exit(f"Blueprint not found: {bp_path}")
bp = json.loads(bp_path.read_text(encoding="utf-8"))
slug = bp.get("_pipeline_slug") or f"quiet_mountain_sutra_{args.blueprint}"
n = len(bp["scene_prompts"])
if len(bp.get("scene_refs", [])) != n:
sys.exit(
f"scene_refs length ({len(bp.get('scene_refs', []))}) "
f"does not match scene_prompts length ({n})"
)
cache_dir = PIPELINE_OUT_ROOT / slug
cache_dir.mkdir(parents=True, exist_ok=True)
stills_dir = cache_dir / "stills"
stills_dir.mkdir(exist_ok=True)
# Backup existing stills if doing a full rebuild.
if args.only is None:
backup_dir = cache_dir / f"stills_{args.backup_tag}"
if not backup_dir.exists() and any(stills_dir.glob("scene_*.png")):
backup_dir.mkdir()
for p in sorted(stills_dir.glob("scene_*.png")):
shutil.copy2(p, backup_dir / p.name)
print(f" backed up existing stills to {backup_dir.name}/")
for p in sorted(stills_dir.glob("scene_*.png")):
p.unlink()
# Write the clean blueprint into the pipeline cache (strip metadata).
clean = {k: v for k, v in bp.items() if not k.startswith("_")}
(cache_dir / "blueprint.json").write_text(
json.dumps(clean, indent=2, ensure_ascii=False), encoding="utf-8"
)
creds = load_creds()
api_key = creds.get("google-api", "")
if not api_key:
sys.exit("No google-api key in env.txt.txt")
from google import genai
client = genai.Client(api_key=api_key)
indices = args.only if args.only is not None else list(range(n))
print(f"\nRendering {len(indices)} scene(s) for slug '{slug}'\n")
for i in indices:
scene_text = bp["scene_prompts"][i]
refs = bp["scene_refs"][i]
out_path = stills_dir / f"scene_{i:02d}.png"
render_scene(i, scene_text, refs, out_path, client, aspect_ratio=args.aspect)
print(f"\nDone. Stills at {stills_dir}")
if __name__ == "__main__":
main()
Stage 5 — Narration + motion plan
阶段 5 — 旁白 + 动作计划
ขั้นที่ 5 — เสียงบรรยาย + แผนการเคลื่อนไหว
Two small data files glue the stills to the video step. Edge TTS renders the narration to an MP3 per scene, and durations_miao.json records each line's exact length (so clips can be trimmed to the voiceover later). A hand-written i2v_plan_miao.json lists, per scene, its frame count and its motion prompt (what moves in-frame). These two files are exactly what the i2v and assembly scripts read.
两个小数据文件把起始帧和视频步骤粘合起来。Edge TTS 为每个镜头把旁白渲染成 MP3,durations_miao.json 记录每句的精确时长(以便之后把片段裁到与配音等长)。一个手写的 i2v_plan_miao.json 为每个镜头列出帧数和动作提示(画面内什么在动)。这两个文件正是 i2v 和合成脚本读取的对象。
ไฟล์ข้อมูลเล็ก ๆ สองไฟล์เชื่อมภาพนิ่งเข้ากับขั้นวิดีโอ Edge TTS เรนเดอร์เสียงบรรยายเป็น MP3 ต่อฉาก และ durations_miao.json บันทึกความยาวที่แม่นยำของแต่ละประโยค (เพื่อให้ตัดคลิปเท่าเสียงพากย์ภายหลัง) ไฟล์ i2v_plan_miao.json ที่เขียนด้วยมือระบุจำนวนเฟรมและคำสั่งการเคลื่อนไหว (อะไรเคลื่อนในเฟรม) ของแต่ละฉาก สองไฟล์นี้คือสิ่งที่สคริปต์ i2v และสคริปต์ประกอบอ่าน
Stage 6 — Generating the clips (the Funnel script)
阶段 6 — 生成片段(漏斗脚本)
ขั้นที่ 6 — สร้างคลิป (สคริปต์ Funnel)
With stills + motion prompts ready, this script automates the i2v step end to end — the exact one used for the chapter. It drives the Spark's ComfyUI over the network: per scene, upload the still, run Wan 2.2, wait, download the MP4.
静帧 + 动作提示就绪后,这个脚本把 i2v 这一步从头到尾自动化——正是本章节所用的脚本。它通过网络驱动 Spark 上的 ComfyUI:每个场景上传静帧、运行 Wan 2.2、等待、下载 MP4。
เมื่อภาพนิ่ง + คำสั่งการเคลื่อนไหวพร้อม สคริปต์นี้ทำขั้น i2v ให้อัตโนมัติตั้งแต่ต้นจนจบ — ตัวเดียวกับที่ใช้กับบทนี้ มันสั่ง ComfyUI บน Spark ผ่านเครือข่าย: แต่ละฉากอัปโหลดภาพนิ่ง รัน Wan 2.2 รอ แล้วดาวน์โหลด MP4
🧠 The key idea: ComfyUI is an API, not just a UI
🧠 核心理念:ComfyUI 是一个 API,不只是界面
🧠 แนวคิดหลัก: ComfyUI เป็น API ไม่ใช่แค่หน้าจอ
ComfyUI's node graph is really just JSON. Anything you can build by dragging boxes around can also be sent as a JSON "workflow" to its HTTP API. So you can script an entire animation run — no mouse, no babysitting — and loop over every shot unattended while you do something else.
ComfyUI 的节点图本质上就是 JSON。凡是你能通过拖动方框搭出来的流程,都可以作为 JSON「工作流」发送给它的 HTTP 接口。于是你可以把整次动画生成都写成脚本——不用鼠标、不用盯着——让它在无人值守的情况下遍历每一个镜头,你去忙别的就行。
กราฟโหนดของ ComfyUI แท้จริงแล้วก็คือ JSON สิ่งใดที่คุณต่อได้ด้วยการลากกล่อง ก็ส่งเป็น "เวิร์กโฟลว์" JSON ไปยัง HTTP API ของมันได้เช่นกัน ดังนั้นคุณจึงเขียนสคริปต์ให้สร้างแอนิเมชันทั้งชุดได้ — ไม่ต้องใช้เมาส์ ไม่ต้องเฝ้า — แล้วปล่อยให้มันวนทำทุกช็อตเองโดยที่คุณไปทำอย่างอื่น
How it works, step by step
它是如何运作的,一步一步
มันทำงานอย่างไร ทีละขั้น
- Talks over HTTPS, no SSH. The Spark exposes ComfyUI through a Tailscale Funnel — a public HTTPS address. The script just makes ordinary web requests to it, uploading images and pulling back videos, so there's no SSH login or manual file-copying.
- 通过 HTTPS 通信,无需 SSH。 Spark 通过 Tailscale Funnel 把 ComfyUI 暴露成一个公网 HTTPS 地址。脚本只是向它发普通的网络请求——上传图片、取回视频——所以不需要 SSH 登录,也不用手动拷贝文件。
- คุยผ่าน HTTPS ไม่ต้อง SSH Spark เปิด ComfyUI ออกมาผ่าน Tailscale Funnel — เป็นที่อยู่ HTTPS สาธารณะ สคริปต์แค่ส่งคำขอเว็บธรรมดาไปหามัน อัปโหลดภาพและดึงวิดีโอกลับมา จึงไม่ต้องล็อกอิน SSH หรือก็อปไฟล์เอง
- One scene at a time. It reads a plan file (
i2v_plan_miao.json) that lists every scene's frame count, narration length, and motion prompt, then works through them in order. - 一次一个场景。 它读取一个计划文件(
i2v_plan_miao.json),其中列出每个场景的帧数、旁白时长和动作提示,然后按顺序逐个处理。 - ทีละฉาก มันอ่านไฟล์แผน (
i2v_plan_miao.json) ที่ระบุจำนวนเฟรม ความยาวเสียงบรรยาย และคำสั่งการเคลื่อนไหวของแต่ละฉาก แล้วไล่ทำทีละฉากตามลำดับ - Upload → queue → poll → download. For each scene it (1) uploads the starting still to
/upload/image, (2) POSTs a Wan 2.2 workflow to/prompt, (3) polls/history/{id}every 10 seconds until the render is done, then (4) downloads the MP4 from/view. - 上传 → 排队 → 轮询 → 下载。 对每个场景,它会 (1) 把起始帧上传到
/upload/image,(2) 向/prompt发送一个 Wan 2.2 工作流,(3) 每 10 秒轮询一次/history/{id}直到渲染完成,然后 (4) 从/view下载 MP4。 - อัปโหลด → เข้าคิว → ตรวจสถานะ → ดาวน์โหลด สำหรับแต่ละฉาก มันจะ (1) อัปโหลดเฟรมเริ่มต้นไปที่
/upload/image(2) ส่ง (POST) เวิร์กโฟลว์ Wan 2.2 ไปที่/prompt(3) ตรวจ/history/{id}ทุก 10 วินาทีจนเรนเดอร์เสร็จ แล้ว (4) ดาวน์โหลด MP4 จาก/view - Idempotent — safe to re-run. If
scene_03.mp4already exists locally, that scene is skipped. After a crash or timeout you can just run it again and it picks up exactly where it left off. - 幂等——可安全重跑。 如果本地已经有
scene_03.mp4,就跳过该场景。崩溃或超时之后,直接再跑一次即可,它会从上次停下的地方继续。 - ทำซ้ำได้อย่างปลอดภัย ถ้ามี
scene_03.mp4อยู่ในเครื่องแล้ว ฉากนั้นจะถูกข้าม หลังโปรแกรมล่มหรือหมดเวลา แค่รันใหม่อีกครั้ง มันจะทำต่อจากจุดที่ค้างไว้พอดี - One shared negative prompt fights the classic image-to-video failure: duplicated or extra limbs (the "third leg" problem when a character walks), plus blur, watermarks, and the dreaded "static, no motion" result.
- 一个共用的负面提示 用来对抗图生视频的经典翻车:肢体重复或多出(角色走路时的「第三条腿」问题),以及模糊、水印,还有最让人头疼的「静止、没有动作」结果。
- คำสั่งเชิงลบที่ใช้ร่วมกันหนึ่งชุด ไว้สู้กับความผิดพลาดคลาสสิกของภาพเป็นวิดีโอ: แขนขาซ้ำหรือเกิน (ปัญหา "ขาที่สาม" ตอนตัวละครเดิน) รวมถึงภาพเบลอ ลายน้ำ และผลลัพธ์ที่น่ากลัวที่สุดคือ "นิ่ง ไม่มีการเคลื่อนไหว"
💡 Why poll instead of just waiting?
💡 为什么用轮询而不是干等?
💡 ทำไมต้องวนเช็กแทนที่จะรอเฉย ๆ?
Video renders take minutes, and a single HTTP request would time out long before that. So the script fires the job, then checks /history every 10 seconds (up to 40 minutes per scene). Submit-then-poll is the standard pattern for any long-running AI job.
视频渲染要好几分钟,而单个 HTTP 请求远撑不到那么久就会超时。所以脚本先把任务发出去,然后每 10 秒查一次 /history(每个场景最多等 40 分钟)。「先提交,再轮询」是所有耗时较长的 AI 任务的标准做法。
การเรนเดอร์วิดีโอใช้เวลาหลายนาที และคำขอ HTTP เดียวจะหมดเวลาไปก่อนนานมาก สคริปต์จึงส่งงานออกไปก่อน แล้วเช็ก /history ทุก 10 วินาที (สูงสุด 40 นาทีต่อฉาก) "ส่งงานแล้ววนถามผล" คือรูปแบบมาตรฐานของงาน AI ที่ใช้เวลานาน
The Wan 2.2 workflow graph
Wan 2.2 工作流图
กราฟเวิร์กโฟลว์ Wan 2.2
The workflow() function returns the ComfyUI graph as a dictionary of numbered nodes wired together. Each node names a class_type and points its inputs at other nodes' outputs using ["node_id", output_index]. The chain runs:
workflow() 函数把 ComfyUI 图以「编号节点相互连线」的字典形式返回。每个节点都指定一个 class_type,并用 ["节点编号", 输出序号] 把自己的输入接到其他节点的输出上。整条链是这样的:
ฟังก์ชัน workflow() คืนกราฟ ComfyUI ออกมาเป็นดิกชันนารีของโหนดที่มีหมายเลขและต่อสายถึงกัน แต่ละโหนดระบุ class_type และชี้อินพุตของตนไปยังเอาต์พุตของโหนดอื่นด้วย ["หมายเลขโหนด", ลำดับเอาต์พุต] สายงานเป็นดังนี้:
- Load the Wan 2.2 model, the text encoder (CLIP), and the VAE.
- 加载 Wan 2.2 模型、文本编码器(CLIP)和 VAE。
- โหลดโมเดล Wan 2.2 ตัวเข้ารหัสข้อความ (CLIP) และ VAE
- Encode the positive (motion) prompt and the negative prompt into conditioning.
- 把正面(动作)提示和负面提示编码成条件(conditioning)。
- เข้ารหัสคำสั่งเชิงบวก (การเคลื่อนไหว) และคำสั่งเชิงลบให้เป็น conditioning
- Load the uploaded still and turn it into a latent "first frame" at 1280×704.
- 加载上传的静帧,并把它变成 1280×704 的潜在「第一帧」。
- โหลดเฟรมที่อัปโหลดแล้วแปลงเป็น "เฟรมแรก" ในพื้นที่ latent ที่ขนาด 1280×704
KSamplerdenoises for 30 steps to generate the motion between frames.KSampler去噪 30 步,生成帧与帧之间的运动。KSamplerลดสัญญาณรบกวน 30 สเต็ปเพื่อสร้างการเคลื่อนไหวระหว่างเฟรม- VAE-decode back to pixels, then
VHS_VideoCombinewrites a 24 fps H.264 MP4. - 用 VAE 解码回像素,然后
VHS_VideoCombine输出一个 24fps 的 H.264 MP4。 - ถอดรหัสด้วย VAE กลับเป็นพิกเซล แล้ว
VHS_VideoCombineเขียนเป็น MP4 H.264 ที่ 24 fps
Show full script — qms_anime_funnel.py (100 lines)展开完整脚本 — qms_anime_funnel.py(100 行)แสดงสคริปต์เต็ม — qms_anime_funnel.py (100 บรรทัด)
"""Drive the QMS anime Ch1 i2v entirely over the Spark's ComfyUI Tailscale
Funnel (HTTPS) — no SSH/scp needed. For each scene: upload the local anime
still, run Wan 2.2 TI2V-5B i2v with the real-motion prompt, poll, then download
the finished MP4 via ComfyUI's /view endpoint. Idempotent (skips done scenes).
python qms_anime_funnel.py
"""
import json, time, urllib.request, urllib.parse, ssl
from pathlib import Path
HOST = "https://spark-anime.taila00da8.ts.net"
BUILD = Path("C:/Users/Admin/claude/qms_i2v_build")
STILLS = Path("C:/Users/Admin/claude/watdonchan/imgs/free_pipeline/quiet_mountain_sutra_ch01_opening/stills")
OUT = BUILD / "clips_anime"; OUT.mkdir(exist_ok=True)
plan = json.loads((BUILD / "i2v_plan_miao.json").read_text())
CTX = ssl.create_default_context()
NEG = ("extra leg, third leg, multiple legs, duplicate legs, extra limbs, "
"duplicated limbs, merging legs, fused legs, limb ghosting, deformed, "
"bad anatomy, overexposed, static, blurry, low quality, worst quality, "
"jpeg artifacts, watermark, text, flickering, motionless")
def req(path, data=None, headers=None, method=None, timeout=120):
r = urllib.request.Request(HOST + path, data=data, headers=headers or {}, method=method)
return urllib.request.urlopen(r, timeout=timeout, context=CTX)
def upload(path: Path) -> str:
b = path.read_bytes()
bnd = "----comfy" + str(int(time.time()))
body = (f"--{bnd}\r\nContent-Disposition: form-data; name=\"image\"; "
f"filename=\"{path.name}\"\r\nContent-Type: image/png\r\n\r\n").encode() + b + f"\r\n--{bnd}--\r\n".encode()
with req("/upload/image", body, {"Content-Type": f"multipart/form-data; boundary={bnd}"}, "POST") as r:
return json.loads(r.read())["name"]
def workflow(ref, pos, frames, prefix, seed=12345):
return {
"1": {"class_type": "UNETLoader", "inputs": {"unet_name": "wan2.2_ti2v_5B_fp16.safetensors", "weight_dtype": "default"}},
"2": {"class_type": "ModelSamplingSD3", "inputs": {"model": ["1", 0], "shift": 8.0}},
"3": {"class_type": "CLIPLoader", "inputs": {"clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors", "type": "wan"}},
"4": {"class_type": "VAELoader", "inputs": {"vae_name": "wan2.2_vae.safetensors"}},
"5": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["3", 0], "text": pos}},
"6": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["3", 0], "text": NEG}},
"7": {"class_type": "LoadImage", "inputs": {"image": ref}},
"8": {"class_type": "Wan22ImageToVideoLatent", "inputs": {"vae": ["4", 0], "width": 1280, "height": 704, "length": frames, "batch_size": 1, "start_image": ["7", 0]}},
"9": {"class_type": "KSampler", "inputs": {"model": ["2", 0], "positive": ["5", 0], "negative": ["6", 0], "latent_image": ["8", 0], "seed": seed, "steps": 30, "cfg": 5.0, "sampler_name": "uni_pc", "scheduler": "simple", "denoise": 1.0}},
"10": {"class_type": "VAEDecode", "inputs": {"samples": ["9", 0], "vae": ["4", 0]}},
"11": {"class_type": "VHS_VideoCombine", "inputs": {"images": ["10", 0], "frame_rate": 24, "loop_count": 0, "filename_prefix": prefix, "format": "video/h264-mp4", "pix_fmt": "yuv420p", "crf": 19, "save_metadata": False, "pingpong": False, "save_output": True}},
}
def run_scene(p):
i = p["scene"]; out = OUT / f"scene_{i:02d}.mp4"
if out.exists():
print(f"== scene {i}: already downloaded, skip", flush=True); return
still = STILLS / f"scene_{i:02d}.png"
prefix = f"qms_anime_s{i:02d}"
print(f"\n=== scene {i} {p['frames']}f / {p['narration_dur']}s ===", flush=True)
name = upload(still); print(f" uploaded {name}", flush=True)
wf = workflow(name, p["motion_prompt"], p["frames"], prefix)
with req("/prompt", json.dumps({"prompt": wf}).encode(), {"Content-Type": "application/json"}, "POST") as r:
pid = json.loads(r.read())["prompt_id"]
print(f" prompt_id {pid}", flush=True)
t0 = time.time()
while time.time() - t0 < 2400:
time.sleep(10)
try:
with req(f"/history/{pid}", timeout=60) as r:
hist = json.loads(r.read())
except Exception as e:
print(f" poll err: {str(e)[:80]}", flush=True); continue
if pid in hist:
outs = hist[pid].get("outputs", {})
fn = None
for nd in outs.values():
for k in ("gifs", "videos", "images"):
for f in nd.get(k, []):
if f.get("filename", "").endswith(".mp4"): fn = f
if fn:
q = urllib.parse.urlencode({"filename": fn["filename"], "subfolder": fn.get("subfolder", ""), "type": "output"})
with req(f"/view?{q}", timeout=120) as r:
out.write_bytes(r.read())
print(f" DONE in {int(time.time()-t0)}s -> {out.name} ({out.stat().st_size//1024} KB)", flush=True)
else:
print(f" finished but no mp4 output: {hist[pid].get('status')}", flush=True)
return
print(f" ...{int(time.time()-t0)}s", flush=True)
print(f" TIMEOUT scene {i}", flush=True)
if __name__ == "__main__":
for p in plan:
try:
run_scene(p)
except Exception as e:
print(f"!! scene {p['scene']} error: {str(e)[:120]}", flush=True)
(BUILD / "qms_anime_funnel_DONE").write_text("done\n")
print("\nALL SCENES PROCESSED", flush=True)
Stage 7 — Assembling the chapter
阶段 7 — 合成章节
ขั้นที่ 7 — ประกอบรวมเป็นบท
The funnel script leaves you with a folder of silent scene clips. Its companion, qms_assemble_anime.py, does the post-production from Step 5: it trims each clip to the exact length of its narration, mixes in that scene's voiceover, and concatenates everything into one finished chapter MP4 — all with ffmpeg, no video editor needed.
漏斗脚本(funnel)给你的是一个装满无声场景片段的文件夹。它的搭档 qms_assemble_anime.py 负责第 5 步的后期:把每个片段裁剪到与其旁白完全等长,混入该场景的配音,再把所有片段拼接成一个成品章节 MP4——全程用 ffmpeg,不需要视频剪辑软件。
สคริปต์ funnel ทิ้งโฟลเดอร์ที่เต็มไปด้วยคลิปฉากแบบไม่มีเสียงไว้ให้คุณ ส่วนคู่หูของมัน qms_assemble_anime.py ทำงานโพสต์โปรดักชันจากขั้นที่ 5: ตัดแต่ละคลิปให้ยาวเท่ากับเสียงบรรยายพอดี ผสมเสียงพากย์ของฉากนั้นเข้าไป แล้วต่อทุกคลิปเข้าด้วยกันเป็นบทที่เสร็จสมบูรณ์หนึ่งไฟล์ MP4 — ทั้งหมดด้วย ffmpeg ไม่ต้องใช้โปรแกรมตัดต่อ
What it does, in three moves
三步搞定
ทำงานสามจังหวะ
- Trim + add voice, per scene. It reads
durations_miao.json(each scene's exact narration length) and, for every scene, runs ffmpeg to cut the silent clip to that length (-t), take the picture from the clip and the audio from the matching MP3 (-map 0:v:0 -map 1:a:0), and write one segment. - 逐场景裁剪并配音。 它读取
durations_miao.json(每个场景旁白的精确时长),对每个场景运行 ffmpeg:把无声片段裁到该时长(-t),画面取自片段、音频取自对应的 MP3(-map 0:v:0 -map 1:a:0),输出一段片段。 - ตัดและใส่เสียง ทีละฉาก มันอ่าน
durations_miao.json(ความยาวเสียงบรรยายที่แม่นยำของแต่ละฉาก) แล้วรัน ffmpeg สำหรับทุกฉาก: ตัดคลิปเงียบให้เหลือความยาวนั้น (-t) ใช้ภาพจากคลิปและเสียงจาก MP3 ที่คู่กัน (-map 0:v:0 -map 1:a:0) แล้วเขียนออกมาเป็นเซ็กเมนต์เดียว - Re-encode to one uniform format. Every segment is encoded with identical settings — H.264,
yuv420p, 24 fps,profile high / level 4.0, AAC 160k at 48 kHz. This matters because the next step won't work otherwise (see the tip below). - 统一重编码格式。 每段都用完全相同的设置编码——H.264、
yuv420p、24fps、profile high / level 4.0、AAC 160k / 48kHz。这一点很关键,否则下一步会失败(见下方提示)。 - เข้ารหัสใหม่ให้เป็นรูปแบบเดียวกันทั้งหมด ทุกเซ็กเมนต์ถูกเข้ารหัสด้วยค่าเหมือนกันเป๊ะ — H.264,
yuv420p, 24 fps,profile high / level 4.0, AAC 160k ที่ 48 kHz เรื่องนี้สำคัญเพราะไม่อย่างนั้นขั้นถัดไปจะไม่ทำงาน (ดูเคล็ดลับด้านล่าง) - Concatenate into the chapter. It writes a list of the segments to
concat_anime.txtand uses ffmpeg'sconcatdemuxer to join them — in order — into the finalch01_anime.mp4. - 拼接成章节。 它把所有片段列表写入
concat_anime.txt,再用 ffmpeg 的concat解复用器(demuxer)按顺序把它们拼成最终的ch01_anime.mp4。 - ต่อรวมเป็นบท มันเขียนรายการเซ็กเมนต์ลงใน
concat_anime.txtแล้วใช้concatdemuxer ของ ffmpeg ต่อเซ็กเมนต์ทั้งหมดตามลำดับเป็นไฟล์สุดท้ายch01_anime.mp4
💡 Why every clip gets the same encode
💡 为什么每个片段都用相同的编码
💡 ทำไมทุกคลิปต้องเข้ารหัสเหมือนกัน
ffmpeg's concat demuxer doesn't re-encode — it just splices the streams end to end, so every segment must share the same codec, resolution, frame rate, and pixel format or the join fails or glitches. The profile high / level 4.0 / yuv420p combo also guarantees the result plays on phones — the same mobile-decoder rule that trips up video pipelines everywhere.
ffmpeg 的 concat 解复用器不会重新编码——它只是把数据流首尾拼接,所以每段必须有相同的编解码器、分辨率、帧率和像素格式,否则拼接会失败或出现错乱。profile high / level 4.0 / yuv420p 这套组合还能保证成品在手机上能播放——这正是各种视频流程里都会踩到的移动端解码规则。
demuxer concat ของ ffmpeg ไม่เข้ารหัสใหม่ — มันแค่ต่อสตรีมหัวท้ายเข้าด้วยกัน ดังนั้นทุกเซ็กเมนต์ต้องใช้ codec ความละเอียด เฟรมเรต และพิกเซลฟอร์แมตเดียวกัน ไม่งั้นการต่อจะล้มเหลวหรือภาพรวน ชุดค่า profile high / level 4.0 / yuv420p ยังรับประกันว่าไฟล์สุดท้ายเล่นบนมือถือได้ — กฎการถอดรหัสบนมือถือเดียวกับที่ทำให้ไปป์ไลน์วิดีโอทั่วโลกสะดุด
Show full script — qms_assemble_anime.py (42 lines)展开完整脚本 — qms_assemble_anime.py(42 行)แสดงสคริปต์เต็ม — qms_assemble_anime.py (42 บรรทัด)
#!/usr/bin/env python3
"""Assemble the QMS Ch1 ANIME chapter: trim each Wan i2v clip to its narration
length, mux that scene's voiceover, concat into one chapter MP4.
Expects (under qms_i2v_build/):
clips_anime/scene_NN.mp4 (downloaded from the Spark via the funnel)
audio_miao/scene_NN.mp3 (Edge TTS, Zoe/Miao narration)
durations_miao.json
Outputs ch01_anime.mp4.
"""
import json, subprocess, sys
from pathlib import Path
BUILD = Path("C:/Users/Admin/claude/qms_i2v_build")
durs = json.loads((BUILD / "durations_miao.json").read_text())
clips, aud = BUILD / "clips_anime", BUILD / "audio_miao"
seg = BUILD / "segments_anime"; seg.mkdir(exist_ok=True)
FF = "ffmpeg"
VENC = ["-c:v", "libx264", "-pix_fmt", "yuv420p", "-r", "24",
"-profile:v", "high", "-level", "4.0"]
AENC = ["-c:a", "aac", "-b:a", "160k", "-ar", "48000"]
segments = []
for i, D in enumerate(durs):
clip = clips / f"scene_{i:02d}.mp4"
a = aud / f"scene_{i:02d}.mp3"
if not clip.exists():
print(f"!! scene {i}: missing clip {clip.name}, skipping"); continue
out = seg / f"seg_{i:02d}.mp4"
subprocess.run([FF, "-y", "-i", str(clip), "-i", str(a), "-t", f"{D:.3f}",
"-map", "0:v:0", "-map", "1:a:0", *VENC, *AENC, str(out)], check=True)
segments.append(out)
if not segments:
sys.exit("no segments assembled")
lst = BUILD / "concat_anime.txt"
lst.write_text("".join(f"file '{s.as_posix()}'\n" for s in segments))
final = BUILD / "ch01_anime.mp4"
subprocess.run([FF, "-y", "-f", "concat", "-safe", "0", "-i", str(lst),
*VENC, *AENC, str(final)], check=True)
print(f"\nwrote {final} ({len(segments)} scenes)")
🖥️ The Render Box: Self-Hosting the DGX Spark渲染主机:自建 DGX Sparkเครื่องเรนเดอร์: โฮสต์ DGX Spark ด้วยตัวเอง
The "free per clip" path in Step 4 runs on a real machine: an NVIDIA DGX Spark sitting in Chiang Mai. Here's the whole setup behind it — how the box is reached, how its ComfyUI container is built and run, and the small fleet of scripts that drive it. If you want to self-host AI video instead of paying per clip, this is what that actually looks like.
第 4 步里「每片免费」的路线跑在一台真实的机器上:一台放在清迈的 NVIDIA DGX Spark。下面是它背后的整套配置——怎么连上这台机器、它的 ComfyUI 容器如何构建与运行,以及驱动它的一小批脚本。如果你想自建 AI 视频生成、而不是按片付费,这就是它实际的样子。
เส้นทาง "ฟรีต่อคลิป" ในขั้นที่ 4 รันบนเครื่องจริง: NVIDIA DGX Spark ที่ตั้งอยู่ในเชียงใหม่ นี่คือการตั้งค่าทั้งหมดเบื้องหลังมัน — วิธีเข้าถึงเครื่อง วิธีสร้างและรันคอนเทนเนอร์ ComfyUI ของมัน และชุดสคริปต์เล็ก ๆ ที่ขับเคลื่อนมัน ถ้าคุณอยากโฮสต์การสร้างวิดีโอ AI เองแทนที่จะจ่ายต่อคลิป นี่คือหน้าตาจริง ๆ ของมัน
Two ways in
两种连接方式
สองวิธีในการเข้าถึง
ssh ddtraveller@edgexpert-ca92.local with key auth. The primary channel for Docker, nvidia-smi, and process control on the box itself.
ssh ddtraveller@edgexpert-ca92.local,使用密钥认证。这是在机器本身上操作 Docker、nvidia-smi 和进程控制的主要通道。
ssh ddtraveller@edgexpert-ca92.local ด้วยการยืนยันตัวตนแบบคีย์ เป็นช่องทางหลักสำหรับ Docker, nvidia-smi และการควบคุมโปรเซสบนตัวเครื่องเอง
spark-anime.taila00da8.ts.net exposes ComfyUI's full HTTP API over HTTPS. This is what the funnel script uses — upload stills, queue renders, and pull results with no SSH at all.
spark-anime.taila00da8.ts.net 把 ComfyUI 的完整 HTTP API 通过 HTTPS 暴露出来。这正是漏斗脚本使用的方式——上传静帧、排队渲染、取回结果,完全不需要 SSH。
spark-anime.taila00da8.ts.net เปิด HTTP API เต็มรูปแบบของ ComfyUI ออกมาผ่าน HTTPS นี่คือสิ่งที่สคริปต์ funnel ใช้ — อัปโหลดภาพนิ่ง เข้าคิวเรนเดอร์ และดึงผลลัพธ์โดยไม่ต้องใช้ SSH เลย
⚠️ When the GPU is busy, SSH chokes
⚠️ GPU 繁忙时,SSH 会卡死
⚠️ เมื่อ GPU ทำงานหนัก SSH จะติดขัด
Under a heavy Wan 2.2 render the GPU starves the SSH session — logins hang or drop. That's exactly why the Funnel exists as a fallback: it keeps working over HTTPS even when SSH won't. (Tailscale bring-up has its own gotchas, documented in the project's TAILSCALE_SETUP.md.)
在繁重的 Wan 2.2 渲染下,GPU 会把 SSH 会话「饿死」——登录卡住或掉线。这正是 Funnel 作为备用通道存在的原因:即使 SSH 不行,它仍能通过 HTTPS 正常工作。(Tailscale 的启动有它自己的坑,记录在项目的 TAILSCALE_SETUP.md 里。)
ระหว่างการเรนเดอร์ Wan 2.2 หนัก ๆ GPU จะแย่งทรัพยากรจนเซสชัน SSH อดตาย — ล็อกอินค้างหรือหลุด นี่แหละคือเหตุผลที่ Funnel มีไว้เป็นทางสำรอง: มันยังทำงานผ่าน HTTPS ได้แม้ตอนที่ SSH ใช้ไม่ได้ (การตั้งค่า Tailscale มีจุดที่ต้องระวังของมันเอง บันทึกไว้ใน TAILSCALE_SETUP.md ของโปรเจกต์)
The ComfyUI container
ComfyUI 容器
คอนเทนเนอร์ ComfyUI
Everything runs inside one Docker container (spark/anime/) so the GPU environment is reproducible:
所有东西都跑在一个 Docker 容器里(spark/anime/),让 GPU 环境可复现:
ทุกอย่างรันอยู่ในคอนเทนเนอร์ Docker เดียว (spark/anime/) เพื่อให้สภาพแวดล้อม GPU ทำซ้ำได้:
Dockerfile— builds the arm64 image: ComfyUI plus the Wan 2.2 and AnimateDiff custom nodes.Dockerfile— 构建 arm64 镜像:ComfyUI 加上 Wan 2.2 和 AnimateDiff 自定义节点。Dockerfile— สร้างอิมเมจ arm64: ComfyUI พร้อมโหนดเสริม Wan 2.2 และ AnimateDiffbash/run.sh— starts the container with--gpus=all, host-mountedmodels/ output/ input/ workflows/, on port 8188.bash/run.sh— 用--gpus=all启动容器,挂载主机的models/ output/ input/ workflows/,端口 8188。bash/run.sh— เริ่มคอนเทนเนอร์ด้วย--gpus=allเมาต์models/ output/ input/ workflows/จากโฮสต์ ที่พอร์ต 8188bash/entrypoint.sh— on first run, fetches the models (including the Wan 2.2 weights), then launches ComfyUI.bash/entrypoint.sh— 首次运行时拉取模型(包括 Wan 2.2 权重),然后启动 ComfyUI。bash/entrypoint.sh— ครั้งแรกที่รัน จะดึงโมเดล (รวมถึงน้ำหนัก Wan 2.2) แล้วเปิด ComfyUIbash/stop.sh— stops and removes the container.bash/stop.sh— 停止并删除容器。bash/stop.sh— หยุดและลบคอนเทนเนอร์
The scripts that drive it
驱动它的脚本
สคริปต์ที่ขับเคลื่อนมัน
A handful of Python scripts submit work to the container and shuttle files in and out:
一小批 Python 脚本负责向容器提交任务,并在内外搬运文件:
สคริปต์ Python จำนวนหนึ่งส่งงานเข้าคอนเทนเนอร์และรับส่งไฟล์เข้าออก:
spark.py— a "Spark-as-a-tool" CLI:python spark.py chat|models|gpumakes one-shot calls into the Spark's Ollama (chat withqwen2.5:14b, list models, check the GPU) from a single command.spark.py— 「把 Spark 当工具」的命令行:python spark.py chat|models|gpu用一条命令对 Spark 上的 Ollama 发起一次性调用(用qwen2.5:14b聊天、列出模型、查看 GPU)。spark.py— CLI แบบ "Spark เป็นเครื่องมือ":python spark.py chat|models|gpuเรียก Ollama บน Spark แบบครั้งเดียวจบ (แชทด้วยqwen2.5:14bดูรายการโมเดล เช็ก GPU) ด้วยคำสั่งเดียวgenerate_walking.py— an AnimateDiff text-to-video workflow.generate_walking.py— 一个 AnimateDiff 文生视频工作流。generate_walking.py— เวิร์กโฟลว์ AnimateDiff แบบข้อความเป็นวิดีโอwan_i2v.py— the Wan 2.2 image-to-video workflow (lives on the Spark itself).wan_i2v.py— Wan 2.2 图生视频工作流(存放在 Spark 本机上)。wan_i2v.py— เวิร์กโฟลว์ Wan 2.2 ภาพเป็นวิดีโอ (อยู่บนตัว Spark เอง)qms_i2v_batch.py— a parameterized batch runner that loops scenes and callswan_i2v.pyover SSH (for detached runs).qms_i2v_batch.py— 参数化的批处理脚本,遍历场景并通过 SSH 调用wan_i2v.py(用于脱离终端的后台运行)。qms_i2v_batch.py— ตัวรันแบบแบตช์ที่ตั้งค่าได้ วนทุกฉากแล้วเรียกwan_i2v.pyผ่าน SSH (สำหรับรันแบบ detached)qms_anime_funnel.py— the same batch, but over the Funnel (HTTPS, no SSH) — the script detailed above, used to render this chapter.qms_anime_funnel.py— 同样的批处理,但走 Funnel(HTTPS,免 SSH)——就是上面详述的那个脚本,用来渲染本章节。qms_anime_funnel.py— แบตช์เดียวกันแต่ผ่าน Funnel (HTTPS ไม่ใช้ SSH) — สคริปต์ที่อธิบายไว้ข้างบน ใช้เรนเดอร์บทนี้qms_assemble.py/qms_assemble_anime.py— the local ffmpeg assembly step (trim, mux voiceover, concat → chapter MP4), also detailed above.qms_assemble.py/qms_assemble_anime.py— 本地 ffmpeg 合成步骤(裁剪、混入配音、拼接成章节 MP4),同样在上面详述。qms_assemble.py/qms_assemble_anime.py— ขั้นประกอบด้วย ffmpeg ในเครื่อง (ตัด ผสมเสียงพากย์ ต่อเป็น MP4 บท) อธิบายไว้ข้างบนเช่นกัน
Sibling services on the same box
同一台机器上的相邻服务
บริการข้างเคียงบนเครื่องเดียวกัน
The Spark does more than animation. Two related containers live alongside the ComfyUI one:
这台 Spark 不只是做动画。还有两个相关容器和 ComfyUI 容器并存:
Spark ทำมากกว่าแอนิเมชัน มีคอนเทนเนอร์ที่เกี่ยวข้องสองตัวอยู่ข้าง ๆ ตัว ComfyUI:
spark/wav2lip/— the live talking-avatar container (run.sh,stop.sh,server.py) behind the real-time English tutor.spark/wav2lip/— 实时说话头像容器(run.sh、stop.sh、server.py),是实时英语家教背后的服务。spark/wav2lip/— คอนเทนเนอร์อวตารพูดได้แบบเรียลไทม์ (run.sh,stop.sh,server.py) ที่อยู่เบื้องหลังติวเตอร์ภาษาอังกฤษเรียลไทม์spark/musetalk/— an earlier lip-sync experiment, now archived.spark/musetalk/— 早期的口型同步实验,现已归档。spark/musetalk/— การทดลองลิปซิงก์ก่อนหน้านี้ ตอนนี้เก็บเข้าคลังแล้ว
✅ Your Cartoon Progress Tracker✅ 你的动画进度清单✅ รายการเช็กความคืบหน้าการ์ตูนของคุณ
Tick each step as you finish it. Your progress is saved automatically on this device.完成每一步就打勾。你的进度会自动保存在本设备上。ติ๊กแต่ละขั้นเมื่อทำเสร็จ ความคืบหน้าจะถูกบันทึกอัตโนมัติบนอุปกรณ์นี้
Saved locally in your browser (no account, nothing uploaded).保存在你的浏览器本地(无需账号,不上传任何内容)。บันทึกในเบราว์เซอร์ของคุณ (ไม่ต้องมีบัญชี ไม่อัปโหลด)