Programmatic Workflow · ขั้นตอนแบบเขียนโค้ด · 编程化流程

A Truly Free, API-First AI Video Pipeline

ขั้นตอนสร้างวิดีโอ AI ฟรีจริง — ไม่ต้องจ่ายค่า API

真正免费的 AI 视频流程 — 无需付费 API

$0.00 per video · Six phases · Python + ffmpeg · only a free Gemini key required

What changed and why · มีอะไรเปลี่ยนไป · 改了什么、为什么

English

The earlier version of this pipeline cost about $2.50 per two-minute video — script via Gemini Pro, stills via Nano Banana 2, motion via Seedance 2.0 on Replicate. Watermark-free, but not free.

This version is $0. Every paid model has a free substitute that produces watermark-free output. The only limits left are free ones: Gemini's free-tier daily request quota and, if you opt in to AI motion, the HuggingFace ZeroGPU quota. The trade-off is small: stills come from Pollinations.ai FLUX (very close to Nano Banana 2), and the default motion is an ffmpeg Ken Burns move on each still, which gives real camera moves but no AI subject motion. If you want subject motion back, an opt-in flag uses a free HuggingFace Space.

ไทย

เวอร์ชันก่อนของหน้านี้มีค่าใช้จ่ายราว $2.50 ต่อวิดีโอ 2 นาที — เขียนสคริปต์ด้วย Gemini Pro, ภาพนิ่งด้วย Nano Banana 2, ทำการเคลื่อนไหวด้วย Seedance 2.0 บน Replicate. ไม่มีลายน้ำ แต่ก็ไม่ฟรี

เวอร์ชันนี้ราคา $0. ทุก model ที่เคยจ่ายเงิน มีของฟรีแทนได้โดยไม่มีลายน้ำและไม่มีโควต้ารายวัน เทรดออฟตรงไปตรงมา: ภาพนิ่งใช้ Pollinations.ai FLUX (คุณภาพใกล้ Nano Banana 2 มาก), ส่วนการเคลื่อนไหวเริ่มต้นใช้ ffmpeg Ken Burns บนภาพนิ่ง — เป็นการเคลื่อนกล้องจริง ไม่ใช่การขยับวัตถุด้วย AI ถ้าอยากได้การเคลื่อนวัตถุจริง ๆ มี flag เสริมใช้ HuggingFace Space ฟรี

中文

本页面之前的版本每个 2 分钟视频成本约 $2.50——Gemini Pro 写脚本,Nano Banana 2 出图,Replicate Seedance 2.0 做动效。无水印,但不免费。

现在的版本是 $0。每个收费模型都有不带水印且无每日限额的免费替代品。取舍诚实而轻微:静帧来自 Pollinations.ai 的 FLUX(与 Nano Banana 2 非常接近),默认动效是 ffmpeg Ken Burns 推拉摇移,真实的相机运动,但不带 AI 主体运动。如果想要主体运动,加一个可选 flag 调用免费的 HuggingFace Space。

Reference build

The driver script lives at python/video_gen/free_pipeline.py. A verified end-to-end run with topic "a Lanna farmer at dawn launching khom loi lanterns over Chiang Mai" produced a 19.6-second 1280×720 H.264 video — three scenes, three voiceover lines, one anchor portrait, six stills, three Ken Burns clips, one final mux — in under a minute on a laptop. Total API spend: $0.00.

Pipeline at a glance · ภาพรวมทั้งหมด · 整体流程一览

Phase | Output | Free service | Auth | Cost
1. Script | Structured blueprint (JSON) | Gemini 2.5 Flash (free tier) | Free key from aistudio.google.com | $0
2. Anchor | 1 hero still (character lock) | Pollinations.ai FLUX | None (anonymous) | $0
3. Storyboard | N scene stills | Pollinations.ai FLUX (serial) | None (anonymous) | $0
4. Voice | Narration .mp3 per scene | Microsoft Edge TTS | None (anonymous) | $0
5. Motion | Clip per scene (still + camera move) | ffmpeg zoompan Ken Burns | None (local CPU) | $0
5b. Motion (opt-in) | Real i2v subject motion | HF Space Lightricks/ltx-video-distilled | Free HF account → token | $0 (ZeroGPU quota)
6. Assembly | Final .mp4 | ffmpeg concat + mux | None (local CPU) | $0

Total for a 2-minute video: $0.00 of API spend and a few minutes of wall-clock time (roughly 5–8 minutes for a 25-scene run). Stills run sequentially because Pollinations rate-limits concurrent anonymous requests; that is the only meaningful pacing constraint.

PHASE 01

Script generation · เขียนสคริปต์อัตโนมัติ · 脚本自动生成

Gemini 2.5 Flash · free tier · JSON schema

The free-tier Gemini 2.5 Flash is more than capable for blueprint generation — JSON mode is solid, latency is under three seconds, and the daily quota is generous. Grab a free key from aistudio.google.com/apikey and put it in env.txt.txt as google-api:AIza....
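
Reading the key back out of that file takes only a few lines. This is a minimal sketch, assuming the file holds one name:value pair per line in the google-api:AIza.... shape shown above; parse_creds and load_creds are hypothetical helper names, and skipping lines that start with # is an assumption rather than a documented feature of the file.

```python
from pathlib import Path


def parse_creds(text: str) -> dict[str, str]:
    """Parse 'name:value' lines into a dict (assumed file layout)."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # assumption: '#' marks a comment
            continue
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            creds[name.strip()] = value.strip()
    return creds


def load_creds(path: str = "env.txt.txt") -> dict[str, str]:
    """Read the credentials file from disk and parse it."""
    return parse_creds(Path(path).read_text(encoding="utf-8"))
```

With this in place, GEMINI_KEY would be load_creds()["google-api"].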

# pip install google-genai
import json
import os

from google import genai
from google.genai import types

# Assumption: the free key is exported as GEMINI_API_KEY.
# Read it from env.txt.txt instead if that is where you keep it.
GEMINI_KEY = os.environ["GEMINI_API_KEY"]

client = genai.Client(api_key=GEMINI_KEY)

SYSTEM = """You are a senior video director. Output JSON only:
{
  "anchor_prompt":  string,
  "scene_prompts":  [string],
  "motion_prompts": [string],
  "narration":      [string],
  "music_prompt":   string,
  "voice_style":    string
}"""

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Topic: a Lanna farmer at dawn launching khom loi lanterns. 6 scenes.",
    config=types.GenerateContentConfig(
        system_instruction=SYSTEM,
        response_mime_type="application/json",   # ask for JSON, not prose
        temperature=0.85,
    ),
)
blueprint = json.loads(resp.text)

Why Flash, not Pro

The blueprint task is small (~1k input, ~1k output, no images, no tools). Flash matches Pro for JSON-mode structured output here, runs ~3× faster, and stays inside the free-tier daily request quota (quoted as 1500 requests per day at the time of writing; check AI Studio for the current limit on your key). Save Pro for the rare task that actually benefits from it.
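
Every later phase indexes into the blueprint's lists, so it pays to check the JSON before rendering anything. This is a minimal sketch, assuming you want exactly one scene prompt, one motion prompt, and one narration line per scene; validate_blueprint is a hypothetical helper, not part of the google-genai SDK.

```python
import json

REQUIRED_STRINGS = ("anchor_prompt", "music_prompt", "voice_style")
REQUIRED_LISTS = ("scene_prompts", "motion_prompts", "narration")


def validate_blueprint(raw: str, scenes: int) -> dict:
    """Parse the model output and check it against the schema in SYSTEM."""
    bp = json.loads(raw)
    if not isinstance(bp, dict):
        raise ValueError("blueprint must be a JSON object")
    for key in REQUIRED_STRINGS:
        value = bp.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key!r} must be a non-empty string")
    for key in REQUIRED_LISTS:
        value = bp.get(key)
        if not isinstance(value, list) or len(value) != scenes:
            raise ValueError(f"{key!r} must be a list of {scenes} items")
        if not all(isinstance(item, str) for item in value):
            raise ValueError(f"{key!r} must contain only strings")
    return bp
```

If validation fails, re-running the request is usually cheaper than patching the JSON by hand.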

PHASE 02

Anchor image · ภาพต้นแบบของตัวละคร · 角色锚定图

Pollinations.ai · FLUX · anonymous

Pollinations runs a public FLUX endpoint that returns clean PNGs over a single HTTP GET — no API key, no signup. URL-encode the prompt, fix a seed for repeatability, and the image bytes come back in 5–15 seconds.

import hashlib
from pathlib import Path
from urllib.parse import quote

import requests

def seed_from(text):
    # Stable 31-bit seed derived from the prompt text
    return int(hashlib.sha256(text.encode()).hexdigest()[:8], 16) % 2**31

prompt = blueprint["anchor_prompt"] + ", cinematic portrait, 16:9, hyperreal lighting"
url = (
    f"https://image.pollinations.ai/prompt/{quote(prompt, safe='')}"   # safe='' also encodes "/"
    f"?model=flux&width=1280&height=720&seed={seed_from(prompt)}"
    f"&nologo=true&enhance=false"
)
r = requests.get(url, timeout=180)
r.raise_for_status()                 # fail loudly instead of saving an error body as a PNG
Path("out").mkdir(exist_ok=True)
Path("out/anchor.png").write_bytes(r.content)

The result is a 1280×720 FLUX render — visually close to Nano Banana 2, slightly less character-locked. Regenerate (vary the seed) until you have a portrait you like. This single file is the character contract for every storyboard scene.
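
Varying the seed is easier when the URL is built by a function. This is a sketch under stated assumptions: it uses only the query parameters that appear in the request above, and pollinations_url and candidate_urls are hypothetical helper names. Each candidate keeps the prompt fixed and shifts the seed by one, so any portrait you like can be reproduced later from its seed.

```python
import hashlib
from urllib.parse import quote, urlencode

BASE = "https://image.pollinations.ai/prompt/"


def seed_from(text: str) -> int:
    """Stable 31-bit seed derived from the prompt text."""
    return int(hashlib.sha256(text.encode()).hexdigest()[:8], 16) % 2**31


def pollinations_url(prompt: str, seed: int, width: int = 1280, height: int = 720) -> str:
    """Build the GET URL for one FLUX render."""
    query = urlencode({
        "model": "flux", "width": width, "height": height,
        "seed": seed, "nologo": "true", "enhance": "false",
    })
    # safe="" percent-encodes "/" and "?" too, so the prompt stays one path segment.
    return f"{BASE}{quote(prompt, safe='')}?{query}"


def candidate_urls(prompt: str, count: int = 4) -> list[str]:
    """Same prompt, consecutive seeds: one URL per candidate portrait."""
    first = seed_from(prompt)
    return [pollinations_url(prompt, (first + k) % 2**31) for k in range(count)]
```

Fetch the candidates one at a time, pick the portrait you like, and keep its seed alongside the blueprint.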

PHASE 03

Storyboard stills (serial) · ภาพแต่ละฉาก · 分镜批量生成

Pollinations.ai · serial loop

Loop the scene prompts. Repeat the anchor description in every prompt — Pollinations doesn't expose a true image-conditioning endpoint anonymously, so character coherence comes from prompt repetition plus deterministic seeding. Keep concurrency at 1: the anonymous endpoint returns 429 on parallel requests.

import asyncio   # asyncio.TaskGroup needs Python 3.11+

async def render_stills(blueprint, out_dir):
    sem = asyncio.Semaphore(1)             # serial: Pollinations 429s on parallel
    anchor_desc = blueprint["anchor_prompt"]
    out_dir.mkdir(parents=True, exist_ok=True)

    async def one(i, scene_prompt):
        async with sem:
            full = f"{scene_prompt}, featuring: {anchor_desc}, 16:9 cinematic, matching character"
            # gen_still(prompt, seed, out_path) is the Phase 02 request wrapped in a function
            await asyncio.to_thread(
                gen_still, full, seed_from(anchor_desc) + i,
                out_dir / f"scene_{i:02d}.png",
            )

    async with asyncio.TaskGroup() as tg:
        for i, p in enumerate(blueprint["scene_prompts"]):
            tg.create_task(one(i, p))

    return sorted(out_dir.glob("scene_*.png"))   # stills in scene order

Rate-limit gotcha

Pollinations returns HTTP 429 with a tiny error JSON when anonymous requests arrive concurrently. Retrying with increasing waits (5s → 20s → 45s → 60s) covers transient spikes. Anonymous serial access is fine for 5–30 stills; if you need parallelism, sign up at enter.pollinations.ai for a free authenticated key.
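
The retry schedule can be sketched as a small wrapper around the HTTP call. This is an illustration rather than the code in free_pipeline.py: fetch_with_backoff, http_get, and RateLimited are hypothetical names, and the sketch uses the standard library's urllib instead of requests so that it has no dependencies. The fetch and sleep arguments are injectable so you can exercise the retry logic without touching the network.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path

BACKOFF_SECONDS = (5, 20, 45, 60)   # waits between attempts, from the schedule above


class RateLimited(Exception):
    """Raised when the server answers HTTP 429."""


def http_get(url: str, timeout: float = 180) -> bytes:
    """Plain GET that turns a 429 into RateLimited and re-raises anything else."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 429:
            raise RateLimited(url) from err
        raise


def fetch_with_backoff(url, out_path, fetch=http_get, sleep=time.sleep) -> Path:
    """Try once, then retry after each wait; give up after the last one."""
    for delay in (*BACKOFF_SECONDS, None):
        try:
            data = fetch(url)
        except RateLimited:
            if delay is None:        # schedule exhausted
                raise
            sleep(delay)
            continue
        out_path = Path(out_path)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_bytes(data)
        return out_path
```

Only 429 is retried; any other HTTP error propagates immediately, which keeps a malformed prompt from burning two minutes of waits.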

PHASE 04

Narration · เสียงพากย์ · 配音生成

Microsoft Edge TTS · anonymous · 300+ voices

The edge-tts Python package talks to the same neural TTS engine that powers Microsoft Edge's "Read Aloud" feature. No API key. No daily cap. 300+ voices, including two Thai ones (th-TH-PremwadeeNeural, th-TH-NiwatNeural). Quality is comparable to Gemini TTS for most narration uses.

# pip install edge-tts
import asyncio, edge_tts

async def tts(text, out_path, voice="en-US-AvaNeural"):
    c = edge_tts.Communicate(text, voice)
    await c.save(str(out_path))

async def render_voice(blueprint, out_dir):
    for i, line in enumerate(blueprint["narration"]):
        await tts(line, out_dir / f"voice_{i:02d}.mp3")

Voice is generated before motion in the free pipeline. The duration of each narration clip becomes the duration of each visual clip — that way the timeline lines up without a separate alignment pass.
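
One way to read those durations back is ffprobe, which ships with ffmpeg. This is a minimal sketch; narration_duration, ffprobe_cmd, and parse_duration are hypothetical helper names, and the run argument exists only so the subprocess call can be swapped out in tests.

```python
import subprocess


def ffprobe_cmd(path) -> list[str]:
    """Command that prints only the container duration, in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(path),
    ]


def parse_duration(stdout: str) -> float:
    """ffprobe prints a single number such as '6.528000'."""
    return float(stdout.strip())


def narration_duration(path, run=subprocess.run) -> float:
    """Duration of one narration file, used as the length of its visual clip."""
    done = run(ffprobe_cmd(path), check=True, capture_output=True, text=True)
    return parse_duration(done.stdout)
```

Pairing each voice path with its duration gives the (path, duration) list that the motion phase needs.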

PHASE 05

Motion: Ken Burns or HF Space · ทำให้ภาพเคลื่อนไหว · 让画面动起来

ffmpeg zoompan · HF Spaces (opt-in)

The default motion backend is ffmpeg's zoompan filter: push, pull, pan-left, pan-right. It runs locally in seconds, depends on no network service, and gives you a real cinematic camera move on each still. For most narrative-over-stills work this is what you want anyway.

import subprocess

def kenburns_clip(src, duration, kb_dir, out, w=1280, h=720, fps=30):
    frames = int(round(duration * fps))     # one zoompan output frame per video frame
    if kb_dir == "in":
        z = "if(eq(on,0),1.0,zoom+0.0008)"
        x = "iw/2-(iw/zoom/2)"; y = "ih/2-(ih/zoom/2)"
    elif kb_dir == "out":
        z = "if(eq(on,0),1.12,max(1.001,zoom-0.0008))"
        x = "iw/2-(iw/zoom/2)"; y = "ih/2-(ih/zoom/2)"
    else:
        # ... left / right omitted for brevity
        raise ValueError(f"unsupported kb_dir: {kb_dir!r}")
    vf = (
        f"scale=2560:-1:flags=lanczos,"     # upscale first so the zoom has pixels to work with
        f"zoompan=z='{z}':d={frames}:s={w}x{h}:fps={fps}:x='{x}':y='{y}',"
        f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
        f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2:black,setsar=1,format=yuv420p"
    )
    subprocess.run([
        "ffmpeg", "-y", "-loglevel", "error",
        "-loop", "1", "-i", str(src),
        "-vf", vf, "-t", f"{duration}", "-r", str(fps),
        "-c:v", "libx264", "-preset", "fast", "-crf", "20",
        "-pix_fmt", "yuv420p", "-an", str(out),
    ], check=True)
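
The fixed 0.0008 step reaches a zoom of 1.12 after exactly 150 frames, which is 5 seconds at 30 fps. Longer narration lines therefore push in further, and a pull-out hits its floor and sits still for the remainder. If you would rather have every clip travel the same distance, you can derive the step from the clip length. This is a sketch of that variation, not what the reference script does; zoom_step, push_in_expr, and pull_out_expr are hypothetical names, and the expressions keep the same shape as the ones above.

```python
def zoom_step(duration: float, fps: int = 30, target_zoom: float = 1.12) -> float:
    """Per-frame zoom increment that reaches target_zoom on the last frame."""
    frames = max(1, round(duration * fps))
    return (target_zoom - 1.0) / frames


def push_in_expr(duration: float, fps: int = 30, target_zoom: float = 1.12) -> str:
    """zoompan z-expression: start at 1.0, grow by the scaled step, cap at target."""
    step = zoom_step(duration, fps, target_zoom)
    return f"if(eq(on,0),1.0,min({target_zoom},zoom+{step:.6f}))"


def pull_out_expr(duration: float, fps: int = 30, target_zoom: float = 1.12) -> str:
    """zoompan z-expression: start at target, shrink by the scaled step, floor at 1.001."""
    step = zoom_step(duration, fps, target_zoom)
    return f"if(eq(on,0),{target_zoom},max(1.001,zoom-{step:.6f}))"
```

For a 5-second clip this reproduces the original 0.0008 step; a 10-second clip gets 0.0004 and ends at the same framing.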

If you want real AI subject motion (waving grass, lantern flicker, walking gait) — opt into a free HuggingFace Space. Lightricks/ltx-video-distilled exposes /image_to_video via Gradio, runs on ZeroGPU, and is gated only by your free HF account's daily quota.

# pip install gradio_client    (only needed for --i2v hf)
from gradio_client import Client, handle_file

c = Client("Lightricks/ltx-video-distilled", token=HF_TOKEN, verbose=False)
res = c.predict(
    prompt=motion_prompt,
    negative_prompt="worst quality, blurry, watermark",
    input_image_filepath=handle_file(str(still)),
    input_video_filepath="",
    height_ui=512, width_ui=768,
    mode="image-to-video",
    duration_ui=4.0, ui_frames_to_use=25,
    seed_ui=42, randomize_seed=False,
    ui_guidance_scale=3.0, improve_texture_flag=False,
    api_name="/image_to_video",
)
# res["video"] is a temp filepath; copy it to your clips/ dir

HF Space caveats

(1) The Space requires a free HF token (hf_xxxx) — anonymous calls return 401. (2) ZeroGPU pools have daily quotas; long sessions can queue. (3) Spaces drift — endpoint names and parameter shapes change. free_pipeline.py wraps the call in a try/except and falls back to Ken Burns on failure, so a broken Space never blocks the build.
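
The fallback described in caveat (3) can be sketched as a wrapper that takes both backends as plain callables. This is an illustration of the pattern, not the code in free_pipeline.py; motion_with_fallback is a hypothetical name, and the sketch assumes each backend has the signature backend(still, out) and writes its clip to out.

```python
import logging
from pathlib import Path

log = logging.getLogger("free_pipeline")


def motion_with_fallback(still, out, primary, fallback) -> str:
    """Run the primary backend; on any failure, render with the fallback.

    Returns "primary" or "fallback" so the caller can report which one ran.
    """
    try:
        primary(still, out)
        # A call that "succeeds" but leaves no usable file counts as a failure.
        if Path(out).stat().st_size > 0:
            return "primary"
        log.warning("primary backend wrote an empty file for %s", still)
    except Exception as err:   # Spaces fail in many ways: 401, queue timeout, renamed endpoint
        log.warning("primary backend failed for %s: %s", still, err)
    fallback(still, out)
    return "fallback"
```

Catching the broad Exception is deliberate here: the fallback always works locally, so nothing the Space does should stop the build.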

PHASE 06

Assembly · ตัดต่อด้วย ffmpeg · 用 ffmpeg 合成

ffmpeg concat demuxer

One write of clips.txt, one concat-demuxer pass, one mux pass. The concat demuxer copies streams without re-encoding (fast and lossless), and the final mux is a single libx264 / AAC pass at -profile:v high -level 4.0 -pix_fmt yuv420p for guaranteed LINE / mobile-Safari playback.

import subprocess
from pathlib import Path

# 1. Concat video clips (clips is a list of Path objects)
#    Absolute paths: the concat demuxer resolves relative entries against
#    the directory that holds clips.txt, not the working directory.
Path("out/clips.txt").write_text(
    "\n".join(f"file '{c.resolve().as_posix()}'" for c in clips)
)
subprocess.run([
    "ffmpeg", "-y", "-loglevel", "error",
    "-f", "concat", "-safe", "0", "-i", "out/clips.txt",
    "-c", "copy", "out/concat.mp4",
], check=True)

# 2. Concat narration mp3s the same way
#    (same pattern with voice_NN.mp3 -> voice_concat.mp3)

# 3. Mux video + voice into final.mp4
subprocess.run([
    "ffmpeg", "-y", "-loglevel", "error",
    "-i", "out/concat.mp4", "-i", "out/voice_concat.mp3",
    "-map", "0:v:0", "-map", "1:a:0",
    "-c:v", "libx264", "-preset", "medium", "-crf", "20",
    "-pix_fmt", "yuv420p", "-profile:v", "high", "-level", "4.0",
    "-c:a", "aac", "-b:a", "192k", "-shortest", "-movflags", "+faststart",
    "out/final.mp4",
], check=True)

Three ffmpeg pitfalls worth memorising

(1) xfade renegotiates back to yuv444p even if every input was 420 — that's why this pipeline uses the concat demuxer instead. Both faster and safer.

(2) drawtext can't parse Unicode escapes inline. If you add captions, write each line to a UTF-8 .txt file and use textfile=path.

(3) For Thai glyphs use tahoma.ttf. tahomabd.ttf (Tahoma Bold) has no Thai code points and renders .notdef squares.
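
Pitfalls (2) and (3) combine into one small helper if you add captions. This is a minimal sketch: caption_filter and ff_path are hypothetical names, the font path is whatever location tahoma.ttf has on your machine, and the size and position values are arbitrary choices. The helper writes the caption to a UTF-8 file and returns a drawtext filter string that points at it.

```python
from pathlib import Path


def ff_path(path) -> str:
    """Make a path safe inside a single-quoted filter option value."""
    text = str(path).replace("\\", "/")
    if "'" in text:
        raise ValueError("paths containing a single quote are not handled in this sketch")
    # A colon would otherwise be read as the option separator (e.g. the C: drive prefix).
    return text.replace(":", "\\:")


def caption_filter(text: str, textfile, fontfile, fontsize: int = 42) -> str:
    """Write the caption as UTF-8 and return the matching drawtext filter."""
    textfile = Path(textfile)
    textfile.parent.mkdir(parents=True, exist_ok=True)
    textfile.write_text(text, encoding="utf-8")
    return (
        f"drawtext=textfile='{ff_path(textfile)}'"
        f":fontfile='{ff_path(fontfile)}'"
        f":fontsize={fontsize}:fontcolor=white"
        f":x=(w-text_w)/2:y=h-text_h-40"      # centred, 40 px above the bottom edge
    )
```

Append the returned string to a clip's -vf chain, one caption file per scene.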

The whole driver script · สคริปต์หลัก · 主控脚本

Once each phase is a function, the driver is boring on purpose — that's the win over the click-driven version. --resume-from skips earlier phases when their output files already exist on disk.

# python/video_gen/free_pipeline.py
# creds (API keys) and base (the output directory) are set up outside this excerpt.
def main(topic, scenes, voice, i2v):
    blueprint = phase_script(topic, scenes, creds, base / "blueprint.json")
    phase_anchor(blueprint, base / "anchor.png")
    stills    = asyncio.run(phase_stills(blueprint, base / "stills"))
    voices    = asyncio.run(phase_voice(blueprint, voice, base / "voice"))
    durations = [d for _, d in voices]       # each voice entry is (path, duration)
    clips     = phase_motion(blueprint, stills, durations, i2v, creds["hf"], base / "clips")
    assemble(clips, voices, base / "work", base / "final.mp4")

# Run it
$ python free_pipeline.py --topic "a Lanna farmer at dawn launching khom loi lanterns" --scenes 6
$ python free_pipeline.py --topic "..." --voice th-TH-PremwadeeNeural   # Thai narration
$ python free_pipeline.py --topic "..." --i2v hf                        # real subject motion
$ python free_pipeline.py --topic "..." --resume-from voice              # keep stills, redo voice onward

About watermarks · เรื่องลายน้ำ · 关于水印

None of the free services in this stack burns in a watermark. Pollinations FLUX honours nologo=true. Edge TTS returns clean audio. The HF Space output is raw model frames. Watermarks only show up if you fall back to consumer-facing web tiers of Veo / Meta AI — that's where the manual workflow leans, and it's exactly what scripting avoids.

If you ever do need to remove one, do it mathematically, not with AI inpainting: a known semi-transparent overlay with a fixed alpha and position is invertible per-pixel with a few lines of NumPy. Inpainting hallucinates, which means inconsistent texture across frames and a flicker artefact you can't unsee.
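
The inversion follows from the blending model: each observed pixel is alpha × overlay + (1 − alpha) × original, so the original is (observed − alpha × overlay) / (1 − alpha). This is a minimal sketch, assuming you already know the overlay image and its fixed alpha; remove_overlay is a hypothetical name. It uses plain arithmetic only, so the same function accepts a single float or a whole NumPy array of pixel values in the 0 to 1 range.

```python
def remove_overlay(observed, overlay, alpha: float):
    """Invert observed = alpha * overlay + (1 - alpha) * original.

    observed and overlay may be floats or NumPy arrays of the same shape,
    scaled to 0..1; alpha is the overlay's fixed opacity.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1); a fully opaque overlay is not invertible")
    return (observed - alpha * overlay) / (1.0 - alpha)
```

Apply it only inside the overlay's bounding box, and clip the result back to the valid range, since 8-bit rounding and video compression leave small errors that grow as alpha approaches 1.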

What it actually costs · ค่าใช้จ่ายจริง · 实际成本

Script: $0 (Gemini 2.5 Flash · free tier)
Stills (any N): $0 (Pollinations.ai · anonymous)
Voice (any length): $0 (Edge TTS · anonymous)
Motion (Ken Burns): $0 (ffmpeg · local CPU)
Motion (HF i2v): $0 (Free HF token · ZeroGPU)
Assembly: $0 (ffmpeg · local CPU)
Total per video: $0.00 (same for a 30s short or a 5-minute long-form)

The verified Lanna-farmer reference build was three scenes, 19.6 seconds, $0.00. A two-minute, 25-scene run sits comfortably inside the Pollinations rate limits and the Gemini free-tier quota; total wall-clock time is roughly 5–8 minutes serially, most of it Pollinations waiting on FLUX.

When you'd still touch a UI · เมื่อไหร่ที่ต้องใช้เว็บ · 什么时候仍需用界面

The principle is unchanged: automate the parts that are repetitive and verifiable; keep human judgment for the parts that need taste.