🏫 Build a Local AI Tutor — Docker, Wiki & the 5-Tool Stack

Role	Original plan	What shipped
Day-side grounding	Khoj as a separate Docker service with Postgres + pgvector backend, admin UI to register chat models and agents	LlamaIndex in-process inside the orchestrator. Embeddings via Ollama's `nomic-embed-text`. Persistent VectorStoreIndex on a named volume. Opt-in via `RAG_ENABLED=true`.
Chat LLM	Hermes 4 / Qwen 2.5 7B	Qwen 2.5 3B by default (2s ungrounded, ~20s grounded on CPU). Hermes 3 8B retained for the night worker.
Hermes Agent harness	An off-the-shelf `hermes_agent` pip package	Custom Ollama tool-calling loop, ~140 lines of Python. Shipped twice (owner-bridge + hermes-night) so each container is self-contained.
Owner remote control	LINE + Slack + Hermes Web UI	LINE + Slack webhooks only. Inbound-only with HMAC signature checks and owner-ID allowlist. No outbound polling = compatible with no-internet hosts. UI deferred.
Wiki content	User-supplied lessons + STORM-drafted articles	25-file seed corpus shipped in `docker/seed_wiki/` — 12 weekly lessons (A1→B1), pronunciation, vocabulary, persona, syllabus, methodology, sources. Drafted at Opus 4.7 quality; hermes-night refines on its night pass.
Compose service count	7+ services	4 runtime services (orchestrator, owner-bridge, tts, whisper) + host Ollama. `hermes-night` exists but is profile-gated so `up -d` doesn't bring it up.

🌱 About this project

ภาษาไทย

Kru Eng Local Bot คือแชทบอทสอนภาษาอังกฤษแบบเสียงคุยเสียงตอบ ที่ทำงานบนเครื่องของครูทั้งหมด — เสียงและข้อความของนักเรียนไม่ออกจากเครื่อง ตรงตามข้อกำหนด PDPA

วันนี้บอทยังเป็นแค่วงจรเสียง (ไมค์ → Whisper → Qwen 2.5 → XTTS → ลำโพง) ไม่มีฐานความรู้ที่จำได้ระหว่างคาบ

หน้านี้แสดงวิธีต่อยอด ให้บอทเรียนรู้จากทุกคาบ รับเนื้อหาใหม่ และร่างแผนการสอนให้ครูตรวจ — ด้วยเครื่องมือที่ทำงานในเครื่องทั้งหมด

English

Kru Eng Local Bot is a voice-in / voice-out English tutor that runs entirely on the teacher's own hardware. No student audio or transcript ever leaves the machine — PDPA-compliant by design.

Today it's just a voice loop: mic → Whisper → Qwen 2.5 → XTTS/Piper → speaker. It has no persistent memory between sessions.

This guide extends it with a five-tool context architecture (LlamaIndex · STORM · NotebookLM · Wiki · Hermes) so the bot can learn from every session, ingest new course materials, and draft its own lesson packets for the teacher to review — using off-the-shelf tools that all run locally on the same Ollama backend.

中文

Kru Eng 本地机器人是一个语音输入 / 语音输出的英语家教，完全运行在老师自己的硬件上。学生的音频和文字记录从不离开这台机器——天然符合 PDPA 数据保护要求。

今天它只是一个语音回路：麦克风 → Whisper → Qwen 2.5 → XTTS/Piper → 扬声器。课与课之间没有持久记忆。

本指南用一个五工具的上下文架构（LlamaIndex · STORM · NotebookLM · Wiki · Hermes）来扩展它，让机器人能从每一节课中学习、接收新的课程材料、并为老师起草可审阅的教学方案——所有工具都在同一个 Ollama 后端本地运行。

Overview & context — why this exists and who it's for

ภาษาไทย

เพื่อใคร: ครูสอนภาษาอังกฤษ (TEFL), โรงเรียนสอนภาษาขนาดเล็ก, และศูนย์การเรียนรู้ในประเทศไทยและเอเชียตะวันออกเฉียงใต้ ที่ต้องการใช้ AI โดยไม่ต้องส่งข้อมูลนักเรียนไปยังคลาวด์

ปัญหาที่แก้: ผู้ช่วยสอนบนคลาวด์ (ChatGPT, Khan Academy ฯลฯ) ใช้งานได้ดีแต่มีกำแพง 3 อย่าง — อัปโหลด transcript นักเรียนไปให้บุคคลที่สาม, มีค่าใช้จ่ายแบบสมัครสมาชิกต่อนักเรียนซึ่งสะสมตลอดปี, และทำให้การสอนของคุณต้องวิ่งตาม roadmap ของผู้ให้บริการ ไม่ใช่ของคุณเอง

โอกาส: โมเดล open-source (Qwen 2.5, Hermes 4) ตอนนี้ดีพอจริง ๆ สำหรับการสอนภาษาอังกฤษ, โมเดล embedding (nomic-embed) รันบน CPU ใดก็ได้, และ markdown wiki ธรรมดามีความทนทานมากกว่าฐานความรู้ proprietary ใด ๆ การ self-host กลายเป็นเรื่องจริงจังได้แล้วสำหรับครูคนเดียวที่มีโน้ตบุ๊กอายุ 5 ปี

ความสำเร็จคืออะไร: บอทที่เรียนรู้สไตล์การสอน ของคุณ เป็นสัปดาห์, ค่าใช้จ่ายต่อคำถามเป็น 0 บาทหลังติดตั้ง, ไม่รั่วข้อมูล, ร่างบทเรียนใหม่ในรูปแบบ DSS-PPP ของคุณ, และควบคุมได้จาก LINE/Slack บนมือถือของคุณ

การแลกเปลี่ยนที่ตรงไปตรงมา: คุณใช้เวลาประมาณ 3 ชั่วโมงในการติดตั้ง (เทียบกับ 0 สำหรับ SaaS), จัดการ backup เอง, เมื่อมีอะไรพัง คุณซ่อมเองแทนที่จะเปิดตั๋ว support แลกกับการเป็นเจ้าของข้อมูลและพฤติกรรมของบอทถาวร — ไม่มี vendor ไหนเปลี่ยนกติกาให้คุณได้

ใครไม่เหมาะ: ครูที่ไม่มีเวลาดูแลระบบเลย, หรือคนที่ต้องการสร้างเนื้อหาความเร็วสูงแบบสตาร์ทอัพ ที่ความเสถียรของ uptime สำคัญกว่าความเป็นเจ้าของข้อมูล

English

Who this is for: TEFL teachers, small language schools, and learning centers in Thailand and Southeast Asia who want AI assistance without sending student data to the cloud.

The problem this solves: Cloud-based tutoring assistants (ChatGPT, Khan Academy, etc.) work well but raise three barriers — they upload student transcripts to a third party, they cost a subscription per student that compounds over a year, and they shape your teaching to the vendor's roadmap rather than yours.

The opportunity: Open-source models (Qwen 2.5, Hermes 4) are now genuinely good enough for English tutoring, embedding models (nomic-embed) run on any CPU, and a plain markdown wiki is more durable than any proprietary knowledge base. Self-hosting has become realistic for a single teacher with a 5-year-old laptop.

What success looks like: A bot that learns your teaching style over weeks, costs $0 per query after install, never leaks data, drafts new lessons in your DSS-PPP format, and is controllable from your LINE/Slack on your phone.

Honest tradeoffs: You spend ~3 hours on install (vs zero for cloud SaaS). You manage your own backups. When something breaks, you fix it instead of opening a support ticket. In exchange you get permanent ownership of the data and the bot's behavior — no vendor can change the rules on you.

Who shouldn't bother: Teachers with no time to maintain a system, or anyone running a high-velocity content operation where uptime guarantees matter more than data ownership.

中文

给谁用：泰国及东南亚的英语教师（TEFL）、小型语言学校、学习中心——想用 AI 协助教学，但不希望把学生数据传到云端的人。

要解决的问题：基于云端的辅导助手（ChatGPT、Khan Academy 等）功能不错，但有三道门槛——把学生的对话记录上传到第三方；按学生订阅收费，一年累积下来不小；并且会让你的教学跟随供应商的路线图走，而不是你自己的。

机会：开源模型（Qwen 2.5、Hermes 4）现在确实已经能胜任英语辅导，嵌入模型（nomic-embed）在任何 CPU 上都能运行，而朴素的 markdown wiki 比任何专有知识库都更持久。对于一位拥有五年旧笔记本的独立教师来说，自托管已经变得现实可行。

成功是什么样：一个用几个星期学会你的教学风格的机器人；安装完成后每次提问 0 美元；数据从不外泄；按你的 DSS-PPP 格式起草新课程；并能通过手机上的 LINE/Slack 控制。

诚实的取舍：你花约 3 小时安装（云端 SaaS 是 0 小时）；你自己管理备份；出问题时你自己修，而不是开 support ticket。作为交换，你获得数据和机器人行为的永久所有权——没有供应商能单方面改变规则。

谁不适合：没有时间维护系统的老师；或者运营高频内容、对正常运行时间保证比数据所有权更看重的团队。

🥡 The tech stack — Docker makes it a recipe, not a black box

ภาษาไทย

Docker เป็นคำที่สำคัญที่สุดบนหน้านี้ รองจาก Wiki หากไม่มี Docker การติดตั้งบอทตัวนี้คงต้องสู้กับเวอร์ชัน Python, port ที่ชนกัน, และความแตกต่างของ OS ตลอดสุดสัปดาห์

มี Docker แล้ว ทั้ง stack อยู่ในไฟล์แค่ 2 ชนิด: docker-compose.yml และ Dockerfile ไม่กี่ตัว ทุกคนอ่านได้ ทุกคนแก้ได้ การติดตั้งไม่ใช่การเดินทาง — มันคือ สูตรอาหาร

ลองนึกถึง tech stack เหมือนเค้กชั้น ๆ แต่ละชั้นวางอยู่บนชั้นล่างและทำหน้าที่ของตัวเอง คุณเปิดดูในชั้นไหนก็ได้ ไม่มีอะไรซ่อนอยู่ ความกลัวว่า Docker เป็น "กล่องดำ" เป็นความเข้าใจผิด — Docker image ทุกตัวสร้างจากไฟล์ Dockerfile ที่เป็นข้อความธรรมดา ใครก็อ่านได้

English

Docker is the most important word on this page after Wiki. Without it, installing this bot would mean fighting Python versions, port conflicts, and OS differences across an entire weekend.

With Docker, the whole stack lives in two kinds of files: docker-compose.yml and a few Dockerfiles. Anyone can read them. Anyone can change them. The install stops being a journey and becomes a recipe.

Think of the tech stack like a layer cake. Each layer sits on the one below and does one specific job. You can lift the lid on any layer at any time — nothing is hidden. The "black box" fear about Docker is a misconception: every Docker image is built from a plain text Dockerfile that anyone can read.

中文

Docker 是本页除 Wiki 之外最重要的词。没有 Docker，安装这个机器人就意味着要花一整个周末跟 Python 版本、端口冲突、操作系统差异搏斗。

有了 Docker，整套技术栈都活在两类文件里：docker-compose.yml 和几个 Dockerfile。任何人都能读，任何人都能改。安装不再是一场旅行——它是一份食谱。

把技术栈想象成一块千层蛋糕。每一层都坐在下面那一层之上，做自己专门的工作。你随时可以掀开任何一层的盖子——没有什么是隐藏的。把 Docker 视为"黑盒"是个误解——每一个 Docker 镜像都是从一个普通文本文件 Dockerfile 构建出来的，任何人都可以阅读。

Seven layers, all open and inspectable. Docker manages layers 3 and 4 — the rest is yours.

Layer-by-layer breakdown

#	Layer	What's in it	What Docker does here
7	👤 Users	The teacher, the students, the owner	Nothing — Docker doesn't touch humans
6	🖥 Interfaces	Web pages on `:8000` / `:8081`, voice mic, LINE/Slack DMs	Exposes container ports to your host so a browser can reach them
5	🧠 Application logic	Orchestrator Python code, LlamaIndex retrieval, STORM articles, Hermes prompts, Wiki markdown content	Isolates each app — they can't accidentally break each other
4	📦 Containers	The runtime service images (ollama (host), whisper, tts, orchestrator, owner-bridge) plus the night-job container (hermes-night)	This is Docker — these are the units Docker creates and runs
3	🐳 Docker Engine	Docker Desktop (Windows/Mac) or Docker Engine (Linux) plus the Compose plugin	The runtime — reads your `docker-compose.yml` recipe and turns it into running software
2	💻 Operating system	Windows 11 / macOS / Ubuntu	Hosts Docker; Docker is just one program among many on your OS
1	🔧 Hardware	Your CPU, RAM, disk, optional NVIDIA GPU, network card	Provides the compute power; Docker passes GPU access through via the NVIDIA Container Toolkit

Without Docker vs With Docker — what changes for you, the installer

Install task	Without Docker 😩	With Docker 🎉
Install the LLM runtime	Download Ollama installer, run it, configure service, hope auto-start works	`docker compose up ollama`
Install Whisper STT	Install Python 3.10 specifically, install `faster-whisper`, install `ffmpeg`, configure CUDA, pray	Pre-baked image, all dependencies inside
Install TTS (XTTS + Piper)	Install Coqui, fight torch 2.0 vs 2.1, install Piper, download voice models	Pre-baked image, all dependencies inside
Add LlamaIndex	Pip-install LlamaIndex, point at the wiki folder, pull `nomic-embed-text` via Ollama, set `RAG_ENABLED=true`	Pre-baked image, mounts your wiki folder
Get all services talking	Configure ports, configure CORS, set environment variables, write start scripts	One `docker-compose.yml` defines the whole network
Reproduce on a new machine	Start over from step 1 (~3 hours if you're lucky)	Copy one folder, `docker compose up` (~15 minutes)
Update all components	Track each version, test compatibility, manually upgrade each, hope nothing breaks	`docker compose pull && docker compose up -d`
Roll back if broken	Re-install previous versions manually, in the right order	Pin to previous image tag in compose file, restart
Share with a colleague	Write a 20-page install guide; hope they don't hit a snag	Send them the folder; they run `docker compose up`

🎯 For students: this is what "infrastructure as code" means

Your install is no longer a process — it's a file you can read. The bot's deployment is a piece of source code, just like the application logic itself. You can git diff it. You can git revert it. You can code review the install of your own software.

This is one of the most important shifts in modern software engineering, and it applies to AI systems exactly the same way. The model files are huge binaries, but the wiring that makes them useful — the network, the volumes, the environment variables, the startup order — is plain text, version-controlled, and reproducible. That is what Docker buys you.

📦 Building it incrementally — start small, add tools as you need them

ภาษาไทย

คุณไม่ต้องติดตั้งทุกอย่างในวันแรก ระบบนี้ออกแบบให้ เติบโตทีละขั้น

เริ่มจาก stage 0 ใช้ได้จริงแล้ว — เป็นแค่บอทสนทนาเสียง ไม่มีฐานความรู้ ทุก stage ที่เพิ่มทำให้บอทฉลาดขึ้น แต่บอท stage ก่อนหน้าก็ทำงานได้สบาย

คำแนะนำ: ทำ stage 0–2 ให้เสร็จในสุดสัปดาห์แรก ใช้บอท 1 สัปดาห์ แล้วค่อยตัดสินใจว่าจะเพิ่ม stage 3+ หรือไม่

English

You don't have to install everything on day one. This system is designed to grow stage by stage.

Stage 0 is already useful by itself — it's the bare voice chatbot with no knowledge base. Every stage you add makes the bot smarter, but the earlier stages keep working unchanged.

Recommended path: get stages 0–2 running on the first weekend, use the bot for a week, then decide whether stages 3+ are worth the time. You can stop at any stage and still have a working system.

中文

你不需要第一天就安装所有东西。这套系统设计成可以分阶段逐步成长。

第 0 阶段本身就已经能用——是一个只有语音对话、没有知识库的基础机器人。你每加一个阶段，机器人就更聪明一些，而之前的阶段照常工作。

建议路线：第一个周末完成 0–2 阶段，用一星期，然后再决定第 3 阶段及以后值不值得投入时间。你可以在任何阶段停下来，系统照样可用。

A staircase, not a cliff. Each stage adds one service or one capability. Stop whenever the bot is good enough for your needs.

The stage-by-stage build-out

Stage	What you add	New capability	Time	You'd skip this if…
0 — Voice loop	`ollama · whisper · tts · orchestrator`	Talk to a generic AI tutor by voice. No memory between sessions.	~30 min	Never — this is the foundation
1 — Add the Wiki folder	Mount `wiki/` in orchestrator; write `INDEX.md` + your school docs	Bot can reference your syllabus and student notes manually in prompts	~30 min	You only want a generic tutor, not your school's tutor
2 — Add LlamaIndex	In-process inside the orchestrator. Reads the wiki volume, embeds via Ollama, persists the index to `/data/wiki_index/`	Bot grounds every answer in your wiki pages, with citations. This is the biggest single jump in usefulness.	~15 min	Your wiki is tiny (under 10 pages)
3 — Add Hermes UI	One container (`hermes-ui:8081`) + a few starter tools	Owner console — you can chat with the agent, run tools manually, inspect what's happening	~30 min	You only need student-facing voice; you're fine editing files in a terminal
4 — Add hermes-night	`hermes-night` profile + a cron job at 2 AM	Nightly maintenance: session notes get appended to student pages, vector index rebuilds, wiki pruning, morning email	~1 hour	You have under 5 students and prefer to keep notes by hand
5 — Add STORM	`knowledge-storm` Python package inside `hermes-night`	Nightly authoring: ask Hermes to "build Week 5", get a full DSS-PPP lesson packet by morning	~30 min	You write all your own lessons and don't want AI-generated drafts
6 — NotebookLM bootstrap	One-shot script run from your laptop (not the bot host)	Generates audio overviews, video summaries, slide decks, quizzes, flashcards, mind maps from public sources — once, then disconnected	~30 min	You don't have a Google account or don't want to use any cloud at all
7 — Add owner-bridge	One container (`owner-bridge:8082`) + Cloudflare Tunnel + LINE/Slack creds	Control the bot from your phone via LINE or Slack DMs	~30 min	You're always near the laptop and don't need remote control
8 — Network isolation	Switch Docker network to `--internal`, block outbound at firewall	The host cannot reach the internet. PDPA compliance becomes a property of topology, not policy	~15 min	You need outbound access for some other tool (e.g. an external email API)

💡 The order is not random — each stage builds on the prior

You can't add LlamaIndex before the Wiki folder (it needs files to embed). You can't add STORM before Hermes-night (Hermes is what calls STORM). You can't do network isolation before everything else is running (it would break the install). Follow the order; skip later stages if you don't need them.

🎯 The teacher's path through these stages

Most single teachers stop at Stage 4 and are happy. The voice bot grounds answers in their wiki (stages 0–2), they have a console to inspect it (stage 3), and the nightly worker keeps the wiki tidy (stage 4). That covers 90% of the value with about 3 hours of total install time.

Stages 5–8 are for teachers who want the bot to generate new content (5), want the rich starter media (6), want phone-based remote control (7), or care about sealing the network for compliance audits (8). They're all worth doing eventually — but not on day one.

The division of labour

When	Tool	Role
☀ Day-time (live tutoring)	🦙 LlamaIndex (or AnythingLLM)	Local grounded Q&A — reads the same markdown Wiki the teacher edits, cites the page it's quoting
🌙 Nightly (autonomous authoring)	🌪 STORM	Generates Wikipedia-style draft lesson articles from sources, with simulated expert perspectives
🌙 Nightly (orchestration)	🤖 Hermes 4 agent	Plans tool calls, prunes the wiki, refreshes indexes, emails the morning report
🎛 Owner control (any time)	🤖 Hermes Web UI + 🌉 owner-bridge	Teacher's console at `:8081`; LINE/Slack DM commands route through `owner-bridge` to the same Hermes agent
📚 Substrate (always)	Karpathy LLM Wiki	Plain markdown — the human-readable knowledge base both sides edit; git-versioned
📦 Install-time only	NotebookLM	Used once during install to generate audio/video/slide/quiz/flashcard starter pack from public sources; disconnected forever after

🎯 Three design rules that shape every decision below

Student data never leaves the machine. LlamaIndex, STORM, Hermes, Wiki — all local on Ollama. NotebookLM runs once at install on public reference works only, then the Docker host disconnects from the internet permanently.
The bot drafts; the teacher approves. Nothing the bot writes goes live to students without a teacher diff-review. The Wiki uses git so every approval is a recorded signal.
The Wiki is plain markdown. No proprietary format, no vector-DB-only knowledge. If any tool in this stack disappears tomorrow, your knowledge base survives unchanged.

🌗 The goal — a day-and-night worker bot

ภาษาไทย

กลางวัน: นักเรียนคุยกับบอท บอทตอบโดยใช้ LlamaIndex ซึ่งอ่านไฟล์มาร์กดาวน์ของ Wiki โดยตรง และอ้างอิงหน้าที่ใช้ตอบ

กลางคืน: Hermes ตื่นมาทำงานบ้าน — อ่าน transcript ของวันนี้, สั่ง STORM ให้เขียนบทเรียนใหม่จากแหล่งข้อมูล, อัปเดต Wiki, ตัดหน้าเก่าทิ้ง, รีบิวด์ดัชนีเวกเตอร์, ส่งอีเมลสรุปให้ครูตอนเช้า

English

Day: students talk to the bot. The orchestrator runs its in-process LlamaIndex retriever over the markdown Wiki and returns a grounded answer with citations back to the pages it used.

Night: Hermes wakes up and does the housework — reads today's transcripts, runs STORM to draft Wikipedia-style articles for any new lesson topic, edits the Wiki (append session notes, prune stale pages, surface contradictions), rebuilds the vector index, and emails the teacher a morning summary.

中文

白天：学生与机器人对话。编排器询问 LlamaIndex，LlamaIndex 直接读取 markdown Wiki 并返回带引用的有据答复，标注它引用了哪些页面。

夜间：Hermes 醒来做家务——读取今日的对话记录，调用 STORM 为新课程主题起草维基百科风格的文章，编辑 Wiki（追加课堂笔记、修剪过时页面、标出矛盾内容），重建向量索引，并在清晨发送总结邮件给老师。

This is the goal: real-time student-facing chat by day, autonomous authoring and curation by night, all five tools playing distinct roles around one shared markdown Wiki.

💡 Why the day/night split works

The day-side bot stays fast and cheap — Qwen 7B reading a RAG-grounded answer is responsive even on CPU. The expensive cognitive work (Hermes 35B reasoning over transcripts, STORM drafting full lesson articles, LlamaIndex re-embedding pages) happens at night when latency doesn't matter and the GPU is idle. By morning, the day-side bot is smarter than it was yesterday, with no perceived performance cost.

The rest of this page unpacks each component, then shows the implementation details — pruning loop, docker-compose, six learning loops — that make the picture above work.

🎯 The one thing they all share — the context window

ภาษาไทย

เครื่องมือทั้งห้า — LlamaIndex, STORM, NotebookLM, Wiki, Hermes — ต่างก็ทำสิ่งเดียวกัน นั่นคือ เลือกข้อความที่ถูกต้องมาใส่ในหน้าต่างบริบทของ LLM ในเวลาที่ถูกต้อง

หน้าต่างบริบท (context window) ของโมเดลมีขีดจำกัด เช่น Qwen 2.5:7b รับได้ ~32K tokens ทุกอย่างที่อยู่นอกหน้าต่างนี้ โมเดล "ลืม"

ความแตกต่างคือ: ใครเลือก? เลือกอย่างไร? ใครเขียนกลับ?

English

All five tools — LlamaIndex, STORM, NotebookLM, Wiki, Hermes — do one thing: put the right text into the LLM's context window at the right time.

The context window is finite (Qwen 2.5:7b ~32K tokens, Gemma 4 ~128K). Anything outside the window, the model has "forgotten."

What differs: who chooses what goes in? how is it chosen? who writes back?

中文

这五个工具——LlamaIndex、STORM、NotebookLM、Wiki、Hermes——做的是同一件事：在正确的时机把正确的文本放进 LLM 的上下文窗口。

上下文窗口是有限的（Qwen 2.5:7b 约 32K tokens，Gemma 4 约 128K）。任何在窗口之外的内容，模型都已"忘记"。

区别在于：谁来选择放入什么？怎么选？谁负责写回？

The context window is a budget. Each architecture is a different policy for spending it.

🦙 1. LlamaIndex — in-orchestrator RAG over your Wiki

🦙

LlamaIndex (replaces Khoj after evaluation)

local · markdown-native · in-process

Why not Khoj? The original guide put Khoj here. Trying it surfaced three real costs: (1) it needs Postgres+pgvector as a second container, (2) first-time setup requires UI clicks in /server/admin to register chat models and agents, (3) its anonymous-mode flag is a CLI argument not an env var, and it routes anonymous requests to a default user that is not the user you indexed your content under. For a 25-file wiki this is too much machinery. LlamaIndex in-process is ~120 lines and has none of those failure modes.

What it does: LlamaIndex is a Python framework for building RAG over your own documents. We use the slimmest possible subset — VectorStoreIndex for storage, SimpleDirectoryReader for ingest, and OllamaEmbedding as the embeddings provider. No LlamaIndex LLM layer (Ollama is called directly from the orchestrator for chat).

Why it's the day-to-day grounding layer: the orchestrator imports it, builds a vector index over /data/wiki/**/*.md at startup, persists the index to a named volume so subsequent boots load instantly, then on every chat call embeds the user message and pulls the top-K chunks to inject as system context. Citations come back as filename stems. The same Ollama instance that serves chat also serves embeddings (via nomic-embed-text, ~270 MB).

LlamaIndex lives inside the orchestrator process — no separate service. The vector index persists to a named Docker volume, so restarts don't re-embed.

What a LlamaIndex grounding call looks like

# docker/orchestrator/main.py — actual production code, abbreviated
from llama_index.core import (
    Settings, SimpleDirectoryReader, StorageContext,
    VectorStoreIndex, load_index_from_storage,
)
from llama_index.embeddings.ollama import OllamaEmbedding

def _build_or_load_index():
    Settings.embed_model = OllamaEmbedding(
        model_name="nomic-embed-text",
        base_url="http://host.docker.internal:11434",
    )
    Settings.llm = None  # Ollama is called directly from /chat

    if INDEX_PATH.exists() and any(INDEX_PATH.iterdir()):
        ctx = StorageContext.from_defaults(persist_dir=str(INDEX_PATH))
        return load_index_from_storage(ctx)        # instant on restarts

    docs = SimpleDirectoryReader(
        input_dir=str(WIKI_PATH), recursive=True,
        required_exts=[".md"],
    ).load_data()
    idx = VectorStoreIndex.from_documents(docs)
    idx.storage_context.persist(str(INDEX_PATH))   # write to volume
    return idx


def ground(question: str) -> Grounding | None:
    idx = _get_index()
    if idx is None: return None
    nodes = idx.as_retriever(similarity_top_k=3).retrieve(question)
    if not nodes: return None
    return Grounding(
        text="\n\n---\n\n".join(n.text[:900] for n in nodes),
        citations=[Path(n.metadata["file_path"]).stem for n in nodes],
    )

💡 The role LlamaIndex plays in the bot

LlamaIndex is the day-side knowledge layer. Every student turn flows through it: the orchestrator embeds the query, pulls top-K wiki passages, and injects them as a system note so Ollama can quote them. Because retrieval is in-process there's no extra container, no admin UI, no health-check theater. The wiki is mounted read-only into the orchestrator; the persisted vector index lives in a separate read-write volume so it survives restarts.

⚡ Speed vs. accuracy — pick your knob

RAG over a 25-file corpus with Ollama-served embeddings adds about 15–25 seconds per reply on CPU (a single Qwen 2.5 3B generation already takes 2s; the extra ~600 tokens of grounding context multiply token-by-token cost). That's not always worth it. The orchestrator exposes three knobs:

RAG_ENABLED=true|false — global on/off. Off by default.
RAG_TOP_K=3 — how many passages to retrieve. Lower = faster.
OLLAMA_NUM_CTX=2560 — context window. Smaller = faster but truncates grounding.

Three approaches to get the best of both worlds, in order of effort:

Route by intent. A keyword pre-check ("week", "lesson", "syllabus") decides whether to ground. A "hi" returns in 2s ungrounded; a "what is week 8 about" pays the grounding cost.
Pre-summarize the wiki. hermes-night writes one-paragraph summaries per page; index those instead of the full lessons. Cuts retrieved context by ~10× — trades some precision.
Switch to GPU. The CPU baseline assumes no acceleration. With NVIDIA + nvidia-container-toolkit, all three numbers — generation, embedding, retrieval — fall to a few hundred milliseconds.

📐 What happened to "RAG isn't needed"?

The earlier version of this guide argued against building a dedicated RAG layer — on the reasoning that "Khoj already IS RAG; adding LlamaIndex is parallel infrastructure". That was correct logic; the premise turned out to be wrong. Khoj's specific operational tax (Postgres, admin UI for chat-model+agent setup, CLI-only anonymous mode, user-scoped indexing that doesn't surface to anonymous queries) made the trade flip the other way. ~120 lines of LlamaIndex code beats ~3 hours of Khoj first-time setup, with the bonus that the index now ships in the same container as the chat logic — one process to debug.

The original three reasons against custom RAG still bite if you grow:

Retrieval recall isn't the bottleneck at small scale — curation is. RAG can't tell you which lesson is your best lesson; it just retrieves what matches.
Chunks are opaque. LlamaIndex helps here by returning file_path in node metadata — every citation maps to a real markdown file the teacher can open. Pre-summarization (above) makes the chunks more legible.
Wider corpora need real vector stores. If you grow past a few thousand documents, swap LlamaIndex's in-memory SimpleVectorStore for Chroma or Qdrant — the rest of the code is unchanged.

🌪 2. STORM — Wikipedia-style nightly content generation

🌪

STORM (Stanford OVAL Lab)

local · authoring · multi-perspective

What it does: STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective question asking) is an open-source pipeline that auto-writes Wikipedia-style long-form articles from a topic and a set of sources. It runs entirely against your local LLM (Ollama works fine) and produces grounded, structured, citation-rich drafts.

How it works (in four phases):

Perspective generation — Given the topic, STORM brainstorms several distinct expert perspectives (pedagogical theory, classroom practice, learner cognition, etc.).
Simulated expert dialogue — Each perspective is played by an LLM persona that asks questions; another LLM, grounded in the sources, answers them. This produces a rich Q&A corpus that no single prompt could elicit.
Outline drafting — STORM synthesizes the dialogue into a structured outline.
Section writing + polish — Each outline section is drafted from the dialogue + sources, then a final pass polishes the whole article.

Why it's the nightly authoring tool: writing a good lesson article is not a one-shot prompt. STORM's multi-perspective dialogue surfaces angles a single prompt misses — exactly what you want for a Week-N lesson that has to teach grammar, anticipate learner errors, and tie into cultural context. The output is a markdown article that drops straight into wiki/lessons/week-NN/ as a DRAFT.

STORM's multi-perspective dialogue is what makes its drafts feel researched, not regurgitated. The output goes straight into the Wiki as DRAFT.

What a STORM run looks like

# hermes-night/tools/storm_article.py — invoked by Hermes during nightly authoring
from knowledge_storm import STORMWikiRunner, STORMWikiRunnerArguments
from knowledge_storm.lm import OllamaClient
from pathlib import Path

def storm_article(topic: str, output_dir: Path) -> Path:
    """Generate a Wikipedia-style article on `topic`. Returns the article path."""
    lm = OllamaClient(model="hermes-4-35b-a3b:q4_k_m",
                     url="http://ollama:11434")
    args = STORMWikiRunnerArguments(
        output_dir=str(output_dir),
        max_conv_turn=5,         # Q&A turns per perspective
        max_perspective=4,      # number of expert voices
        max_thread_num=3,
    )
    runner = STORMWikiRunner(args, lm=lm, retriever="wiki-local")
    runner.run(
        topic=topic,
        do_research=True,           # run perspective Q&A
        do_generate_outline=True,
        do_generate_article=True,
        do_polish_article=True,
    )
    return output_dir / "storm_gen_article_polished.md"

💡 The role STORM plays in the bot

STORM is the nightly author. When Hermes is asked to build a new lesson, STORM produces the main grounded article — the dense, cited explanation of the topic. Hermes then takes that article and runs the specialized author tools (DSS-PPP plan, drill set, quiz, vocab cards) over it. The single STORM article becomes the gravity center for the whole lesson packet.

📓 3. NotebookLM — used exactly once at install, then disconnected forever

📓

NotebookLM (Google)

cloud · install-only · then offline

The role NotebookLM plays in this stack is unusual: it runs exactly once, during the install step (see Getting Started — Step 6d), to generate a rich starter pack of media from public reference works — audio overviews, video summaries, slide decks, quiz banks, flashcards, and mind maps — covering every week of your syllabus. Those artifacts get saved as static files in the Wiki. After that, the Docker host disconnects from the internet and never talks to NotebookLM again.

Why this pattern: NotebookLM's output quality (Audio Overview especially) is genuinely hard to match locally today, but continuously sending data to Google is at odds with the PDPA design rule. The compromise: pay the cloud trip once, on sources you'd share with anyone anyway, and walk away with a year of starter content. From day two onwards the bot is fully offline.

⛔ Hard rules — even for the one-time bootstrap

Public sources only. Published textbooks, ministry curricula, Council of Europe references — anything you could legally hand to a stranger. Never: student rosters, exam keys, transcripts, parent contacts, school memos.
After bootstrap, revoke credentials and cut the network. See Step 8a. The architectural goal is that NotebookLM cannot be called again, not that it merely shouldn't be.
Hermes does NOT keep NotebookLM as a runtime tool. Unlike LlamaIndex or STORM, there is no notebooklm_* tool in the agent's regular toolbox after install. The bootstrap script lives separately and is not invoked at night.

What gets generated, by week, by NotebookLM feature

Artifact	NotebookLM feature	Lands in	Used for
🎧 Audio overviews (~10 min)	Audio Overview	`wiki/audio/week-NN.mp3`	Listening practice
🎬 Video summaries (~5 min)	Video Overview	`wiki/video/week-NN.mp4`	Visual review, homework
📊 Slide-deck outlines	Study Guide	`wiki/slides/week-NN.md`	In-class presentation
📝 Quiz banks (~20 Q&A)	FAQ generation	`wiki/quizzes/week-NN.md`	Practice + exam prep
🎴 Flashcard sets	Briefing doc → term extraction	`wiki/flashcards/week-NN.json`	Vocabulary drilling (Anki)
🗺 Mind maps	Mind Map → SVG	`wiki/maps/week-NN.svg`	Visual concept reference

💡 Why front-loading all this works

Most of these artifacts (audio, video, slides, mind maps) don't need to be regenerated as the course evolves — the underlying grammar/CEFR descriptors don't change. The textual artifacts (quizzes, flashcards) can be extended locally over time using STORM. So the one-time bootstrap covers the parts that benefit most from NotebookLM's specific strengths, while STORM + LlamaIndex cover everything that needs to keep learning. After install, the system is self-sufficient.

The actual bootstrap script and its safety guards are in Step 6 of the Getting Started section below.

📚 4. Wiki — Karpathy's compounding knowledge base

📚

LLM Wiki (Karpathy pattern)

curated · transparent · self-editing

What it does: a folder of plain markdown files plus an INDEX.md that describes what's where (a schema, like CLAUDE.md for a codebase). When the user asks a question, the LLM reads INDEX.md first, decides which pages are relevant, fetches them, then answers. No embeddings, no vector DB.

The killer feature: the LLM can also write to the wiki. After each tutoring session, the bot appends an observation to the relevant page. The wiki is a compounding artifact — it gets smarter every day the bot is used.

The Wiki is a compounding artifact. Read by the bot, edited by the bot, curated by you.

What INDEX.md looks like

# Kru Eng Wiki — Index

## Reading rules
- Always read INDEX.md first, then load only pages you need.
- When teaching a lesson, also load students/.md for the student.
- After every session, append a dated note to students/.md.

## Pages
- syllabus/tefl_tech.md — full term curriculum, 12 weeks
- syllabus/dss_ppp_format.md — DSS-PPP lesson plan template
- lessons/week-03/ppp.md — Week 3 PPP (present perfect)
  - postmortem.md — what worked / what didn't (auto-appended)
- exams/midterm-b1/ — exam + answer key + rubric
- students/.md — per-student notes, level, weak areas
- pedagogy/ — Krashen, Byrne, Scrivener references

🤖 5. Hermes Agent — tool-calling orchestrator

🤖

Hermes 4 Agent

active · tool-calling · autonomous

What it does: Nous Research's Hermes 4 is a model fine-tuned heavily on agent traces — multi-step conversations where an LLM correctly invokes tools (functions). The Hermes Agent harness gives the model a tool registry it can call: read_wiki_page, append_wiki, storm_article, rebuild_index, prune_wiki, etc.

Hermes 4 35B A3B is MoE — 35B total params, only 3B active per token. Runs on a single RTX 4090 at Q4_K_M (~22 GB). It "stays in character as an agent" much longer than a base instruct model, which is what you need for a nightly maintenance worker.

Hermes is the autonomous worker. It calls the runtime tools (LlamaIndex, STORM, Wiki) as needed. NotebookLM is NOT a runtime tool — it ran once at install and is gone.

📊 Comparison — how they handle context

Dimension	🦙 LlamaIndex	🌪 STORM	📓 NotebookLM	📚 Wiki	🤖 Hermes
Role in the bot	Day-time Q&A	Nightly authoring	Install-time only (then disconnected)	Shared substrate	Night-time orchestrator
Where context lives	Reads wiki/ markdown directly	Generates from sources + Wiki	Google cloud (sources + index)	Plain markdown on disk	None — orchestrates the others
Curation	None — auto-indexes Wiki	Topic-driven; multi-perspective	You upload sources	LLM curates, teacher prunes	Picks which tool to consult
Citations	page-level (file paths)	in-text references	strict, paragraph-level	page-level	Inherits from underlying tool
Write-back / self-edit	no — read-only	writes DRAFT articles	no	yes — appends notes	yes — drives all writes
Scale ceiling	Thousands of pages	One article per run (~2K words)	~300 sources / notebook	~400K words / ~100 pages	Unlimited (delegates)
Runs locally?	yes — Docker + Ollama	yes — Python + Ollama	no — needs Google	yes	yes (Hermes 4)
PDPA-safe with student data?	yes	yes	no — public sources only	yes	yes
Best at	Fast grounded Q&A on a teacher's corpus	Multi-perspective long-form drafts	Audio Overviews of published books	Curated, evolving knowledge	Multi-step night jobs
Worst at	Writing new long-form content	Quick, interactive answers	Anything involving private data	Scale beyond ~hundreds of pages	Knowing facts on its own

🔑 The key insight

These are not competitors — they are five different layers of the same problem. LlamaIndex retrieves grounded passages in real time. STORM drafts long-form lesson articles overnight. NotebookLM is reserved for the one thing only it does well — Audio Overviews on public references. Wiki is the curated, human-readable layer everything else reads from and writes to. Hermes is the night manager that decides which of the others to call. A real production bot uses all five, each in its proper role.

🛠 Implementation — pruning, docker, scheduler

This section unpacks the moving parts behind the day/night architecture shown at the top of the page: how the night worker prunes the Wiki without losing teacher-authoritative content, the merged docker-compose stack (orchestrator with LlamaIndex + owner-bridge + night worker), and the Hermes agent entry point.

The pruning loop — how the Wiki stays healthy

"Append-only" wikis rot fast. The night worker's pruning pass is what stops the Wiki from drowning in stale notes, duplicate observations, and contradictions accumulated across hundreds of student sessions. It runs as a deterministic loop over every page, with Hermes scoring each one and proposing an action — but every destructive change goes through git and a teacher confirmation gate.

Five signals feed Hermes' scoring; four possible actions; everything reversible via git. Delete needs teacher confirmation.

🔒 Safety rails baked into the loop

Pin via frontmatter — any page with status: pin is skipped entirely (syllabus, exam answer keys, ministry-mandated content). Three-tier reversibility — archive is just a move, merge keeps the source in git history, delete writes to .prune-queue.jsonl and waits for the morning email's "approve" link. Contradictions never auto-resolve — Hermes flags them and leaves both pages intact so you decide which is authoritative.

Merged docker-compose.yml — full stack with the night worker

This is the actual production docker/docker-compose.yml from the krueng.ai repo, as of May 2026. Four runtime services (orchestrator, owner-bridge, tts, whisper) plus a profile-gated night worker. Host Ollama is reached via host.docker.internal through docker-compose.override.yml, so we don't run a second Ollama in a container.

# docker/docker-compose.yml — production stack

services:
  ollama:
    image: ollama/ollama:latest
    container_name: krueng-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    # Override file disables this in favour of host-native Ollama.
    # GPU stanza (uncomment with nvidia-container-toolkit):
    # deploy: { resources: { reservations: { devices: [{driver: nvidia, count: all, capabilities: [gpu]}] }}}

  whisper:
    image: onerahmet/openai-whisper-asr-webservice:latest
    container_name: krueng-whisper
    environment:
      ASR_MODEL: ${WHISPER_MODEL:-small}
      ASR_ENGINE: faster_whisper
    ports:
      - "9000:9000"
    restart: unless-stopped

  tts:
    build: ./tts
    container_name: krueng-tts
    ports:
      - "8001:8001"
    volumes:
      - tts_cache:/root/.local/share/tts
      - ./voices:/voices:ro
    environment:
      DEVICE: ${TTS_DEVICE:-cpu}
      XTTS_SPEAKER: ${XTTS_SPEAKER:-Ana Florence}
    restart: unless-stopped

  # ── DAY-SIDE BOT: chat + LlamaIndex RAG in one process ────────────
  orchestrator:
    build: ./orchestrator
    container_name: krueng-orchestrator
    ports:
      - "8000:8000"
    environment:
      OLLAMA_URL: http://ollama:11434          # overridden to host.docker.internal in override file
      WHISPER_URL: http://whisper:9000
      TTS_URL: http://tts:8001
      MODEL: ${MODEL:-qwen2.5:3b}              # fast 3B default for CPU hosts
      RAG_ENABLED: ${RAG_ENABLED:-false}       # opt-in: trades 15-25s/reply for grounding
      EMBED_MODEL: ${EMBED_MODEL:-nomic-embed-text}
      WIKI_PATH: /data/wiki
      INDEX_PATH: /data/wiki_index
      RAG_TOP_K: ${RAG_TOP_K:-3}
      OLLAMA_NUM_PREDICT: ${OLLAMA_NUM_PREDICT:-180}
      OLLAMA_NUM_CTX: ${OLLAMA_NUM_CTX:-1536}  # auto-bumps to 2560 if RAG_ENABLED
    volumes:
      - wiki:/data/wiki:ro                     # read-only mount of the markdown corpus
      - wiki_index:/data/wiki_index            # persisted LlamaIndex VectorStoreIndex
    depends_on:
      - ollama
      - whisper
    restart: unless-stopped

  # ── OWNER CONTROL: LINE/Slack webhooks → in-process Hermes tool loop ──
  owner-bridge:
    build: ./owner-bridge
    container_name: krueng-owner-bridge
    ports:
      - "8082:8082"
    environment:
      OLLAMA_URL: http://ollama:11434
      HERMES_MODEL: ${HERMES_MODEL:-hermes3:8b}  # Hermes 4 not on Ollama yet; hermes3 stands in
      WIKI_PATH: /data/wiki
      NIGHT_QUEUE: /data/queue/night_jobs.jsonl
      LINE_CHANNEL_ACCESS_TOKEN: ${LINE_CHANNEL_ACCESS_TOKEN:-}
      LINE_CHANNEL_SECRET: ${LINE_CHANNEL_SECRET:-}
      OWNER_LINE_ID: ${OWNER_LINE_ID:-}
      SLACK_BOT_TOKEN: ${SLACK_BOT_TOKEN:-}
      SLACK_SIGNING_SECRET: ${SLACK_SIGNING_SECRET:-}
      OWNER_SLACK_USER_ID: ${OWNER_SLACK_USER_ID:-}
    volumes:
      - wiki:/data/wiki                  # reads + writes (owner promotes DRAFTs)
      - night_queue:/data/queue
    depends_on:
      - ollama
    restart: unless-stopped
    # Webhooks come INBOUND through Cloudflare Tunnel.
    # No outbound polling: LINE reply-tokens and Slack chat.postMessage on the same socket.
    # Custom ~140-line Ollama tool-calling loop — no off-the-shelf hermes_agent package.

  # ── NIGHT WORKER (cron-triggered) ──────────────────────────────────
  # Run with: docker compose run --rm hermes-night
  hermes-night:
    build: ./hermes-night
    container_name: krueng-hermes-night
    profiles: ["night"]   # excluded from `up -d`; cron triggers it
    environment:
      OLLAMA_URL: http://ollama:11434
      HERMES_MODEL: ${HERMES_MODEL:-hermes3:8b}
      WIKI_PATH: /data/wiki
      NIGHT_QUEUE: /data/queue/night_jobs.jsonl
      TRANSCRIPT_PATH: /data/transcripts
      REPORT_EMAIL: ${REPORT_EMAIL:-}
      SMTP_HOST: ${SMTP_HOST:-}
      SMTP_PORT: ${SMTP_PORT:-587}
      SMTP_USER: ${SMTP_USER:-}
      SMTP_PASS: ${SMTP_PASS:-}
    volumes:
      - wiki:/data/wiki                  # night-side reads + writes
      - night_queue:/data/queue
      - transcripts:/data/transcripts
    depends_on:
      - ollama

volumes:
  ollama_data:
  tts_cache:
  wiki:            # shared: orchestrator reads, owner-bridge + hermes-night read+write
  wiki_index:      # LlamaIndex's persisted VectorStoreIndex — survives restarts
  night_queue:
  transcripts:

🛠 Two operational details worth noting

profiles: ["night"] keeps hermes-night out of docker compose up -d, so it never runs as a service. Host cron launches it with docker compose run --rm hermes-night, which exits when the maintenance script returns.
The wiki volume is mounted :ro on the orchestrator — the day-side bot can never accidentally corrupt the knowledge base. Only the night worker has write access, and every change goes through git inside the volume.

The night worker entry point

# hermes-night/run_night.py
from hermes_agent import Agent
from tools import (
    read_wiki_page, append_wiki, prune_wiki,
    storm_article, rebuild_index,
    read_transcripts, email_report,
)

agent = Agent(
    model="hermes-4-35b-a3b:q4_k_m",
    base_url="http://ollama:11434/v1",
    tools=[
        read_wiki_page, append_wiki, prune_wiki,
        storm_article, rebuild_index,
        read_transcripts, email_report,
    ],
)

goal = """
It is 2 AM. Run the nightly Kru Eng maintenance:
1. Read today's session transcripts (read_transcripts).
2. For each student mentioned, append a dated note to their wiki page
   summarizing what they practiced and where they struggled.
3. If any new lesson topic was requested by the teacher, run
   storm_article() to draft a Wikipedia-style article and save it
   to wiki/lessons/<week>/ as status: DRAFT.
4. Prune any wiki pages last touched more than 90 days ago if they
   duplicate newer content (prune_wiki, archive not delete).
5. Call rebuild_index() so the day-side bot sees today's edits.
6. Email kru@krueng.ai a 5-bullet summary of what changed.
"""

agent.run(goal)

✏️ Generating course content — the same stack, run as an author

ภาษาไทย

สถาปัตยกรรมแบบเดียวกันนี้ ใช้สร้าง เนื้อหาบทเรียน ได้ — ไม่ใช่แค่ตอบคำถาม

กระบวนการ: ครูระบุหัวข้อ → Hermes วางแผน → STORM เขียนบทความอ้างอิงจาก Wiki → Hermes ร่างแผนการสอน DSS-PPP, แบบฝึกหัด, ข้อสอบ → เขียนลง Wiki ให้ครูตรวจ

English

The same stack runs the other direction — instead of answering student questions, it writes new course materials.

Flow: teacher names a topic → Hermes plans → STORM grounds the topic from existing Wiki sources → Hermes drafts a DSS-PPP lesson plan, exercises, exam questions → posts to the Wiki for human review.

中文

同一套技术栈反向运行——不再回答学生的问题，而是撰写新的课程材料。

流程：老师指定主题 → Hermes 制定计划 → STORM 基于现有 Wiki 资料对主题进行多视角研究 → Hermes 起草 DSS-PPP 教案、练习、考题 → 提交到 Wiki 等待人工审核。

A single Hermes invocation produces a DSS-PPP plan, drill set, quiz, and vocab cards — all from one cited source brief.

Concrete example — "Week 4: Past Perfect for B1 Thai learners"

# hermes-night/run_authoring.py — invoked manually or via cron
from hermes_agent import Agent
from tools import (
    storm_article, wiki_search, paperqa_lookup,
    draft_dss_ppp, generate_drills, generate_quiz,
    generate_vocab_cards, synthesize_xtts_audio,
    write_wiki_page, generate_flashcard_html,
)

agent = Agent(
    model="hermes-4-35b-a3b:q4_k_m",
    base_url="http://ollama:11434/v1",
    tools=[
        storm_article, wiki_search, paperqa_lookup,
        draft_dss_ppp, generate_drills, generate_quiz,
        generate_vocab_cards, synthesize_xtts_audio,
        write_wiki_page, generate_flashcard_html,
    ],
)

agent.run("""
Build a complete Week 4 lesson packet:
  topic: past perfect tense
  level: CEFR B1
  audience: Thai high-school learners, ages 14-16
  duration: 45 minutes
  format: DSS-PPP (Warmer · Presentation · Practice · Production)

Steps:
  1. Ground the topic via storm_article() — runs multi-perspective research and writes
     a Wikipedia-style article to wiki/lessons/week-04/article.md.
  2. Draft the DSS-PPP plan. Each stage must name a concrete activity
     and a CEFR can-do statement.
  3. Generate 26 drill items across gap-fill, transformation, error-correction,
     and pair dialogue formats.
  4. Generate a 10-MCQ + 3-short-answer quiz with rubric.
  5. Extract 12 target vocabulary items (English + Thai gloss + IPA),
     synthesize XTTS audio, and emit a flashcard HTML page using the
     tefl-bilingual.css pattern from CLAUDE.md.
  6. Write everything to wiki/lessons/week-04/ as separate markdown files
     with status: DRAFT in the frontmatter.
  7. Email kru@krueng.ai a summary with links to each artifact.
""")

What the bot can generate today, with no extra training

Artifact	Source layer	Author layer	Output format
📋 DSS-PPP lesson plan	STORM article + Wiki sources	Hermes drafts to template	markdown in `wiki/lessons/`
🎯 Practice drills	Target grammar/vocab from brief	Hermes generates · varied formats	markdown + answer key
📝 Quiz + answer key + rubric	Brief + CEFR descriptors	Hermes drafts · self-scores test set	markdown + HTML student version
🎴 Vocab flashcards (bilingual)	Word list from brief	Hermes glosses · XTTS audio	HTML using `tefl-bilingual.css`
🎧 Listening practice audio (week packs)	Pre-generated at install via NotebookLM (Step 6)	Lives in `wiki/audio/week-NN.mp3`	mp3 + transcript
🎙 Listening practice audio (ad-hoc)	STORM article excerpt + XTTS	Hermes calls `synthesize_xtts_audio()` on demand	mp3 saved to `wiki/audio/ad-hoc/`
🎬 Video script / storyboard	Brief + style guide	Hermes drafts · HeyGen-ready	JSON for `generate_heygen.py`
📖 Reading text + comprehension Qs	Vocab + level + brief	Hermes writes leveled passage	markdown · 200-400 words
💬 Conversation activity	Target grammar + cultural context	Hermes drafts roleplay cards	markdown · printable cards
🎮 RPG quest (Land of Suvarna)	Vocab list from brief	Hermes drafts NPC dialogue, items	JS objects for `rpg-data.js`
📊 Whole syllabus week	Curriculum doc + prior weeks	Chain of all above tools	Full `wiki/lessons/week-NN/` tree

🎯 The big unlock

You already have generate_tefl_*.py scripts that produce content from one-off prompts. This architecture turns those scripts into tools the agent can compose. Instead of you running five scripts in sequence and stitching the outputs together, you tell the bot "build Week 4" and it picks the tools, runs them in order, grounds each step in cited sources, and posts a coherent lesson packet to the Wiki for your morning review.

The teacher's job shifts from authoring to editing — and the corpus compounds: every lesson you ship enriches the Wiki, which makes the next lesson easier to ground.

⚠️ Keep a human in the loop

Generated content always lands in the Wiki as status: DRAFT. Never auto-publish to students. The bot writes; you read, edit, and promote. The Wiki history (git) means you can see exactly what the bot proposed vs. what you shipped — a great training signal for tuning the prompts over time.

🔄 How the system learns — six loops, six drivers

ภาษาไทย

ระบบนี้ "เรียนรู้" ใน หกวงจร ที่แตกต่างกัน แต่ละวงจรมี คนขับ ของตัวเอง — บางอันขับโดยนักเรียน บางอันโดยครู บางอันโดยตัวบอทเอง

ถ้าไม่เข้าใจว่าวงจรไหนทำงานอย่างไร เมื่อบอททำผิด คุณจะไม่รู้ว่าจะแก้ที่ไหน

English

This system "learns" in six distinct loops. Each loop has its own driver — some are powered by the student, some by the teacher, some by the bot itself, and each runs on a different clock.

If you don't know which loop is which, then when the bot behaves wrong you won't know where to fix it.

中文

本系统通过六个不同的循环来"学习"。每个循环都有自己的驱动者——有些由学生驱动，有些由老师驱动，有些由机器人自己驱动，每个循环的节奏也不同。

如果你分不清是哪个循环，那么当机器人出错时，你也就不知道该从哪里修复。

Six independent learning loops, all writing back to the Wiki at different speeds. The Wiki is the integration point.

The six loops in one table

#	Loop	What drives it	How often	What signal it learns from	What changes
1	Live conversation	👨‍🎓 Student speaks	Every turn (seconds)	The current dialogue	In-session memory only — chat history in the LLM context. Forgotten when session ends unless loop 2 captures it.
2	Session memory	🤖 Hermes summarizes	End of each session	Whisper transcript of the session	Appends a dated note to `students/<name>.md`. Next session starts with that memory in context.
3	Source ingestion	👩‍🏫 Teacher drops a file	When new material arrives (weeks)	External PDFs, books, videos in `/incoming/`	STORM produces a cited article from existing Wiki sources → new page in `wiki/lessons/`. Future questions can ground in it.
4	Pruning	🤖 Hermes (cron at 2:30 AM)	Nightly	Age + usage + dedup + contradiction signals	Wiki pages keep / merge / archive / delete-pending. Vector index re-embedded for changed pages.
5	Course authoring	👩‍🏫 Teacher names a topic	On demand (weekly)	The topic + existing Wiki + cited sources	Bot writes DRAFT lesson packets to `wiki/lessons/week-NN/`. Becomes live only after loop 6.
6	Human review	👩‍🏫 Teacher edits DRAFTs	Each morning	The git diff between bot draft and approved version	DRAFT promoted to live. The diff is the strongest training signal in the whole system — it tells the bot exactly what it got wrong.
↺	Implicit prompt tuning	👩‍🏫 Teacher (cumulative)	Months	Patterns visible across loops 2 + 4 + 6	Teacher updates `SYSTEM_PROMPT`, `INDEX.md` rules, or tool definitions. Never auto-applied.

🔑 Who actually drives the learning

Look at the "driver" column. Three loops are teacher-driven (3, 5, 6 + the implicit one). Two are bot-driven (2, 4). One is student-driven (1). The teacher's signal is the smallest in volume but the highest in quality — it's the only loop that decides what becomes truth. The bot's loops are mechanical: they capture and curate, but they never promote DRAFT to live and they never rewrite the system prompt. This split is intentional — it's what keeps the bot useful instead of confidently wrong.

⚠️ Where the bot can NOT learn (yet)

Nothing in this stack updates the model weights. Qwen 2.5 and Hermes 4 are frozen — every loop above is context engineering, not fine-tuning. If the bot keeps getting something wrong even after you fix the Wiki, the fix is either (a) better system prompt, (b) better INDEX.md, (c) a new tool, or (d) eventually a LoRA on top of the base model. Don't confuse loops 1–6 with model training.

Diagnosing "the bot got something wrong"

Symptom	Which loop failed	Fix
Bot forgot a student's level from last week	Loop 2 — session memory wasn't captured	Check Hermes ran post-session; verify `students/<name>.md` was appended
Bot cites a textbook you haven't uploaded	Loop 3 didn't fire — bot hallucinated	Tighten system prompt: "cite only from Wiki pages you actually read"
Bot still uses an old definition you corrected	Loop 4 — stale page wasn't pruned	Mark the new page authoritative, the old one `status: archive`
Generated lesson plan misses your DSS-PPP timings	Loop 5 — author tool prompt is wrong	Update `draft_dss_ppp` tool template; review the next DRAFT
Bot repeats the same teaching mistake across many lessons	Loop ↺ — pattern not yet promoted to system prompt	Encode the rule in `SYSTEM_PROMPT` or `INDEX.md` after seeing it 3+ times

🚀 Getting started — pull Docker, prepare your school docs, then bring the bot online

ภาษาไทย

ส่วนนี้พาคุณจาก "เครื่องเปล่า" ไปจนถึง "บอทพร้อมสอนพรุ่งนี้" ภายในประมาณ 1 ชั่วโมง — โดยเริ่มจาก Docker, จากนั้นเตรียมเอกสารของโรงเรียน (ข้อมูลโรงเรียน · ครู · นักเรียน · ความคาดหวังในชั้นเรียน), แล้วค่อยเปิดบอท

หัวใจของการเริ่มต้นไม่ใช่เทคนิค แต่คือ เอกสารที่คุณเตรียม — บอทจะดีหรือไม่ดีขึ้นอยู่กับสิ่งที่คุณป้อนตอนแรก

English

This section walks you from "fresh machine" to "the bot can teach tomorrow" in about an hour. The order is deliberate: pull Docker first, then prepare the school documents (school profile · staff · students · lesson expectations), then bring the bot online and import everything.

The most important step is not technical — it's the documents you prepare in Step 2. The bot will be exactly as useful, considerate, and aligned with your teaching as the docs you feed it.

中文

本节带你从"全新机器"到"明天就可以上课"，约需 1 小时。顺序是经过精心安排的：先拉取 Docker，然后准备学校文档（学校简介 · 教职员 · 学生 · 课堂期望），最后才让机器人上线并导入所有内容。

最关键的一步不是技术——而是你在第 2 步准备的文档。机器人会有多好用、多体贴、多贴合你的教学，完全取决于你最初喂给它的资料。

✅ Preflight — hardware check (60 seconds)

Before pulling anything, confirm your machine clears the bar:

Minimum: 16 GB RAM, 50 GB free disk, modern CPU (i5 8th gen / Ryzen 5 / Apple M1). Runs Qwen 7B for both day and night.
Recommended: 32 GB RAM, 200 GB disk, NVIDIA 12+ GB VRAM or Apple Silicon with 32+ GB unified memory. Adds Hermes 4 35B for the night worker.
OS: Windows 10/11, macOS 12+, or Ubuntu 22.04+. Linux is fastest.

No GPU? Skip Hermes 4 — the architecture still works with Qwen 7B doing both roles. Add a GPU later and swap HERMES_MODEL in .env.

The full onboarding flow. Step 2 (prepare docs) gets dedicated time because everything downstream — what the bot grounds in, what LlamaIndex retrieves, what STORM generates — depends on it.

Step 1 — Pull Docker and bring up the bot stack

This is two things in one: install the Docker engine if you don't have it, then pull the bot images and start them. By the end of this step docker compose ps shows five services running and you have an idle bot waiting for documents.

1a. Install Docker (if you don't have it)

OS	What to install	Get it from
Windows 11	Docker Desktop + WSL2 backend	docs.docker.com/desktop/install/windows-install
macOS (Intel or Apple Silicon)	Docker Desktop	docs.docker.com/desktop/install/mac-install
Ubuntu/Debian	Docker Engine + Compose plugin	`curl -fsSL https://get.docker.com \| sh`
NVIDIA GPU (any OS)	NVIDIA Container Toolkit (after Docker)	NVIDIA docs

# Confirm Docker is healthy
docker --version          # expect Docker version 24+ or 25+
docker compose version    # expect Compose v2.x
docker run --rm hello-world

# GPU users — confirm CUDA is reachable from containers
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

1b. Pull the bot stack and bring it up

git clone https://github.com/your-org/kru-eng-bot.git
cd kru-eng-bot/docker

cp .env.example .env
# Edit .env minimum settings:
#   MODEL=qwen2.5:7b              (use qwen2.5:3b on 16 GB RAM)
#   WHISPER_MODEL=small            (use base on a CPU-only box)
#   TTS_DEVICE=cpu                 (cuda once GPU is wired)
#   TEACHER_EMAIL=you@school.ac.th
#   SMTP_HOST=smtp.gmail.com:587   (for the morning report)

docker compose up -d --build
# first build pulls torch + Coqui + Piper voice (~2 GB)
# takes 5-10 minutes — go to Step 2 while it builds

docker compose ps
# expect: ollama, whisper, tts, orchestrator, owner-bridge
#         hermes-ui, owner-bridge — all "running"

1c. What just came up — the URLs you'll use

Service	URL	For who	Purpose
🎙 Orchestrator	`http://localhost:8000`	Students	The voice chat web page
🦙 LlamaIndex	`http://localhost:42110`	You (debug)	Search the wiki directly, see citation paths
🤖 Hermes Agent Web UI	`http://localhost:8081`	You (owner)	Chat with Hermes, run ad-hoc tools, view night-job history
🌉 Owner bridge	`http://localhost:8082/webhook`	LINE / Slack	Webhook receiver — wired in Step 8c/8d
🦙 Ollama	`http://localhost:11434`	Internal	Model server — not for direct human use

💡 The Hermes Web UI is your console

The Hermes Agent Web UI at localhost:8081 is how you, the owner, talk to the same agent that runs every night. From it you can: ask Hermes to draft a Week 5 lesson on demand, inspect what tools it called last night, replay a failed night run, approve pending DRAFT pages. It's your primary control panel — and unlike students who get voice-only, you get the full agent chat with tool visibility.

💡 You can start Step 2 right now

The compose build runs in the background. Don't wait for it — open your notebook and start drafting the docs in Step 2 below. By the time you finish writing, the build will be done.

Step 2 — Prepare your school documentation

This is the most important step in the whole install. Spend 30 minutes here even if you're impatient to see the bot running. The documents you prepare now decide what the bot knows, how it talks to your students, and how it grades work. Skimp here and the bot will produce generic LLM output. Invest here and it becomes your teaching assistant.

2a. The document checklist — what to gather and what to write

Document	What goes in it	Will live at	Have it?
🏫 School profile	School name, type, mission, age range, languages taught, term dates, contact info	`wiki/school/profile.md`	Write fresh (15 min)
📋 Lesson expectations most leverage	DSS-PPP timings, classroom rules, error-correction style, homework policy, feedback approach, cultural norms	`wiki/pedagogy/expectations.md`	Write fresh (15 min)
👩‍🏫 Staff roster	Per teacher: name, role, subjects, contact, teaching style, specialties	`wiki/staff/<name>.md`	Write fresh (5 min each)
🎓 Student roster	Per student: nickname, Thai name, age/grade, CEFR level, goals, strong/weak areas, parent contacts	`wiki/students/<name>.md`	From your records
📅 Syllabus	Term-by-term curriculum with CEFR can-do statements, weekly topics	`wiki/syllabus/<term>.md`	From your existing plan
📚 Existing lesson plans	Past lessons in any format (.docx, .pptx, PDF) — these become starter content	`wiki/lessons/imported/`	Gather files
📝 Past exams + keys	Exam questions, answer keys, rubrics, postmortems	`wiki/exams/<id>.md`	Gather files
🎨 Pedagogical references	Frameworks you follow (PPP, CEFR, TBL, lexical approach…)	`wiki/pedagogy/`	Write or cite

🎯 Write the Lesson Expectations doc first — it shapes everything else

If you only have 10 minutes for this step, spend them on wiki/pedagogy/expectations.md. That document tells the bot how you teach: your timing preferences, your error-correction style, your homework expectations, how strict to be, how feedback is given. Every lesson Hermes ever drafts, every reply LlamaIndex ever supports, will be informed by this file. It's the highest-leverage artifact in the wiki.

2b. Template — School profile

Copy this into wiki/school/profile.md and fill the brackets:

---
status: pin
type: school
updated: 2026-05-11
---

# <School name>

## Profile
- **Type:** <private language school / public primary / international / corporate training>
- **Location:** <city, country>
- **Age range:** <e.g. 8-18>
- **Languages taught:** <English, Mandarin, …>
- **Term system:** <2-term / 3-term / semester>
- **Typical class size:** <e.g. 8 students>
- **Class duration:** <e.g. 45 minutes>

## Mission
<1-2 paragraphs — what the school is for, the educational philosophy,
what success looks like for a graduating student>

## Term dates (current academic year)
- **Term 1:** 2026-04-01 → 2026-06-30
- **Term 2:** 2026-07-15 → 2026-10-30
- **Term 3:** 2026-11-15 → 2027-02-28
- **Holidays:** <list cultural / religious / national holidays the school observes>

## Contact
- **Director:** <name, email, phone>
- **Administration:** <email>
- **Address:** <address>

2c. Template — Lesson Expectations (the keystone doc)

Copy this into wiki/pedagogy/expectations.md:

---
status: pin
type: pedagogy
updated: 2026-05-11
---

# Lesson Expectations — How <School name> teaches English

## Class structure (DSS-PPP variant)
- **Lesson length:** <45 / 60 / 90 minutes>
- **Warmer:** <duration · type of activity · energy goal>
- **Presentation:** <duration · board-work expectations · modeling style>
- **Practice:** <duration · drill format mix · error-correction frequency>
- **Production:** <duration · pair vs group preference · freer activity types>
- **Cooldown / wrap:** <how lessons end · homework setup · feedback moment>

## Classroom rules
1. <e.g. English-only during Practice and Production stages>
2. <e.g. Phones in a basket at the door>
3. <e.g. Mistakes are welcomed — they teach us>
4. <…>

## Error-correction style
- **Pronunciation:** <delayed / immediate / record + replay>
- **Grammar:** <delayed feedback after Production, with examples on board>
- **Vocabulary:** <model-and-repeat, then move on>
- **What you NEVER correct:** <e.g. accent, dialect features, code-switching during freer practice>

## Homework policy
- **Frequency:** <one short task per lesson / weekly project>
- **Format:** <written · audio recording · Anki cards · journal>
- **Submission:** <when / how / late policy>
- **How the bot can help students:** <e.g. the bot may explain and quiz,
  but never write the answer for the student>

## Feedback style
- **Praise:** <specific, immediate, what they did well>
- **Surfacing weaknesses:** <gently, one concrete next step at a time>
- **Reports to parents:** <weekly summary email, monthly progress report, …>
- **Tone:** <warm but direct / encouraging but honest>

## Cultural expectations
- <Anything specific to your school / region / students>
- <e.g. Buddhist holidays as reading topics, respectful address forms,
  Thai cultural references the bot should know and use>
- <e.g. Avoid topics: alcohol, gambling, politics>

## Things the bot MUST NOT do
- <e.g. Never assign homework that requires a parent's English>
- <e.g. Never compare students to each other>
- <e.g. Never grade work that wasn't reviewed by the teacher>

2d. Template — Staff member

---
status: pin
type: staff
updated: 2026-05-11
---

# <Name> — <Role>

- **Email:** <email>
- **Phone:** <phone>
- **Subjects:** <e.g. English (B1-B2), TOEIC prep>
- **Specialties:** <e.g. exam strategy, pronunciation, creative writing>
- **Start date:** <date>
- **Languages:** <e.g. Thai (native), English (C2), Mandarin (B1)>

## Teaching style
<1-2 paragraphs — how this teacher teaches, what works for them,
what students of theirs say>

## Notes
<Anything else the bot should know — schedule preferences, classes
this teacher covers, how to reach them in an emergency>

2e. Template — Student profile

---
status: live
type: student
updated: 2026-05-11
---

# <Nickname> (<Full name in Thai>)

- **Level:** <A2 / B1 / B2 / C1>
- **Age / grade:** <e.g. 14 / M.3>
- **Started:** <2026-04-01>
- **Strong at:** <listening, reading comprehension>
- **Weak at:** <present perfect, prepositions of time, /θ/ pronunciation>
- **Goals:** <pass IELTS 5.5 by 2026-12 / improve speaking confidence>
- **Parent contact:** <name, email, phone — for weekly reports>
- **Special considerations:** <dyslexia, hearing accommodation, shy in group settings>

## Background
<1-2 sentences — family context, prior English exposure, motivation,
hobbies that connect to English (anime, K-pop, football)>

## Session log
(appended automatically by the night worker after every session)

2f. Template — Syllabus skeleton

---
status: pin
type: syllabus
term: 2026-term-1
updated: 2026-05-11
---

# Term 1, 2026 — B1 English Syllabus

## Term goal
<What students will be able to do by end of term — CEFR can-do statement>

## Weekly topics
| Week | Date | Grammar focus | Vocab theme | CEFR can-do |
|------|------|---------------|-------------|-------------|
| 1 | 2026-04-01 | Present simple vs continuous | Daily routines | "I can describe my daily routine in detail" |
| 2 | 2026-04-08 | Past simple | Last weekend | "I can describe a past event" |
| 3 | 2026-04-15 | Present perfect | Life experiences | "I can talk about my experiences" |
| 4 | 2026-04-22 | Past perfect | Storytelling | "I can talk about events before another past event" |
| … | | | | |
| 12 | 2026-06-24 | Review + assessment | All | Term-end exam |

## Assessment
- Weekly: short quiz (5 min, end of lesson)
- Mid-term: 2026-05-20 (B1 mock test)
- Final: 2026-06-30 (B1 mock test + speaking interview)

## Required materials
- <textbook, workbook, Cambridge B1 handbook…>

💡 Don't aim for perfection here

You'll revise these docs continuously as the bot starts using them and you see what's missing. Get a v1 down in 30 minutes — even rough notes — and improve them over the first few weeks. Empty is worse than imperfect.

Step 3 — Bootstrap the Wiki directory structure

Create the seed layout inside the wiki Docker volume. This is the skeleton the documents from Step 2 slot into.

wiki/
├── INDEX.md                       # the schema file — read first by every tool
├── school/
│   └── profile.md                 # from Step 2b
├── staff/
│   ├── _template.md
│   └── <name>.md                  # from Step 2d, one file per teacher
├── students/
│   ├── _template.md               # from Step 2e
│   └── <nickname>.md              # one file per student
├── pedagogy/
│   ├── expectations.md            # from Step 2c — the keystone doc
│   ├── dss-ppp.md                 # PPP framework reference
│   └── cefr-b1-can-do.md
├── syllabus/
│   └── 2026-term-1.md             # from Step 2f
├── lessons/
│   ├── imported/                  # bulk-converted past plans (Step 4a)
│   └── week-01/
│       ├── article.md             # STORM-style long-form article
│       ├── plan.md                # the DSS-PPP plan
│       └── drills.md
├── exams/
│   ├── _template.md
│   └── <id>.md
├── sources/                       # cited public references (textbooks, CEFR docs)
│   └── _readme.md
├── audio/                         # filled by Step 6 (NBLM bootstrap)
├── video/                         # filled by Step 6
├── slides/                        # filled by Step 6
├── quizzes/                       # filled by Step 6
├── flashcards/                    # filled by Step 6
├── maps/                          # filled by Step 6
└── _archive/                      # pruned pages land here, never deleted

The single most important file is INDEX.md. It's what LlamaIndex reads first to know what's in the wiki, and what Hermes reads to decide how to maintain it:

# INDEX.md — Kru Eng Wiki schema

## How to use this wiki
1. Read INDEX.md first (this file).
2. Always read pedagogy/expectations.md early — it defines how this school teaches.
3. When teaching, also load students/<name>.md for the student in front of you.
4. After every session, append a dated note to students/<name>.md summarizing
   what was practiced and where the student struggled.
5. Lesson DRAFTs live in lessons/week-NN/ with status: DRAFT frontmatter.
   Promote to status: live only after a teacher review.
6. Never modify files under sources/ — they are cited references, not curated notes.

## Page types
- school/profile.md — about the school itself (one file)
- pedagogy/expectations.md — how this school teaches (THE keystone doc)
- pedagogy/<topic>.md — teaching framework references (DSS-PPP, CEFR descriptors)
- staff/<name>.md — per-teacher profile
- students/<name>.md — per-student notes (level, weak areas, session log)
- syllabus/<term>.md — the course outline for one term
- lessons/week-NN/article.md — STORM-generated long-form article on the week's topic
- lessons/week-NN/plan.md — DSS-PPP lesson plan
- lessons/week-NN/drills.md — practice items + answer keys
- exams/<id>.md — exam questions, answer key, rubric, postmortem
- sources/<name>.md — extracted text from a cited reference work
- audio|video|slides|quizzes|flashcards|maps/week-NN.<ext> — pre-generated media

## Frontmatter contract
Every page MUST start with YAML frontmatter:
  ---
  status: live | DRAFT | pin | archive
  type: school | staff | student | pedagogy | syllabus | lesson | exam | source
  updated: 2026-05-11
  ---
- status: pin   — never auto-prune (school, staff, expectations, syllabus, exam keys)
- status: DRAFT — bot-authored, awaiting teacher review
- status: live  — approved, in active use
- status: archive — kept for history, not retrieved by LlamaIndex

Step 4 — Import your documents into the Wiki

Now slot the docs from Step 2 into the structure from Step 3. Three sub-tasks: bulk-convert existing files, fill in the templates, and run the roster script if you have a CSV.

4a. Bulk-convert existing files to markdown

# pandoc handles .docx, .odt, .html, and most PDFs
apt install pandoc       # Linux
brew install pandoc       # macOS

# Convert one syllabus document:
pandoc syllabus_2026.docx -o wiki/syllabus/2026-term-1.md

# Bulk-convert a folder of past lesson plans:
for f in ~/lesson_plans/*.docx; do
    name=$(basename "$f" .docx)
    pandoc "$f" -o wiki/lessons/imported/$name.md
done

# For exam PDFs, pdftotext often beats pandoc:
pdftotext -layout midterm_b1.pdf wiki/exams/midterm-b1.md

# For .pptx slide decks (your existing teacher slides):
for f in ~/slides/*.pptx; do
    pandoc "$f" -o wiki/lessons/imported/$(basename "$f" .pptx).md
done

4b. Drop the prepared docs from Step 2 into the right folders

# From Step 2 templates — save each one in place:
wiki/school/profile.md                # Step 2b — school profile
wiki/pedagogy/expectations.md         # Step 2c — KEY doc
wiki/staff/<each-teacher>.md          # Step 2d, one per teacher
wiki/students/<each-student>.md       # Step 2e, one per student
wiki/syllabus/2026-term-1.md          # Step 2f

4c. Bulk-import the student roster from a CSV

If you already have a class spreadsheet with nickname, Thai name, level, age — this script saves an hour of typing:

# bootstrap_students.py — run once after editing students.csv
import csv
from pathlib import Path
from datetime import date

TEMPLATE = Path("wiki/students/_template.md").read_text(encoding="utf-8")

with open("students.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        out = Path(f"wiki/students/{row['nickname'].lower()}.md")
        if out.exists():
            continue            # never overwrite existing notes
        body = TEMPLATE \
            .replace("<Nickname>", row["nickname"]) \
            .replace("<Full name in Thai>", row["thai_name"]) \
            .replace("A2 / B1 / B2 / C1", row["level"]) \
            .replace("e.g. 14 / M.3", row["age_grade"]) \
            .replace("2026-05-11", date.today().isoformat())
        out.write_text(body, encoding="utf-8")
        print(f"created {out}")

Step 5 — Pull the AI models

# Day-side chat model (~4.4 GB)
docker compose exec ollama ollama pull qwen2.5:7b

# Embeddings model for LlamaIndex (~280 MB)
docker compose exec ollama ollama pull nomic-embed-text

# Night-side agent model (~22 GB Q4_K_M) — skip if no GPU yet
docker compose exec ollama ollama pull hermes-4-35b-a3b:q4_k_m

# Verify
docker compose exec ollama ollama list

Step 6 — One-shot NotebookLM bootstrap (then never again)

NotebookLM has exactly one job in this stack and it happens once: at install time, it generates a rich starter set of media — audio overviews, video summaries, slide decks, quizzes, flashcards, and mind maps — from the public reference works in your curriculum. After this step the Docker host disconnects from the internet and never talks to NotebookLM again. You get NotebookLM's best output (especially Audio Overviews) without an ongoing privacy cost.

Artifact	NotebookLM feature	Lands in	Used for
🎧 Audio overviews	Audio Overview (podcast-style)	`wiki/audio/week-NN.mp3`	Listening practice between sessions
🎬 Video summaries	Video Overview	`wiki/video/week-NN.mp4`	Visual review, homework replay
📊 Slide decks	Study Guide → markdown outline	`wiki/slides/week-NN.md`	In-class presentation
📝 Quiz banks	FAQ generation (~20 Q&A per topic)	`wiki/quizzes/week-NN.md`	Practice quizzes, exam prep
🎴 Flashcard sets	Briefing doc → term extraction	`wiki/flashcards/week-NN.json`	Vocabulary drilling (Anki-importable)
🗺 Mind maps	Mind Map → SVG	`wiki/maps/week-NN.svg`	Concept reference for visual learners

# bootstrap_notebooklm.py — runs ONCE during install, never again
# Generates a full starter set of media from PUBLIC reference works only.
from notebooklm import NotebookLM
from pathlib import Path
import json

nb = NotebookLM(cookies_path=Path("~/.config/nblm/cookies.json").expanduser())

WEEKS = [
    (1, "Present simple vs present continuous", ["sources/cambridge_b1_ch1.pdf"]),
    (2, "Past simple",                          ["sources/cambridge_b1_ch2.pdf"]),
    (3, "Present perfect",                     ["sources/cambridge_b1_ch3.pdf"]),
    # … 12 weeks total — one row per week of your syllabus
]

for week, topic, sources in WEEKS:
    # Hard guard: never let student data near NotebookLM
    for s in sources:
        assert "students" not in s and "exams" not in s, \
            f"REFUSED: {s} looks like private school data"

    notebook = nb.create_notebook(name=f"Bootstrap Week {week:02d} — {topic}")
    for s in sources:
        notebook.add_source(Path(s))

    base = Path("wiki")
    print(f"[week {week:02d}] generating 6 artifact types…")

    notebook.audio_overview(focus=f"B1 Thai learners, week {week}: {topic}") \
        .download_to(base / f"audio/week-{week:02d}.mp3")
    notebook.video_overview() \
        .download_to(base / f"video/week-{week:02d}.mp4")
    (base / f"slides/week-{week:02d}.md").write_text(
        notebook.study_guide().markdown, encoding="utf-8")
    (base / f"quizzes/week-{week:02d}.md").write_text(
        notebook.faq(count=20).markdown, encoding="utf-8")
    cards = notebook.briefing_doc(format="vocab-cards").extract_terms()
    (base / f"flashcards/week-{week:02d}.json").write_text(
        json.dumps(cards, ensure_ascii=False, indent=2), encoding="utf-8")
    notebook.mind_map().save_svg(base / f"maps/week-{week:02d}.svg")

print("Bootstrap complete. Now go to Step 8 and disconnect from the internet.")

⛔ The sources list MUST be public

Every file in WEEKS ends up on Google's servers. Allowed: Cambridge handbooks, OUP textbooks, ministry curricula, Council of Europe references, anything you could legally hand to a stranger. Forbidden: the student data and lesson expectations you wrote in Step 2 — those NEVER touch NotebookLM. The assert on line 16 catches the obvious mistakes, but the real check is human judgment before you list the sources.

Step 7 — First conversation + verify citations

After importing and bootstrapping, trigger LlamaIndex to build its vector store, then test:

# Build the LlamaIndex vector store (runs once on first orchestrator start)
curl -X POST http://localhost:42110/api/index/update

# First chat through the orchestrator:
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How does our school approach error correction?"}'

# Expected: an answer citing pedagogy/expectations.md from Step 2c.
# If you get a generic LLM answer with no citation, the LlamaIndex vector store didn't build —
# check `docker compose logs orchestrator` for the index-build progress message.

💡 The "first citation" test

The cleanest sanity check: ask the bot about a specific student by nickname. If it answers with information that only exists in wiki/students/<name>.md, the whole pipeline (orchestrator → LlamaIndex → wiki volume → markdown file) is wired correctly. If it returns "I don't have information about Pim," LlamaIndex didn't index the file — usually a volume mount issue or RAG_ENABLED not set.

Step 8 — Disconnect from the internet, then schedule the night worker

NotebookLM's one-time job is finished. From here, the bot operates fully offline forever. Two things to do:

8a. Sever the NotebookLM credentials and outbound network

# Revoke NotebookLM credentials — there's nothing left to authenticate
rm -i ~/.config/nblm/cookies.json
docker compose exec hermes-night rm -f /data/secrets/nblm_cookies.json

# The runtime tool list in hermes-night/run_night.py already excludes
# any notebooklm tool — confirm with: grep -i notebooklm hermes-night/run_night.py
# (the bootstrap script is in a separate one-shot container and is now retired)

# Cut Docker off from outbound internet — recommended on Linux:
docker network create --driver bridge --internal kru-eng-internal
# Edit docker-compose.yml — under each service, add:
#   networks: [kru-eng-internal]
# Then:
docker compose down && docker compose up -d

# Verify isolation:
docker compose exec orchestrator curl -m 3 https://www.google.com
# expected: curl: (28) Connection timed out — confirmed fully offline

# Optional belt-and-braces — block outbound at the host firewall too:
sudo iptables -I DOCKER-USER -o eth0 -j REJECT   # Linux

🔒 What "fully offline" buys you

After this step, the bot literally cannot leak student data — not by accident, not by a misconfigured tool, not by an agent's bad decision. Every layer (Whisper, LlamaIndex, Ollama, Hermes, STORM, XTTS) runs against local files on local network. The only path data takes is mic → orchestrator → wiki → speaker, all on one machine. PDPA compliance becomes a property of the network topology, not a policy you have to enforce.

8b. Schedule the night worker

Host OS	Scheduler	Setup command
Linux	cron	`crontab -e` → add: `0 2 * * * cd /opt/kru-eng-bot/docker && docker compose run --rm hermes-night`
macOS	launchd or cron	Same crontab line works; or create a `~/Library/LaunchAgents/ai.krueng.night.plist`
Windows	Task Scheduler	Action: `docker.exe` · Args: `compose -f C:\path\docker-compose.yml run --rm hermes-night` · Trigger: daily at 2:00 AM

Run it manually once to confirm before scheduling:

docker compose run --rm hermes-night
# expect ~5-15 minutes on first run (no transcripts yet, just indexing)
# subsequent nights with real session data: 20-60 minutes

8c. Wire LINE bot — owner control from your phone (Thailand-friendly)

LINE is the default messenger for most teachers in Thailand. Wiring the owner-bridge service to a LINE bot turns your phone into a remote control for the night worker — DM commands like "draft Week 5", "show today's drafts", "approve all", or "what did Pim practice yesterday?" — and Hermes responds from inside your firewall. Only you (the owner LINE user ID) can talk to it.

Step	Where	What to do
1. Create LINE channel	developers.line.biz/console	"Create a new provider" → "Messaging API channel" — name it "Kru Eng Owner"
2. Grab credentials	Channel settings	Copy `Channel access token` and `Channel secret`
3. Get your LINE user ID	LINE app	Add the bot as friend, send "myid" — bot DMs your user ID back (use it as `OWNER_LINE_ID`)
4. Expose the webhook	Your router / Cloudflare Tunnel	Make `localhost:8082/webhook/line` reachable from the public internet (Cloudflare Tunnel is free + no port-forward)
5. Set the webhook URL	LINE Developers Console	"Webhook URL" → `https://<your-tunnel>/webhook/line`

# .env additions for the owner-bridge service:
LINE_CHANNEL_ACCESS_TOKEN=<token from step 2>
LINE_CHANNEL_SECRET=<secret from step 2>
OWNER_LINE_ID=<your user ID from step 3>

# Restart the bridge to pick up new env
docker compose restart owner-bridge

# Test from LINE app: DM the bot "status"
# Expected reply within a few seconds: "✅ Kru Eng online. 18 students. Last night job: success at 02:31."

Commands the owner-bridge understands by default (extendable in owner-bridge/commands.py):

DM the bot	What it does
`status`	Health check: how many students, last night-job result, model status
`draft week 5`	Trigger Hermes to draft Week 5 lesson packet on demand
`drafts`	List all `status: DRAFT` pages awaiting review, with summaries
`approve lessons/week-04/plan.md`	Flip a DRAFT page to `status: live`
`student Pim`	Show Pim's latest session log + level + weak areas
`quiz week 3`	Generate a fresh quiz on Week 3's topic using STORM
`summary`	Replay the latest morning report
`help`	List all available commands

⚠️ The "outbound-blocked" rule still applies

You set up --internal network isolation in Step 8a. LINE webhooks come inbound through the Cloudflare Tunnel — that's allowed. But the owner-bridge service itself should NOT make outbound calls to LINE's API; instead it uses the LINE reply-token mechanism (replies are scoped to the inbound webhook and don't need outbound). Confirm by reviewing owner-bridge/main.py — it should only call httpx against LINE in response to a webhook, never on its own.

8d. Or: wire Slack bot — for international schools / multi-teacher setups

If your school runs on Slack instead of LINE, the owner-bridge supports both. Same architecture, different webhook path.

Step	Where	What to do
1. Create Slack app	api.slack.com/apps	"Create New App" → "From scratch" — name it "Kru Eng"
2. Enable Events API	App settings	Subscribe to `app_mention` and `message.im` events
3. Get signing secret + bot token	"Basic Information" + "OAuth"	Install to workspace, copy `Bot User OAuth Token` (xoxb-…) and `Signing Secret`
4. Expose webhook	Cloudflare Tunnel (same as LINE)	Map `https://<tunnel>/webhook/slack`
5. Configure Event URL	App settings	"Event Subscriptions" → Request URL: `https://<tunnel>/webhook/slack`

# .env additions:
SLACK_BOT_TOKEN=xoxb-<token>
SLACK_SIGNING_SECRET=<secret>
OWNER_SLACK_USER_ID=<U01234567 — your Slack user ID, find with /shrug profile>

docker compose restart owner-bridge

# Test: @KruEng status
# Expected: same status reply as LINE

💡 You can wire both at once

The owner-bridge service routes by webhook path (/webhook/line vs /webhook/slack) and supports a single owner across both channels. Use LINE for personal phone notifications and Slack when you're working at the school. The same commands work in both. Email + LINE + Slack are all simultaneous output channels for the morning report.

8e. Owner control architecture diagram

LINE and Slack send webhooks inbound through a Cloudflare Tunnel. The bot's responses ride back on that same connection. The Docker host never initiates outbound calls except for the SMTP morning report.

🔒 Why this design preserves the "offline" property

Step 8a cut outbound internet from Docker. LINE and Slack control still works because webhooks are inbound — Cloudflare Tunnel accepts an incoming HTTPS connection from LINE's servers, forwards it to owner-bridge over the tunnel's reverse connection. The reply travels back on the same socket. No outbound DNS, no outbound TCP. The only deliberate exception is the morning SMTP email, which is one allowlisted outbound rule. Everything else stays sealed.

🎯 You're done. What happens next?

Tomorrow's first session — student talks to the bot. LlamaIndex grounds the answer in the school profile, lesson expectations, syllabus, and any imported lesson plans you prepared in Step 2. The orchestrator writes the transcript to transcripts/.
Tomorrow night at 2 AM — Hermes wakes up. Reads the transcript. Appends a session note to that student's page. Re-embeds the vector index. Emails you a summary.
The day after — when that student returns, the bot already knows what they practiced yesterday and still respects every rule in pedagogy/expectations.md. The Wiki has compounded by one session.

From here, every interaction enriches the Wiki, every night Hermes maintains it, and once a week you spend 15 minutes reviewing DRAFTs Hermes proposed. That's the steady state.

🎓 Picking what to build first

Your situation	Start with	Add next
Single teacher, ~20 students, stable syllabus	📚 Wiki + 🦙 LlamaIndex (Getting Started 1–4)	Add Hermes nightly worker when ready
Several classes, want exam grading / lesson generation	📚 Wiki + 🤖 Hermes + STORM	NotebookLM one-shot bootstrap for media
Whole-school deployment, hundreds of docs	🦙 LlamaIndex (or Danswer for multi-source)	Wiki for live curation, Hermes nightly
Need rich starter media (audio/video/slides)	📓 NotebookLM install bootstrap (Step 6)	Then disconnect — never call NotebookLM again
Want the bot to author new lessons	🤖 Hermes + 🌪 STORM + draft tools	Wiki as the staging area for review
PDPA-sensitive student data	🦙 AnythingLLM or 🦙 LlamaIndex (in-orchestrator)	Skip cloud tools entirely

🎯 The shortest path to value

If you only have an afternoon, follow Getting Started Steps 1–4 (pull Docker, prepare docs, bootstrap wiki, import) and skip everything after. You'll have a bot that grounds answers in your school's actual documents within an hour — even without Hermes, STORM, or the NotebookLM bootstrap. Add the nightly worker (Steps 5 + 8) the next week, and the NotebookLM media pack (Step 6) once you're sure about your public source list. The Wiki + LlamaIndex alone is already most of the value.

📌 Build status — May 2026

🌱 About this project

ภาษาไทย

English

中文

Overview & context — why this exists and who it's for

ภาษาไทย

English

中文

🥡 The tech stack — Docker makes it a recipe, not a black box

ภาษาไทย

English

中文

Layer-by-layer breakdown

Without Docker vs With Docker — what changes for you, the installer

🎯 For students: this is what "infrastructure as code" means

📦 Building it incrementally — start small, add tools as you need them

ภาษาไทย

English

中文

The stage-by-stage build-out

💡 The order is not random — each stage builds on the prior

🎯 The teacher's path through these stages

The division of labour

🎯 Three design rules that shape every decision below

🌗 The goal — a day-and-night worker bot

ภาษาไทย

English

中文

💡 Why the day/night split works

🎯 The one thing they all share — the context window

ภาษาไทย

English

中文

🦙 1. LlamaIndex — in-orchestrator RAG over your Wiki

What a LlamaIndex grounding call looks like

💡 The role LlamaIndex plays in the bot

⚡ Speed vs. accuracy — pick your knob

📐 What happened to "RAG isn't needed"?

🌪 2. STORM — Wikipedia-style nightly content generation

What a STORM run looks like

💡 The role STORM plays in the bot

📓 3. NotebookLM — used exactly once at install, then disconnected forever

⛔ Hard rules — even for the one-time bootstrap

What gets generated, by week, by NotebookLM feature

💡 Why front-loading all this works

📚 4. Wiki — Karpathy's compounding knowledge base

What INDEX.md looks like

🤖 5. Hermes Agent — tool-calling orchestrator

📊 Comparison — how they handle context

🔑 The key insight

🛠 Implementation — pruning, docker, scheduler

The pruning loop — how the Wiki stays healthy

🔒 Safety rails baked into the loop

Merged docker-compose.yml — full stack with the night worker

🛠 Two operational details worth noting

The night worker entry point

✏️ Generating course content — the same stack, run as an author

ภาษาไทย

English

中文

Concrete example — "Week 4: Past Perfect for B1 Thai learners"

What the bot can generate today, with no extra training

🎯 The big unlock

⚠️ Keep a human in the loop

🔄 How the system learns — six loops, six drivers

ภาษาไทย

English

中文

The six loops in one table

🔑 Who actually drives the learning

⚠️ Where the bot can NOT learn (yet)

Diagnosing "the bot got something wrong"

🚀 Getting started — pull Docker, prepare your school docs, then bring the bot online

ภาษาไทย

English

中文

✅ Preflight — hardware check (60 seconds)

Step 1 — Pull Docker and bring up the bot stack

1a. Install Docker (if you don't have it)