Show / hide languages:

🏫 Build a Local AI Tutor

A full-stack build guide — Docker, Wiki, and a 5-tool context architecture (LlamaIndex · STORM · NotebookLM · Wiki · Hermes) running locally on Ollama

คู่มือสร้าง AI ติวเตอร์ในเครื่องของคุณเอง — Docker, Wiki และสถาปัตยกรรมห้าเครื่องมือ

本地 AI 家教完整搭建指南 — Docker、Wiki 和五工具架构

Ollama Docker Qwen 2.5 · 3B LlamaIndex STORM Karpathy LLM Wiki

📌 Build status — May 2026

This guide describes a 5-tool reference architecture. What was actually shipped for krueng.ai diverged in a few places — captured here so the rest of the page can stay focused on the concepts:

RoleOriginal planWhat shipped
Day-side groundingKhoj as a separate Docker service with Postgres + pgvector backend, admin UI to register chat models and agentsLlamaIndex in-process inside the orchestrator. Embeddings via Ollama's nomic-embed-text. Persistent VectorStoreIndex on a named volume. Opt-in via RAG_ENABLED=true.
Chat LLMHermes 4 / Qwen 2.5 7BQwen 2.5 3B by default (2s ungrounded, ~20s grounded on CPU). Hermes 3 8B retained for the night worker.
Hermes Agent harnessAn off-the-shelf hermes_agent pip packageCustom Ollama tool-calling loop, ~140 lines of Python. Shipped twice (owner-bridge + hermes-night) so each container is self-contained.
Owner remote controlLINE + Slack + Hermes Web UILINE + Slack webhooks only. Inbound-only with HMAC signature checks and owner-ID allowlist. No outbound polling = compatible with no-internet hosts. UI deferred.
Wiki contentUser-supplied lessons + STORM-drafted articles25-file seed corpus shipped in docker/seed_wiki/ — 12 weekly lessons (A1→B1), pronunciation, vocabulary, persona, syllabus, methodology, sources. Drafted at Opus 4.7 quality; hermes-night refines on its night pass.
Compose service count7+ services4 runtime services (orchestrator, owner-bridge, tts, whisper) + host Ollama. hermes-night exists but is profile-gated so up -d doesn't bring it up.

All source is at github.com/ddtraveller/watdonchan on the feat/local-bot-stack branch. The wiki corpus is at docker/seed_wiki/. Each lesson follows lessons/LESSON_TEMPLATE.md — a modern TBLT + lexical-chunks + scaffolded-production shape that supersedes pure PPP while staying DSS-OTP compatible.

🌱 About this project

ภาษาไทย

Kru Eng Local Bot คือแชทบอทสอนภาษาอังกฤษแบบเสียงคุยเสียงตอบ ที่ทำงานบนเครื่องของครูทั้งหมด — เสียงและข้อความของนักเรียนไม่ออกจากเครื่อง ตรงตามข้อกำหนด PDPA

วันนี้บอทยังเป็นแค่วงจรเสียง (ไมค์ → Whisper → Qwen 2.5 → XTTS → ลำโพง) ไม่มีฐานความรู้ที่จำได้ระหว่างคาบ

หน้านี้แสดงวิธีต่อยอด ให้บอทเรียนรู้จากทุกคาบ รับเนื้อหาใหม่ และร่างแผนการสอนให้ครูตรวจ — ด้วยเครื่องมือที่ทำงานในเครื่องทั้งหมด

English

Kru Eng Local Bot is a voice-in / voice-out English tutor that runs entirely on the teacher's own hardware. No student audio or transcript ever leaves the machine — PDPA-compliant by design.

Today it's just a voice loop: mic → Whisper → Qwen 2.5 → XTTS/Piper → speaker. It has no persistent memory between sessions.

This guide extends it with a five-tool context architecture (LlamaIndex · STORM · NotebookLM · Wiki · Hermes) so the bot can learn from every session, ingest new course materials, and draft its own lesson packets for the teacher to review — using off-the-shelf tools that all run locally on the same Ollama backend.

中文

Kru Eng 本地机器人是一个语音输入 / 语音输出的英语家教,完全运行在老师自己的硬件上。学生的音频和文字记录从不离开这台机器——天然符合 PDPA 数据保护要求。

今天它只是一个语音回路:麦克风 → Whisper → Qwen 2.5 → XTTS/Piper → 扬声器。课与课之间没有持久记忆。

本指南用一个五工具的上下文架构(LlamaIndex · STORM · NotebookLM · Wiki · Hermes)来扩展它,让机器人能从每一节课中学习、接收新的课程材料、并为老师起草可审阅的教学方案——所有工具都在同一个 Ollama 后端本地运行。

Overview & context — why this exists and who it's for

ภาษาไทย

เพื่อใคร: ครูสอนภาษาอังกฤษ (TEFL), โรงเรียนสอนภาษาขนาดเล็ก, และศูนย์การเรียนรู้ในประเทศไทยและเอเชียตะวันออกเฉียงใต้ ที่ต้องการใช้ AI โดยไม่ต้องส่งข้อมูลนักเรียนไปยังคลาวด์

ปัญหาที่แก้: ผู้ช่วยสอนบนคลาวด์ (ChatGPT, Khan Academy ฯลฯ) ใช้งานได้ดีแต่มีกำแพง 3 อย่าง — อัปโหลด transcript นักเรียนไปให้บุคคลที่สาม, มีค่าใช้จ่ายแบบสมัครสมาชิกต่อนักเรียนซึ่งสะสมตลอดปี, และทำให้การสอนของคุณต้องวิ่งตาม roadmap ของผู้ให้บริการ ไม่ใช่ของคุณเอง

โอกาส: โมเดล open-source (Qwen 2.5, Hermes 4) ตอนนี้ดีพอจริง ๆ สำหรับการสอนภาษาอังกฤษ, โมเดล embedding (nomic-embed) รันบน CPU ใดก็ได้, และ markdown wiki ธรรมดามีความทนทานมากกว่าฐานความรู้ proprietary ใด ๆ การ self-host กลายเป็นเรื่องจริงจังได้แล้วสำหรับครูคนเดียวที่มีโน้ตบุ๊กอายุ 5 ปี

ความสำเร็จคืออะไร: บอทที่เรียนรู้สไตล์การสอน ของคุณ เป็นสัปดาห์, ค่าใช้จ่ายต่อคำถามเป็น 0 บาทหลังติดตั้ง, ไม่รั่วข้อมูล, ร่างบทเรียนใหม่ในรูปแบบ DSS-PPP ของคุณ, และควบคุมได้จาก LINE/Slack บนมือถือของคุณ

การแลกเปลี่ยนที่ตรงไปตรงมา: คุณใช้เวลาประมาณ 3 ชั่วโมงในการติดตั้ง (เทียบกับ 0 สำหรับ SaaS), จัดการ backup เอง, เมื่อมีอะไรพัง คุณซ่อมเองแทนที่จะเปิดตั๋ว support แลกกับการเป็นเจ้าของข้อมูลและพฤติกรรมของบอทถาวร — ไม่มี vendor ไหนเปลี่ยนกติกาให้คุณได้

ใครไม่เหมาะ: ครูที่ไม่มีเวลาดูแลระบบเลย, หรือคนที่ต้องการสร้างเนื้อหาความเร็วสูงแบบสตาร์ทอัพ ที่ความเสถียรของ uptime สำคัญกว่าความเป็นเจ้าของข้อมูล

English

Who this is for: TEFL teachers, small language schools, and learning centers in Thailand and Southeast Asia who want AI assistance without sending student data to the cloud.

The problem this solves: Cloud-based tutoring assistants (ChatGPT, Khan Academy, etc.) work well but raise three barriers — they upload student transcripts to a third party, they cost a subscription per student that compounds over a year, and they shape your teaching to the vendor's roadmap rather than yours.

The opportunity: Open-source models (Qwen 2.5, Hermes 4) are now genuinely good enough for English tutoring, embedding models (nomic-embed) run on any CPU, and a plain markdown wiki is more durable than any proprietary knowledge base. Self-hosting has become realistic for a single teacher with a 5-year-old laptop.

What success looks like: A bot that learns your teaching style over weeks, costs $0 per query after install, never leaks data, drafts new lessons in your DSS-PPP format, and is controllable from your LINE/Slack on your phone.

Honest tradeoffs: You spend ~3 hours on install (vs zero for cloud SaaS). You manage your own backups. When something breaks, you fix it instead of opening a support ticket. In exchange you get permanent ownership of the data and the bot's behavior — no vendor can change the rules on you.

Who shouldn't bother: Teachers with no time to maintain a system, or anyone running a high-velocity content operation where uptime guarantees matter more than data ownership.

中文

给谁用:泰国及东南亚的英语教师(TEFL)、小型语言学校、学习中心——想用 AI 协助教学,但不希望把学生数据传到云端的人。

要解决的问题:基于云端的辅导助手(ChatGPT、Khan Academy 等)功能不错,但有三道门槛——把学生的对话记录上传到第三方;按学生订阅收费,一年累积下来不小;并且会让你的教学跟随供应商的路线图走,而不是你自己的。

机会:开源模型(Qwen 2.5、Hermes 4)现在确实已经能胜任英语辅导,嵌入模型(nomic-embed)在任何 CPU 上都能运行,而朴素的 markdown wiki 比任何专有知识库都更持久。对于一位拥有五年旧笔记本的独立教师来说,自托管已经变得现实可行。

成功是什么样:一个用几个星期学会的教学风格的机器人;安装完成后每次提问 0 美元;数据从不外泄;按你的 DSS-PPP 格式起草新课程;并能通过手机上的 LINE/Slack 控制。

诚实的取舍:你花约 3 小时安装(云端 SaaS 是 0 小时);你自己管理备份;出问题时你自己修,而不是开 support ticket。作为交换,你获得数据和机器人行为的永久所有权——没有供应商能单方面改变规则。

谁不适合:没有时间维护系统的老师;或者运营高频内容、对正常运行时间保证比数据所有权更看重的团队。

🥡 The tech stack — Docker makes it a recipe, not a black box

ภาษาไทย

Docker เป็นคำที่สำคัญที่สุดบนหน้านี้ รองจาก Wiki หากไม่มี Docker การติดตั้งบอทตัวนี้คงต้องสู้กับเวอร์ชัน Python, port ที่ชนกัน, และความแตกต่างของ OS ตลอดสุดสัปดาห์

มี Docker แล้ว ทั้ง stack อยู่ในไฟล์แค่ 2 ชนิด: docker-compose.yml และ Dockerfile ไม่กี่ตัว ทุกคนอ่านได้ ทุกคนแก้ได้ การติดตั้งไม่ใช่การเดินทาง — มันคือ สูตรอาหาร

ลองนึกถึง tech stack เหมือนเค้กชั้น ๆ แต่ละชั้นวางอยู่บนชั้นล่างและทำหน้าที่ของตัวเอง คุณเปิดดูในชั้นไหนก็ได้ ไม่มีอะไรซ่อนอยู่ ความกลัวว่า Docker เป็น "กล่องดำ" เป็นความเข้าใจผิด — Docker image ทุกตัวสร้างจากไฟล์ Dockerfile ที่เป็นข้อความธรรมดา ใครก็อ่านได้

English

Docker is the most important word on this page after Wiki. Without it, installing this bot would mean fighting Python versions, port conflicts, and OS differences across an entire weekend.

With Docker, the whole stack lives in two kinds of files: docker-compose.yml and a few Dockerfiles. Anyone can read them. Anyone can change them. The install stops being a journey and becomes a recipe.

Think of the tech stack like a layer cake. Each layer sits on the one below and does one specific job. You can lift the lid on any layer at any time — nothing is hidden. The "black box" fear about Docker is a misconception: every Docker image is built from a plain text Dockerfile that anyone can read.

中文

Docker 是本页除 Wiki 之外最重要的词。没有 Docker,安装这个机器人就意味着要花一整个周末跟 Python 版本、端口冲突、操作系统差异搏斗。

有了 Docker,整套技术栈都活在两类文件里:docker-compose.yml 和几个 Dockerfile。任何人都能读,任何人都能改。安装不再是一场旅行——它是一份食谱

把技术栈想象成一块千层蛋糕。每一层都坐在下面那一层之上,做自己专门的工作。你随时可以掀开任何一层的盖子——没有什么是隐藏的。把 Docker 视为"黑盒"是个误解——每一个 Docker 镜像都是从一个普通文本文件 Dockerfile 构建出来的,任何人都可以阅读。

The 7-layer tech stack — bottom is hardware, top is humans Each layer is plain text or open source. You can inspect any layer; you can replace any layer. 7 USERS 👨‍🏫 Teacher · 👨‍🎓 Student · 📱 Owner (LINE/Slack) The humans the system serves and answers to 6 INTERFACES 🎙 Voice page :8000 · 🤖 Hermes UI :8081 · 🌉 Owner bridge :8082 How humans reach the bot — browser, mic, phone messaging 5 APP LOGIC Orchestrator · LlamaIndex · STORM · Hermes Agent · Wiki contents (markdown) The five tools that make the bot intelligent — your editable code & markdown 4 CONTAINERS 📦 The runtime service images ollama whisper tts llamaindex orchestrator hermes-ui owner-bridge hermes-night (cron) 3 DOCKER ENGINE 🐳 Docker Engine — reads docker-compose.yml, builds/starts/stops containers The "recipe reader" — turns the YAML file into running software 2 OS 💻 Windows 11 · macOS · Ubuntu Linux Your operating system — Docker is just another program running on it 1 HARDWARE 🔧 CPU · RAM · Disk · GPU (optional) · Network The physical machine — your laptop, desktop, or server Docker's domain layers 3 + 4 You write / edit these your code, your wiki, your interfaces Yours already just need to install Docker
Seven layers, all open and inspectable. Docker manages layers 3 and 4 — the rest is yours.

Layer-by-layer breakdown

#LayerWhat's in itWhat Docker does here
7👤 UsersThe teacher, the students, the ownerNothing — Docker doesn't touch humans
6🖥 InterfacesWeb pages on :8000 / :8081, voice mic, LINE/Slack DMsExposes container ports to your host so a browser can reach them
5🧠 Application logicOrchestrator Python code, LlamaIndex retrieval, STORM articles, Hermes prompts, Wiki markdown contentIsolates each app — they can't accidentally break each other
4📦 ContainersThe runtime service images (ollama (host), whisper, tts, orchestrator, owner-bridge) plus the night-job container (hermes-night)This is Docker — these are the units Docker creates and runs
3🐳 Docker EngineDocker Desktop (Windows/Mac) or Docker Engine (Linux) plus the Compose pluginThe runtime — reads your docker-compose.yml recipe and turns it into running software
2💻 Operating systemWindows 11 / macOS / UbuntuHosts Docker; Docker is just one program among many on your OS
1🔧 HardwareYour CPU, RAM, disk, optional NVIDIA GPU, network cardProvides the compute power; Docker passes GPU access through via the NVIDIA Container Toolkit

Without Docker vs With Docker — what changes for you, the installer

Install taskWithout Docker 😩With Docker 🎉
Install the LLM runtime Download Ollama installer, run it, configure service, hope auto-start works docker compose up ollama
Install Whisper STT Install Python 3.10 specifically, install faster-whisper, install ffmpeg, configure CUDA, pray Pre-baked image, all dependencies inside
Install TTS (XTTS + Piper) Install Coqui, fight torch 2.0 vs 2.1, install Piper, download voice models Pre-baked image, all dependencies inside
Add LlamaIndex Pip-install LlamaIndex, point at the wiki folder, pull nomic-embed-text via Ollama, set RAG_ENABLED=true Pre-baked image, mounts your wiki folder
Get all services talking Configure ports, configure CORS, set environment variables, write start scripts One docker-compose.yml defines the whole network
Reproduce on a new machine Start over from step 1 (~3 hours if you're lucky) Copy one folder, docker compose up (~15 minutes)
Update all components Track each version, test compatibility, manually upgrade each, hope nothing breaks docker compose pull && docker compose up -d
Roll back if broken Re-install previous versions manually, in the right order Pin to previous image tag in compose file, restart
Share with a colleague Write a 20-page install guide; hope they don't hit a snag Send them the folder; they run docker compose up

🎯 For students: this is what "infrastructure as code" means

Your install is no longer a process — it's a file you can read. The bot's deployment is a piece of source code, just like the application logic itself. You can git diff it. You can git revert it. You can code review the install of your own software.

This is one of the most important shifts in modern software engineering, and it applies to AI systems exactly the same way. The model files are huge binaries, but the wiring that makes them useful — the network, the volumes, the environment variables, the startup order — is plain text, version-controlled, and reproducible. That is what Docker buys you.

📦 Building it incrementally — start small, add tools as you need them

ภาษาไทย

คุณไม่ต้องติดตั้งทุกอย่างในวันแรก ระบบนี้ออกแบบให้ เติบโตทีละขั้น

เริ่มจาก stage 0 ใช้ได้จริงแล้ว — เป็นแค่บอทสนทนาเสียง ไม่มีฐานความรู้ ทุก stage ที่เพิ่มทำให้บอทฉลาดขึ้น แต่บอท stage ก่อนหน้าก็ทำงานได้สบาย

คำแนะนำ: ทำ stage 0–2 ให้เสร็จในสุดสัปดาห์แรก ใช้บอท 1 สัปดาห์ แล้วค่อยตัดสินใจว่าจะเพิ่ม stage 3+ หรือไม่

English

You don't have to install everything on day one. This system is designed to grow stage by stage.

Stage 0 is already useful by itself — it's the bare voice chatbot with no knowledge base. Every stage you add makes the bot smarter, but the earlier stages keep working unchanged.

Recommended path: get stages 0–2 running on the first weekend, use the bot for a week, then decide whether stages 3+ are worth the time. You can stop at any stage and still have a working system.

中文

你不需要第一天就安装所有东西。这套系统设计成可以分阶段逐步成长

第 0 阶段本身就已经能用——是一个只有语音对话、没有知识库的基础机器人。你每加一个阶段,机器人就更聪明一些,而之前的阶段照常工作。

建议路线:第一个周末完成 0–2 阶段,用一星期,然后再决定第 3 阶段及以后值不值得投入时间。你可以在任何阶段停下来,系统照样可用。

Eight stages of build-up — each one delivers value on its own Climb as far as you want; stop wherever you're happy. 0. Voice loop mic → STT → LLM → TTS 1. + Wiki folder markdown · git-versioned 2. + LlamaIndex grounded Q&A on wiki 3. + Hermes UI owner console :8081 4. + hermes-night nightly maintenance 5. + STORM nightly lesson authoring 6. + NBLM bootstrap one-shot media gen 7. + Owner bridge LINE/Slack remote control 8. + Network isolation fully offline / sealed already useful: voice chat works first big win: bot remembers students night worker: bot improves while you sleep production-grade
A staircase, not a cliff. Each stage adds one service or one capability. Stop whenever the bot is good enough for your needs.

The stage-by-stage build-out

StageWhat you addNew capabilityTimeYou'd skip this if…
0 — Voice loop ollama · whisper · tts · orchestrator Talk to a generic AI tutor by voice. No memory between sessions. ~30 min Never — this is the foundation
1 — Add the Wiki folder Mount wiki/ in orchestrator; write INDEX.md + your school docs Bot can reference your syllabus and student notes manually in prompts ~30 min You only want a generic tutor, not your school's tutor
2 — Add LlamaIndex In-process inside the orchestrator. Reads the wiki volume, embeds via Ollama, persists the index to /data/wiki_index/ Bot grounds every answer in your wiki pages, with citations. This is the biggest single jump in usefulness. ~15 min Your wiki is tiny (under 10 pages)
3 — Add Hermes UI One container (hermes-ui:8081) + a few starter tools Owner console — you can chat with the agent, run tools manually, inspect what's happening ~30 min You only need student-facing voice; you're fine editing files in a terminal
4 — Add hermes-night hermes-night profile + a cron job at 2 AM Nightly maintenance: session notes get appended to student pages, vector index rebuilds, wiki pruning, morning email ~1 hour You have under 5 students and prefer to keep notes by hand
5 — Add STORM knowledge-storm Python package inside hermes-night Nightly authoring: ask Hermes to "build Week 5", get a full DSS-PPP lesson packet by morning ~30 min You write all your own lessons and don't want AI-generated drafts
6 — NotebookLM bootstrap One-shot script run from your laptop (not the bot host) Generates audio overviews, video summaries, slide decks, quizzes, flashcards, mind maps from public sources — once, then disconnected ~30 min You don't have a Google account or don't want to use any cloud at all
7 — Add owner-bridge One container (owner-bridge:8082) + Cloudflare Tunnel + LINE/Slack creds Control the bot from your phone via LINE or Slack DMs ~30 min You're always near the laptop and don't need remote control
8 — Network isolation Switch Docker network to --internal, block outbound at firewall The host cannot reach the internet. PDPA compliance becomes a property of topology, not policy ~15 min You need outbound access for some other tool (e.g. an external email API)

💡 The order is not random — each stage builds on the prior

You can't add LlamaIndex before the Wiki folder (it needs files to embed). You can't add STORM before Hermes-night (Hermes is what calls STORM). You can't do network isolation before everything else is running (it would break the install). Follow the order; skip later stages if you don't need them.

🎯 The teacher's path through these stages

Most single teachers stop at Stage 4 and are happy. The voice bot grounds answers in their wiki (stages 0–2), they have a console to inspect it (stage 3), and the nightly worker keeps the wiki tidy (stage 4). That covers 90% of the value with about 3 hours of total install time.

Stages 5–8 are for teachers who want the bot to generate new content (5), want the rich starter media (6), want phone-based remote control (7), or care about sealing the network for compliance audits (8). They're all worth doing eventually — but not on day one.

The division of labour

WhenToolRole
☀ Day-time (live tutoring)🦙 LlamaIndex (or AnythingLLM)Local grounded Q&A — reads the same markdown Wiki the teacher edits, cites the page it's quoting
🌙 Nightly (autonomous authoring)🌪 STORMGenerates Wikipedia-style draft lesson articles from sources, with simulated expert perspectives
🌙 Nightly (orchestration)🤖 Hermes 4 agentPlans tool calls, prunes the wiki, refreshes indexes, emails the morning report
🎛 Owner control (any time)🤖 Hermes Web UI + 🌉 owner-bridgeTeacher's console at :8081; LINE/Slack DM commands route through owner-bridge to the same Hermes agent
📚 Substrate (always)Karpathy LLM WikiPlain markdown — the human-readable knowledge base both sides edit; git-versioned
📦 Install-time onlyNotebookLMUsed once during install to generate audio/video/slide/quiz/flashcard starter pack from public sources; disconnected forever after

🎯 Three design rules that shape every decision below

  1. Student data never leaves the machine. LlamaIndex, STORM, Hermes, Wiki — all local on Ollama. NotebookLM runs once at install on public reference works only, then the Docker host disconnects from the internet permanently.
  2. The bot drafts; the teacher approves. Nothing the bot writes goes live to students without a teacher diff-review. The Wiki uses git so every approval is a recorded signal.
  3. The Wiki is plain markdown. No proprietary format, no vector-DB-only knowledge. If any tool in this stack disappears tomorrow, your knowledge base survives unchanged.

🌗 The goal — a day-and-night worker bot

ภาษาไทย

กลางวัน: นักเรียนคุยกับบอท บอทตอบโดยใช้ LlamaIndex ซึ่งอ่านไฟล์มาร์กดาวน์ของ Wiki โดยตรง และอ้างอิงหน้าที่ใช้ตอบ

กลางคืน: Hermes ตื่นมาทำงานบ้าน — อ่าน transcript ของวันนี้, สั่ง STORM ให้เขียนบทเรียนใหม่จากแหล่งข้อมูล, อัปเดต Wiki, ตัดหน้าเก่าทิ้ง, รีบิวด์ดัชนีเวกเตอร์, ส่งอีเมลสรุปให้ครูตอนเช้า

English

Day: students talk to the bot. The orchestrator runs its in-process LlamaIndex retriever over the markdown Wiki and returns a grounded answer with citations back to the pages it used.

Night: Hermes wakes up and does the housework — reads today's transcripts, runs STORM to draft Wikipedia-style articles for any new lesson topic, edits the Wiki (append session notes, prune stale pages, surface contradictions), rebuilds the vector index, and emails the teacher a morning summary.

中文

白天:学生与机器人对话。编排器询问 LlamaIndex,LlamaIndex 直接读取 markdown Wiki 并返回带引用的有据答复,标注它引用了哪些页面。

夜间:Hermes 醒来做家务——读取今日的对话记录,调用 STORM 为新课程主题起草维基百科风格的文章,编辑 Wiki(追加课堂笔记、修剪过时页面、标出矛盾内容),重建向量索引,并在清晨发送总结邮件给老师。

The complete architecture — Kru Eng bot, day and night ☀ DAY mode student-facing · low latency 🌙 NIGHT mode autonomous · maintenance 🎙 Student speaks (mic in browser) Whisper STT Qwen 2.5 (Ollama) + system prompt + LlamaIndex passages + chat history + student page 🦙 LlamaIndex (in-orchestrator) file watcher embedder (nomic) retrieves + cites reads wiki/ directly no separate import XTTS / Piper TTS 🔊 Student hears reply ⏰ cron · 2:00 AM trigger maintenance run 🤖 Hermes 4 Agent "Run nightly maintenance" picks tools dynamically 📊 read transcripts today's sessions 🌪 STORM draft lesson articles ✍️ update Wiki append · prune · git 🦙 rebuild index re-embed changed pages 🔌 (no internet) install-only NotebookLM done 📨 morning report email to teacher 📚 wiki/ (shared markdown volume) teacher edits · LlamaIndex re-embeds · Hermes writes git-versioned · every change reviewable the single source of truth LlamaIndex re-embeds Hermes writes Day-side stays fast and local. Night-side does the heavy authoring and curation. The Wiki is where they meet.
This is the goal: real-time student-facing chat by day, autonomous authoring and curation by night, all five tools playing distinct roles around one shared markdown Wiki.

💡 Why the day/night split works

The day-side bot stays fast and cheap — Qwen 7B reading a RAG-grounded answer is responsive even on CPU. The expensive cognitive work (Hermes 35B reasoning over transcripts, STORM drafting full lesson articles, LlamaIndex re-embedding pages) happens at night when latency doesn't matter and the GPU is idle. By morning, the day-side bot is smarter than it was yesterday, with no perceived performance cost.

The rest of this page unpacks each component, then shows the implementation details — pruning loop, docker-compose, six learning loops — that make the picture above work.

🎯 The one thing they all share — the context window

ภาษาไทย

เครื่องมือทั้งห้า — LlamaIndex, STORM, NotebookLM, Wiki, Hermes — ต่างก็ทำสิ่งเดียวกัน นั่นคือ เลือกข้อความที่ถูกต้องมาใส่ในหน้าต่างบริบทของ LLM ในเวลาที่ถูกต้อง

หน้าต่างบริบท (context window) ของโมเดลมีขีดจำกัด เช่น Qwen 2.5:7b รับได้ ~32K tokens ทุกอย่างที่อยู่นอกหน้าต่างนี้ โมเดล "ลืม"

ความแตกต่างคือ: ใครเลือก? เลือกอย่างไร? ใครเขียนกลับ?

English

All five tools — LlamaIndex, STORM, NotebookLM, Wiki, Hermes — do one thing: put the right text into the LLM's context window at the right time.

The context window is finite (Qwen 2.5:7b ~32K tokens, Gemma 4 ~128K). Anything outside the window, the model has "forgotten."

What differs: who chooses what goes in? how is it chosen? who writes back?

中文

这五个工具——LlamaIndex、STORM、NotebookLM、Wiki、Hermes——做的是同一件事:在正确的时机把正确的文本放进 LLM 的上下文窗口

上下文窗口是有限的(Qwen 2.5:7b 约 32K tokens,Gemma 4 约 128K)。任何在窗口之外的内容,模型都已"忘记"。

区别在于:谁来选择放入什么?怎么选?谁负责写回?

LLM Context Window — the chokepoint everything optimizes ~32,000 tokens (Qwen 2.5:7b) · ~128,000 tokens (Gemma 4) System Prompt "You are Kru Eng…" fixed · 0.5K tok Retrieved Context lesson notes · syllabus exam keys · wiki pages variable · 2-16K tok Chat History prior turns growing · 1-8K tok User Query "explain PPP" live · 50-500 tok ↑ Every architecture below is a different strategy for filling the purple box ↑ All five answer: who curates? when? how much? does it learn?
The context window is a budget. Each architecture is a different policy for spending it.

🦙 1. LlamaIndex — in-orchestrator RAG over your Wiki

🦙
LlamaIndex (replaces Khoj after evaluation)
local · markdown-native · in-process

Why not Khoj? The original guide put Khoj here. Trying it surfaced three real costs: (1) it needs Postgres+pgvector as a second container, (2) first-time setup requires UI clicks in /server/admin to register chat models and agents, (3) its anonymous-mode flag is a CLI argument not an env var, and it routes anonymous requests to a default user that is not the user you indexed your content under. For a 25-file wiki this is too much machinery. LlamaIndex in-process is ~120 lines and has none of those failure modes.

What it does: LlamaIndex is a Python framework for building RAG over your own documents. We use the slimmest possible subset — VectorStoreIndex for storage, SimpleDirectoryReader for ingest, and OllamaEmbedding as the embeddings provider. No LlamaIndex LLM layer (Ollama is called directly from the orchestrator for chat).

Why it's the day-to-day grounding layer: the orchestrator imports it, builds a vector index over /data/wiki/**/*.md at startup, persists the index to a named volume so subsequent boots load instantly, then on every chat call embeds the user message and pulls the top-K chunks to inject as system context. Citations come back as filename stems. The same Ollama instance that serves chat also serves embeddings (via nomic-embed-text, ~270 MB).

LlamaIndex — in-process RAG over the Wiki /data/wiki/ (markdown) INDEX.md curriculum/syllabus_12week.md lessons/w08_what_is_ai...md staff/kru_eng_persona.md references/teaching_methods.md teacher edits these directly read once 🦙 Orchestrator + LlamaIndex 📂 SimpleDirectoryReader 🧮 OllamaEmbedding (nomic) 💾 VectorStoreIndex (persisted) 🔖 citation by filename stem 🎙 Student query via /chat (SSE stream) Top-K passages → Ollama → reply "Week 8 is about can/can't and AI" /data/wiki_index/ (persisted) embed once, reuse across restarts No second container. No admin UI. ~120 lines of Python inside the orchestrator. First boot embeds the wiki once (~30s) and persists; subsequent boots load instantly.
LlamaIndex lives inside the orchestrator process — no separate service. The vector index persists to a named Docker volume, so restarts don't re-embed.

What a LlamaIndex grounding call looks like

# docker/orchestrator/main.py — actual production code, abbreviated from llama_index.core import ( Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex, load_index_from_storage, ) from llama_index.embeddings.ollama import OllamaEmbedding def _build_or_load_index(): Settings.embed_model = OllamaEmbedding( model_name="nomic-embed-text", base_url="http://host.docker.internal:11434", ) Settings.llm = None # Ollama is called directly from /chat if INDEX_PATH.exists() and any(INDEX_PATH.iterdir()): ctx = StorageContext.from_defaults(persist_dir=str(INDEX_PATH)) return load_index_from_storage(ctx) # instant on restarts docs = SimpleDirectoryReader( input_dir=str(WIKI_PATH), recursive=True, required_exts=[".md"], ).load_data() idx = VectorStoreIndex.from_documents(docs) idx.storage_context.persist(str(INDEX_PATH)) # write to volume return idx def ground(question: str) -> Grounding | None: idx = _get_index() if idx is None: return None nodes = idx.as_retriever(similarity_top_k=3).retrieve(question) if not nodes: return None return Grounding( text="\n\n---\n\n".join(n.text[:900] for n in nodes), citations=[Path(n.metadata["file_path"]).stem for n in nodes], )

💡 The role LlamaIndex plays in the bot

LlamaIndex is the day-side knowledge layer. Every student turn flows through it: the orchestrator embeds the query, pulls top-K wiki passages, and injects them as a system note so Ollama can quote them. Because retrieval is in-process there's no extra container, no admin UI, no health-check theater. The wiki is mounted read-only into the orchestrator; the persisted vector index lives in a separate read-write volume so it survives restarts.

⚡ Speed vs. accuracy — pick your knob

RAG over a 25-file corpus with Ollama-served embeddings adds about 15–25 seconds per reply on CPU (a single Qwen 2.5 3B generation already takes 2s; the extra ~600 tokens of grounding context multiply token-by-token cost). That's not always worth it. The orchestrator exposes three knobs:

  • RAG_ENABLED=true|false — global on/off. Off by default.
  • RAG_TOP_K=3 — how many passages to retrieve. Lower = faster.
  • OLLAMA_NUM_CTX=2560 — context window. Smaller = faster but truncates grounding.

Three approaches to get the best of both worlds, in order of effort:

  1. Route by intent. A keyword pre-check ("week", "lesson", "syllabus") decides whether to ground. A "hi" returns in 2s ungrounded; a "what is week 8 about" pays the grounding cost.
  2. Pre-summarize the wiki. hermes-night writes one-paragraph summaries per page; index those instead of the full lessons. Cuts retrieved context by ~10× — trades some precision.
  3. Switch to GPU. The CPU baseline assumes no acceleration. With NVIDIA + nvidia-container-toolkit, all three numbers — generation, embedding, retrieval — fall to a few hundred milliseconds.

📐 What happened to "RAG isn't needed"?

The earlier version of this guide argued against building a dedicated RAG layer — on the reasoning that "Khoj already IS RAG; adding LlamaIndex is parallel infrastructure". That was correct logic; the premise turned out to be wrong. Khoj's specific operational tax (Postgres, admin UI for chat-model+agent setup, CLI-only anonymous mode, user-scoped indexing that doesn't surface to anonymous queries) made the trade flip the other way. ~120 lines of LlamaIndex code beats ~3 hours of Khoj first-time setup, with the bonus that the index now ships in the same container as the chat logic — one process to debug.

The original three reasons against custom RAG still bite if you grow:

  1. Retrieval recall isn't the bottleneck at small scale — curation is. RAG can't tell you which lesson is your best lesson; it just retrieves what matches.
  2. Chunks are opaque. LlamaIndex helps here by returning file_path in node metadata — every citation maps to a real markdown file the teacher can open. Pre-summarization (above) makes the chunks more legible.
  3. Wider corpora need real vector stores. If you grow past a few thousand documents, swap LlamaIndex's in-memory SimpleVectorStore for Chroma or Qdrant — the rest of the code is unchanged.

🌪 2. STORM — Wikipedia-style nightly content generation

🌪
STORM (Stanford OVAL Lab)
local · authoring · multi-perspective

What it does: STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective question asking) is an open-source pipeline that auto-writes Wikipedia-style long-form articles from a topic and a set of sources. It runs entirely against your local LLM (Ollama works fine) and produces grounded, structured, citation-rich drafts.

How it works (in four phases):

  1. Perspective generation — Given the topic, STORM brainstorms several distinct expert perspectives (pedagogical theory, classroom practice, learner cognition, etc.).
  2. Simulated expert dialogue — Each perspective is played by an LLM persona that asks questions; another LLM, grounded in the sources, answers them. This produces a rich Q&A corpus that no single prompt could elicit.
  3. Outline drafting — STORM synthesizes the dialogue into a structured outline.
  4. Section writing + polish — Each outline section is drafted from the dialogue + sources, then a final pass polishes the whole article.

Why it's the nightly authoring tool: writing a good lesson article is not a one-shot prompt. STORM's multi-perspective dialogue surfaces angles a single prompt misses — exactly what you want for a Week-N lesson that has to teach grammar, anticipate learner errors, and tie into cultural context. The output is a markdown article that drops straight into wiki/lessons/week-NN/ as a DRAFT.

STORM — multi-perspective article generation pipeline 📌 Topic "Past perfect, B1" 📚 Wiki sources cambridge_b1.md · cefr.md 1. Perspectives • pedagogy expert • Thai L1 specialist • CEFR aligner • teen psychology LLM brainstorm 2. Q&A dialogue expert asks ↔ grounded LLM answers ~40 turns per topic cites sources per claim 3. Outline § Form & structure § When to use it § Thai L1 pitfalls § Practice activities synthesized from Q&A 4. Section drafting → final polish pass each § drafted from dialogue + sources, then whole article smoothed 📄 wiki/lessons/week-04/article.md status: DRAFT · ~2000 words · cited · awaits review
STORM's multi-perspective dialogue is what makes its drafts feel researched, not regurgitated. The output goes straight into the Wiki as DRAFT.

What a STORM run looks like

# hermes-night/tools/storm_article.py — invoked by Hermes during nightly authoring from knowledge_storm import STORMWikiRunner, STORMWikiRunnerArguments from knowledge_storm.lm import OllamaClient from pathlib import Path def storm_article(topic: str, output_dir: Path) -> Path: """Generate a Wikipedia-style article on `topic`. Returns the article path.""" lm = OllamaClient(model="hermes-4-35b-a3b:q4_k_m", url="http://ollama:11434") args = STORMWikiRunnerArguments( output_dir=str(output_dir), max_conv_turn=5, # Q&A turns per perspective max_perspective=4, # number of expert voices max_thread_num=3, ) runner = STORMWikiRunner(args, lm=lm, retriever="wiki-local") runner.run( topic=topic, do_research=True, # run perspective Q&A do_generate_outline=True, do_generate_article=True, do_polish_article=True, ) return output_dir / "storm_gen_article_polished.md"

💡 The role STORM plays in the bot

STORM is the nightly author. When Hermes is asked to build a new lesson, STORM produces the main grounded article — the dense, cited explanation of the topic. Hermes then takes that article and runs the specialized author tools (DSS-PPP plan, drill set, quiz, vocab cards) over it. The single STORM article becomes the gravity center for the whole lesson packet.

📓 3. NotebookLM — used exactly once at install, then disconnected forever

📓
NotebookLM (Google)
cloud · install-only · then offline

The role NotebookLM plays in this stack is unusual: it runs exactly once, during the install step (see Getting Started — Step 6d), to generate a rich starter pack of media from public reference works — audio overviews, video summaries, slide decks, quiz banks, flashcards, and mind maps — covering every week of your syllabus. Those artifacts get saved as static files in the Wiki. After that, the Docker host disconnects from the internet and never talks to NotebookLM again.

Why this pattern: NotebookLM's output quality (Audio Overview especially) is genuinely hard to match locally today, but continuously sending data to Google is at odds with the PDPA design rule. The compromise: pay the cloud trip once, on sources you'd share with anyone anyway, and walk away with a year of starter content. From day two onwards the bot is fully offline.

⛔ Hard rules — even for the one-time bootstrap

  • Public sources only. Published textbooks, ministry curricula, Council of Europe references — anything you could legally hand to a stranger. Never: student rosters, exam keys, transcripts, parent contacts, school memos.
  • After bootstrap, revoke credentials and cut the network. See Step 8a. The architectural goal is that NotebookLM cannot be called again, not that it merely shouldn't be.
  • Hermes does NOT keep NotebookLM as a runtime tool. Unlike LlamaIndex or STORM, there is no notebooklm_* tool in the agent's regular toolbox after install. The bootstrap script lives separately and is not invoked at night.

What gets generated, by week, by NotebookLM feature

ArtifactNotebookLM featureLands inUsed for
🎧 Audio overviews (~10 min)Audio Overviewwiki/audio/week-NN.mp3Listening practice
🎬 Video summaries (~5 min)Video Overviewwiki/video/week-NN.mp4Visual review, homework
📊 Slide-deck outlinesStudy Guidewiki/slides/week-NN.mdIn-class presentation
📝 Quiz banks (~20 Q&A)FAQ generationwiki/quizzes/week-NN.mdPractice + exam prep
🎴 Flashcard setsBriefing doc → term extractionwiki/flashcards/week-NN.jsonVocabulary drilling (Anki)
🗺 Mind mapsMind Map → SVGwiki/maps/week-NN.svgVisual concept reference

💡 Why front-loading all this works

Most of these artifacts (audio, video, slides, mind maps) don't need to be regenerated as the course evolves — the underlying grammar/CEFR descriptors don't change. The textual artifacts (quizzes, flashcards) can be extended locally over time using STORM. So the one-time bootstrap covers the parts that benefit most from NotebookLM's specific strengths, while STORM + LlamaIndex cover everything that needs to keep learning. After install, the system is self-sufficient.

The actual bootstrap script and its safety guards are in Step 6 of the Getting Started section below.

📚 4. Wiki — Karpathy's compounding knowledge base

📚
LLM Wiki (Karpathy pattern)
curated · transparent · self-editing

What it does: a folder of plain markdown files plus an INDEX.md that describes what's where (a schema, like CLAUDE.md for a codebase). When the user asks a question, the LLM reads INDEX.md first, decides which pages are relevant, fetches them, then answers. No embeddings, no vector DB.

The killer feature: the LLM can also write to the wiki. After each tutoring session, the bot appends an observation to the relevant page. The wiki is a compounding artifact — it gets smarter every day the bot is used.

LLM Wiki — schema-guided traversal + self-editing wiki/ 📋 INDEX.md ← read first syllabus/tefl_tech.md lessons/week-03/ppp.md exams/midterm-b1/key.md students/pim.md students/somchai.md 1. read INDEX 2. fetch specific pages LLM (Qwen / Hermes) reads INDEX identifies relevant pages drills in (1-2 hops) synthesizes answer no embeddings needed Answer to user grounded · cites page names 3. write-back: append session notes to students/pim.md Karpathy: "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase."
The Wiki is a compounding artifact. Read by the bot, edited by the bot, curated by you.

What INDEX.md looks like

# Kru Eng Wiki — Index ## Reading rules - Always read INDEX.md first, then load only pages you need. - When teaching a lesson, also load students/.md for the student. - After every session, append a dated note to students/.md. ## Pages - syllabus/tefl_tech.md — full term curriculum, 12 weeks - syllabus/dss_ppp_format.md — DSS-PPP lesson plan template - lessons/week-03/ppp.md — Week 3 PPP (present perfect) - postmortem.md — what worked / what didn't (auto-appended) - exams/midterm-b1/ — exam + answer key + rubric - students/.md — per-student notes, level, weak areas - pedagogy/ — Krashen, Byrne, Scrivener references

🤖 5. Hermes Agent — tool-calling orchestrator

🤖
Hermes 4 Agent
active · tool-calling · autonomous

What it does: Nous Research's Hermes 4 is a model fine-tuned heavily on agent traces — multi-step conversations where an LLM correctly invokes tools (functions). The Hermes Agent harness gives the model a tool registry it can call: read_wiki_page, append_wiki, storm_article, rebuild_index, prune_wiki, etc.

Hermes 4 35B A3B is MoE — 35B total params, only 3B active per token. Runs on a single RTX 4090 at Q4_K_M (~22 GB). It "stays in character as an agent" much longer than a base instruct model, which is what you need for a nightly maintenance worker.

Hermes Agent — the autonomous tool-caller Hermes 4 Agent Loop 1. Read goal 2. Plan tool calls 3. Execute · observe · iterate Goal / Schedule "Run nightly maintenance" 📂 read_wiki_page fs read on wiki/ ✍️ append_wiki fs write + git commit 🌪 storm_article STORM · draft lesson article 🦙 rebuild_index re-embed changed wiki pages 📊 analyze_sessions parse today's transcripts 🧹 prune_wiki dedupe · archive stale 📨 report_to_teacher email morning summary The agent chooses tools dynamically. Each tool is a small Python function in the hermes-night container.
Hermes is the autonomous worker. It calls the runtime tools (LlamaIndex, STORM, Wiki) as needed. NotebookLM is NOT a runtime tool — it ran once at install and is gone.

📊 Comparison — how they handle context

Dimension 🦙 LlamaIndex 🌪 STORM 📓 NotebookLM 📚 Wiki 🤖 Hermes
Role in the bot Day-time Q&A Nightly authoring Install-time only (then disconnected) Shared substrate Night-time orchestrator
Where context lives Reads wiki/ markdown directly Generates from sources + Wiki Google cloud (sources + index) Plain markdown on disk None — orchestrates the others
Curation None — auto-indexes Wiki Topic-driven; multi-perspective You upload sources LLM curates, teacher prunes Picks which tool to consult
Citations page-level (file paths) in-text references strict, paragraph-level page-level Inherits from underlying tool
Write-back / self-edit no — read-only writes DRAFT articles no yes — appends notes yes — drives all writes
Scale ceiling Thousands of pages One article per run (~2K words) ~300 sources / notebook ~400K words / ~100 pages Unlimited (delegates)
Runs locally? yes — Docker + Ollama yes — Python + Ollama no — needs Google yes yes (Hermes 4)
PDPA-safe with student data? yes yes no — public sources only yes yes
Best at Fast grounded Q&A on a teacher's corpus Multi-perspective long-form drafts Audio Overviews of published books Curated, evolving knowledge Multi-step night jobs
Worst at Writing new long-form content Quick, interactive answers Anything involving private data Scale beyond ~hundreds of pages Knowing facts on its own

🔑 The key insight

These are not competitors — they are five different layers of the same problem. LlamaIndex retrieves grounded passages in real time. STORM drafts long-form lesson articles overnight. NotebookLM is reserved for the one thing only it does well — Audio Overviews on public references. Wiki is the curated, human-readable layer everything else reads from and writes to. Hermes is the night manager that decides which of the others to call. A real production bot uses all five, each in its proper role.

🛠 Implementation — pruning, docker, scheduler

This section unpacks the moving parts behind the day/night architecture shown at the top of the page: how the night worker prunes the Wiki without losing teacher-authoritative content, the merged docker-compose stack (orchestrator with LlamaIndex + owner-bridge + night worker), and the Hermes agent entry point.

The pruning loop — how the Wiki stays healthy

"Append-only" wikis rot fast. The night worker's pruning pass is what stops the Wiki from drowning in stale notes, duplicate observations, and contradictions accumulated across hundreds of student sessions. It runs as a deterministic loop over every page, with Hermes scoring each one and proposing an action — but every destructive change goes through git and a teacher confirmation gate.

Nightly Wiki pruning loop — scan · score · propose · commit ⏰ 2:30 AM — start 📂 git ls-files wiki/ enumerate every page · skip pinned ↻ for each page (Hermes scores in batches of 20) collect signals · decide action ⏱ age signal last git commit vs today >90d cold = candidate 📈 usage signal access log: day-side reads (90d) 0 reads = candidate 🔁 dedup signal embed page · cosine vs newer pages >0.92 = duplicate ⚠ contradiction Hermes spot-checks vs newer page conflict = surface 📌 frontmatter status: pin syllabus, exam keys → skip page 🤖 Hermes scores · proposes one action writes proposal to wiki/.prune-queue.jsonl ✓ keep no-op · still in use ⇄ merge fold dupe into newer page 📦 archive move to wiki/_archive/ 🗑 delete needs teacher approval git commit · loop
Five signals feed Hermes' scoring; four possible actions; everything reversible via git. Delete needs teacher confirmation.

🔒 Safety rails baked into the loop

Pin via frontmatter — any page with status: pin is skipped entirely (syllabus, exam answer keys, ministry-mandated content). Three-tier reversibility — archive is just a move, merge keeps the source in git history, delete writes to .prune-queue.jsonl and waits for the morning email's "approve" link. Contradictions never auto-resolve — Hermes flags them and leaves both pages intact so you decide which is authoritative.

Merged docker-compose.yml — full stack with the night worker

This is the actual production docker/docker-compose.yml from the krueng.ai repo, as of May 2026. Four runtime services (orchestrator, owner-bridge, tts, whisper) plus a profile-gated night worker. Host Ollama is reached via host.docker.internal through docker-compose.override.yml, so we don't run a second Ollama in a container.

# docker/docker-compose.yml — production stack services: ollama: image: ollama/ollama:latest container_name: krueng-ollama ports: - "11434:11434" volumes: - ollama_data:/root/.ollama restart: unless-stopped # Override file disables this in favour of host-native Ollama. # GPU stanza (uncomment with nvidia-container-toolkit): # deploy: { resources: { reservations: { devices: [{driver: nvidia, count: all, capabilities: [gpu]}] }}} whisper: image: onerahmet/openai-whisper-asr-webservice:latest container_name: krueng-whisper environment: ASR_MODEL: ${WHISPER_MODEL:-small} ASR_ENGINE: faster_whisper ports: - "9000:9000" restart: unless-stopped tts: build: ./tts container_name: krueng-tts ports: - "8001:8001" volumes: - tts_cache:/root/.local/share/tts - ./voices:/voices:ro environment: DEVICE: ${TTS_DEVICE:-cpu} XTTS_SPEAKER: ${XTTS_SPEAKER:-Ana Florence} restart: unless-stopped # ── DAY-SIDE BOT: chat + LlamaIndex RAG in one process ──────────── orchestrator: build: ./orchestrator container_name: krueng-orchestrator ports: - "8000:8000" environment: OLLAMA_URL: http://ollama:11434 # overridden to host.docker.internal in override file WHISPER_URL: http://whisper:9000 TTS_URL: http://tts:8001 MODEL: ${MODEL:-qwen2.5:3b} # fast 3B default for CPU hosts RAG_ENABLED: ${RAG_ENABLED:-false} # opt-in: trades 15-25s/reply for grounding EMBED_MODEL: ${EMBED_MODEL:-nomic-embed-text} WIKI_PATH: /data/wiki INDEX_PATH: /data/wiki_index RAG_TOP_K: ${RAG_TOP_K:-3} OLLAMA_NUM_PREDICT: ${OLLAMA_NUM_PREDICT:-180} OLLAMA_NUM_CTX: ${OLLAMA_NUM_CTX:-1536} # auto-bumps to 2560 if RAG_ENABLED volumes: - wiki:/data/wiki:ro # read-only mount of the markdown corpus - wiki_index:/data/wiki_index # persisted LlamaIndex VectorStoreIndex depends_on: - ollama - whisper restart: unless-stopped # ── OWNER CONTROL: LINE/Slack webhooks → in-process Hermes tool loop ── owner-bridge: build: ./owner-bridge container_name: krueng-owner-bridge ports: - "8082:8082" environment: OLLAMA_URL: http://ollama:11434 HERMES_MODEL: ${HERMES_MODEL:-hermes3:8b} # Hermes 4 not on Ollama yet; hermes3 stands in WIKI_PATH: /data/wiki NIGHT_QUEUE: /data/queue/night_jobs.jsonl LINE_CHANNEL_ACCESS_TOKEN: ${LINE_CHANNEL_ACCESS_TOKEN:-} LINE_CHANNEL_SECRET: ${LINE_CHANNEL_SECRET:-} OWNER_LINE_ID: ${OWNER_LINE_ID:-} SLACK_BOT_TOKEN: ${SLACK_BOT_TOKEN:-} SLACK_SIGNING_SECRET: ${SLACK_SIGNING_SECRET:-} OWNER_SLACK_USER_ID: ${OWNER_SLACK_USER_ID:-} volumes: - wiki:/data/wiki # reads + writes (owner promotes DRAFTs) - night_queue:/data/queue depends_on: - ollama restart: unless-stopped # Webhooks come INBOUND through Cloudflare Tunnel. # No outbound polling: LINE reply-tokens and Slack chat.postMessage on the same socket. # Custom ~140-line Ollama tool-calling loop — no off-the-shelf hermes_agent package. # ── NIGHT WORKER (cron-triggered) ────────────────────────────────── # Run with: docker compose run --rm hermes-night hermes-night: build: ./hermes-night container_name: krueng-hermes-night profiles: ["night"] # excluded from `up -d`; cron triggers it environment: OLLAMA_URL: http://ollama:11434 HERMES_MODEL: ${HERMES_MODEL:-hermes3:8b} WIKI_PATH: /data/wiki NIGHT_QUEUE: /data/queue/night_jobs.jsonl TRANSCRIPT_PATH: /data/transcripts REPORT_EMAIL: ${REPORT_EMAIL:-} SMTP_HOST: ${SMTP_HOST:-} SMTP_PORT: ${SMTP_PORT:-587} SMTP_USER: ${SMTP_USER:-} SMTP_PASS: ${SMTP_PASS:-} volumes: - wiki:/data/wiki # night-side reads + writes - night_queue:/data/queue - transcripts:/data/transcripts depends_on: - ollama volumes: ollama_data: tts_cache: wiki: # shared: orchestrator reads, owner-bridge + hermes-night read+write wiki_index: # LlamaIndex's persisted VectorStoreIndex — survives restarts night_queue: transcripts:

🛠 Two operational details worth noting

  • profiles: ["night"] keeps hermes-night out of docker compose up -d, so it never runs as a service. Host cron launches it with docker compose run --rm hermes-night, which exits when the maintenance script returns.
  • The wiki volume is mounted :ro on the orchestrator — the day-side bot can never accidentally corrupt the knowledge base. Only the night worker has write access, and every change goes through git inside the volume.

The night worker entry point

# hermes-night/run_night.py from hermes_agent import Agent from tools import ( read_wiki_page, append_wiki, prune_wiki, storm_article, rebuild_index, read_transcripts, email_report, ) agent = Agent( model="hermes-4-35b-a3b:q4_k_m", base_url="http://ollama:11434/v1", tools=[ read_wiki_page, append_wiki, prune_wiki, storm_article, rebuild_index, read_transcripts, email_report, ], ) goal = """ It is 2 AM. Run the nightly Kru Eng maintenance: 1. Read today's session transcripts (read_transcripts). 2. For each student mentioned, append a dated note to their wiki page summarizing what they practiced and where they struggled. 3. If any new lesson topic was requested by the teacher, run storm_article() to draft a Wikipedia-style article and save it to wiki/lessons/<week>/ as status: DRAFT. 4. Prune any wiki pages last touched more than 90 days ago if they duplicate newer content (prune_wiki, archive not delete). 5. Call rebuild_index() so the day-side bot sees today's edits. 6. Email kru@krueng.ai a 5-bullet summary of what changed. """ agent.run(goal)

✏️ Generating course content — the same stack, run as an author

ภาษาไทย

สถาปัตยกรรมแบบเดียวกันนี้ ใช้สร้าง เนื้อหาบทเรียน ได้ — ไม่ใช่แค่ตอบคำถาม

กระบวนการ: ครูระบุหัวข้อ → Hermes วางแผน → STORM เขียนบทความอ้างอิงจาก Wiki → Hermes ร่างแผนการสอน DSS-PPP, แบบฝึกหัด, ข้อสอบ → เขียนลง Wiki ให้ครูตรวจ

English

The same stack runs the other direction — instead of answering student questions, it writes new course materials.

Flow: teacher names a topic → Hermes plans → STORM grounds the topic from existing Wiki sources → Hermes drafts a DSS-PPP lesson plan, exercises, exam questions → posts to the Wiki for human review.

中文

同一套技术栈反向运行——不再回答学生的问题,而是撰写新的课程材料

流程:老师指定主题 → Hermes 制定计划 → STORM 基于现有 Wiki 资料对主题进行多视角研究 → Hermes 起草 DSS-PPP 教案、练习、考题 → 提交到 Wiki 等待人工审核。

Course-content generation pipeline 👩‍🏫 Teacher input "Build Week 4 PPP lesson" topic: past perfect, B1 🤖 Hermes plans decomposes into 5 sub-tasks brief · plan · drill · quiz · exam 🌪 ground topic STORM (multi-perspective) → cited brief on past perfect 📚 Wiki source page cached for reuse 📋 DSS-PPP plan • Warmer (5 min) • Presentation (10) • Practice (15) • Production (15) 🎯 Drill set • Gap-fill ×10 • Transformation ×8 • Error correction ×6 • Pair dialogue ×2 📝 Quiz + key 10 MCQ + rubric 3 short-answer 1 writing prompt + student-facing version 🎴 Vocab + audio 12 target words Thai gloss + IPA XTTS pronunciation → flashcard HTML 📚 Posted to wiki/lessons/week-04/ status: DRAFT — awaits teacher review 📧 Morning email to teacher "Week 4 lesson ready — 4 artifacts to review" One topic in → a full lesson packet out, all grounded in cited sources, all reviewable as markdown.
A single Hermes invocation produces a DSS-PPP plan, drill set, quiz, and vocab cards — all from one cited source brief.

Concrete example — "Week 4: Past Perfect for B1 Thai learners"

# hermes-night/run_authoring.py — invoked manually or via cron from hermes_agent import Agent from tools import ( storm_article, wiki_search, paperqa_lookup, draft_dss_ppp, generate_drills, generate_quiz, generate_vocab_cards, synthesize_xtts_audio, write_wiki_page, generate_flashcard_html, ) agent = Agent( model="hermes-4-35b-a3b:q4_k_m", base_url="http://ollama:11434/v1", tools=[ storm_article, wiki_search, paperqa_lookup, draft_dss_ppp, generate_drills, generate_quiz, generate_vocab_cards, synthesize_xtts_audio, write_wiki_page, generate_flashcard_html, ], ) agent.run(""" Build a complete Week 4 lesson packet: topic: past perfect tense level: CEFR B1 audience: Thai high-school learners, ages 14-16 duration: 45 minutes format: DSS-PPP (Warmer · Presentation · Practice · Production) Steps: 1. Ground the topic via storm_article() — runs multi-perspective research and writes a Wikipedia-style article to wiki/lessons/week-04/article.md. 2. Draft the DSS-PPP plan. Each stage must name a concrete activity and a CEFR can-do statement. 3. Generate 26 drill items across gap-fill, transformation, error-correction, and pair dialogue formats. 4. Generate a 10-MCQ + 3-short-answer quiz with rubric. 5. Extract 12 target vocabulary items (English + Thai gloss + IPA), synthesize XTTS audio, and emit a flashcard HTML page using the tefl-bilingual.css pattern from CLAUDE.md. 6. Write everything to wiki/lessons/week-04/ as separate markdown files with status: DRAFT in the frontmatter. 7. Email kru@krueng.ai a summary with links to each artifact. """)

What the bot can generate today, with no extra training

ArtifactSource layerAuthor layerOutput format
📋 DSS-PPP lesson plan STORM article + Wiki sources Hermes drafts to template markdown in wiki/lessons/
🎯 Practice drills Target grammar/vocab from brief Hermes generates · varied formats markdown + answer key
📝 Quiz + answer key + rubric Brief + CEFR descriptors Hermes drafts · self-scores test set markdown + HTML student version
🎴 Vocab flashcards (bilingual) Word list from brief Hermes glosses · XTTS audio HTML using tefl-bilingual.css
🎧 Listening practice audio (week packs) Pre-generated at install via NotebookLM (Step 6) Lives in wiki/audio/week-NN.mp3 mp3 + transcript
🎙 Listening practice audio (ad-hoc) STORM article excerpt + XTTS Hermes calls synthesize_xtts_audio() on demand mp3 saved to wiki/audio/ad-hoc/
🎬 Video script / storyboard Brief + style guide Hermes drafts · HeyGen-ready JSON for generate_heygen.py
📖 Reading text + comprehension Qs Vocab + level + brief Hermes writes leveled passage markdown · 200-400 words
💬 Conversation activity Target grammar + cultural context Hermes drafts roleplay cards markdown · printable cards
🎮 RPG quest (Land of Suvarna) Vocab list from brief Hermes drafts NPC dialogue, items JS objects for rpg-data.js
📊 Whole syllabus week Curriculum doc + prior weeks Chain of all above tools Full wiki/lessons/week-NN/ tree

🎯 The big unlock

You already have generate_tefl_*.py scripts that produce content from one-off prompts. This architecture turns those scripts into tools the agent can compose. Instead of you running five scripts in sequence and stitching the outputs together, you tell the bot "build Week 4" and it picks the tools, runs them in order, grounds each step in cited sources, and posts a coherent lesson packet to the Wiki for your morning review.

The teacher's job shifts from authoring to editing — and the corpus compounds: every lesson you ship enriches the Wiki, which makes the next lesson easier to ground.

⚠️ Keep a human in the loop

Generated content always lands in the Wiki as status: DRAFT. Never auto-publish to students. The bot writes; you read, edit, and promote. The Wiki history (git) means you can see exactly what the bot proposed vs. what you shipped — a great training signal for tuning the prompts over time.

🔄 How the system learns — six loops, six drivers

ภาษาไทย

ระบบนี้ "เรียนรู้" ใน หกวงจร ที่แตกต่างกัน แต่ละวงจรมี คนขับ ของตัวเอง — บางอันขับโดยนักเรียน บางอันโดยครู บางอันโดยตัวบอทเอง

ถ้าไม่เข้าใจว่าวงจรไหนทำงานอย่างไร เมื่อบอททำผิด คุณจะไม่รู้ว่าจะแก้ที่ไหน

English

This system "learns" in six distinct loops. Each loop has its own driver — some are powered by the student, some by the teacher, some by the bot itself, and each runs on a different clock.

If you don't know which loop is which, then when the bot behaves wrong you won't know where to fix it.

中文

本系统通过六个不同的循环来"学习"。每个循环都有自己的驱动者——有些由学生驱动,有些由老师驱动,有些由机器人自己驱动,每个循环的节奏也不同。

如果你分不清是哪个循环,那么当机器人出错时,你也就不知道该从哪里修复。

Six learning loops orbiting the Wiki each loop is independent · each has a specific driver · each updates a different layer 📚 Wiki curated knowledge git-versioned 1. Live conversation driver: 👨‍🎓 student turn clock: every ~10 seconds updates: chat history only in-context learning reads index 2. Session memory driver: 🤖 Hermes (end-of-session) clock: per session (~30 min) updates: students/<name>.md "Pim still shaky on past perfect" append 3. Source ingestion driver: 👩‍🏫 teacher drops PDF clock: when new material arrives updates: wiki/sources/ + index via STORM (local) add page 4. Pruning driver: 🤖 Hermes (nightly cron) clock: 2:30 AM daily updates: keep/merge/archive age + usage + dedup signals prune 5. Course authoring driver: 👩‍🏫 teacher names topic clock: on demand (weekly?) updates: lessons/week-NN/* DRAFT Hermes composes generators write DRAFT 6. Human review driver: 👩‍🏫 teacher edits DRAFT clock: each morning updates: promotes DRAFT → live highest-quality signal promote · git diff ↺ Implicit prompt tuning driver: cumulative effect of all six loops above clock: months — never auto-applied updates: SYSTEM_PROMPT, INDEX.md, tool list teacher decides what patterns become rules
Six independent learning loops, all writing back to the Wiki at different speeds. The Wiki is the integration point.

The six loops in one table

# Loop What drives it How often What signal it learns from What changes
1 Live conversation 👨‍🎓 Student speaks Every turn (seconds) The current dialogue In-session memory only — chat history in the LLM context. Forgotten when session ends unless loop 2 captures it.
2 Session memory 🤖 Hermes summarizes End of each session Whisper transcript of the session Appends a dated note to students/<name>.md. Next session starts with that memory in context.
3 Source ingestion 👩‍🏫 Teacher drops a file When new material arrives (weeks) External PDFs, books, videos in /incoming/ STORM produces a cited article from existing Wiki sources → new page in wiki/lessons/. Future questions can ground in it.
4 Pruning 🤖 Hermes (cron at 2:30 AM) Nightly Age + usage + dedup + contradiction signals Wiki pages keep / merge / archive / delete-pending. Vector index re-embedded for changed pages.
5 Course authoring 👩‍🏫 Teacher names a topic On demand (weekly) The topic + existing Wiki + cited sources Bot writes DRAFT lesson packets to wiki/lessons/week-NN/. Becomes live only after loop 6.
6 Human review 👩‍🏫 Teacher edits DRAFTs Each morning The git diff between bot draft and approved version DRAFT promoted to live. The diff is the strongest training signal in the whole system — it tells the bot exactly what it got wrong.
Implicit prompt tuning 👩‍🏫 Teacher (cumulative) Months Patterns visible across loops 2 + 4 + 6 Teacher updates SYSTEM_PROMPT, INDEX.md rules, or tool definitions. Never auto-applied.

🔑 Who actually drives the learning

Look at the "driver" column. Three loops are teacher-driven (3, 5, 6 + the implicit one). Two are bot-driven (2, 4). One is student-driven (1). The teacher's signal is the smallest in volume but the highest in quality — it's the only loop that decides what becomes truth. The bot's loops are mechanical: they capture and curate, but they never promote DRAFT to live and they never rewrite the system prompt. This split is intentional — it's what keeps the bot useful instead of confidently wrong.

⚠️ Where the bot can NOT learn (yet)

Nothing in this stack updates the model weights. Qwen 2.5 and Hermes 4 are frozen — every loop above is context engineering, not fine-tuning. If the bot keeps getting something wrong even after you fix the Wiki, the fix is either (a) better system prompt, (b) better INDEX.md, (c) a new tool, or (d) eventually a LoRA on top of the base model. Don't confuse loops 1–6 with model training.

Diagnosing "the bot got something wrong"

SymptomWhich loop failedFix
Bot forgot a student's level from last week Loop 2 — session memory wasn't captured Check Hermes ran post-session; verify students/<name>.md was appended
Bot cites a textbook you haven't uploaded Loop 3 didn't fire — bot hallucinated Tighten system prompt: "cite only from Wiki pages you actually read"
Bot still uses an old definition you corrected Loop 4 — stale page wasn't pruned Mark the new page authoritative, the old one status: archive
Generated lesson plan misses your DSS-PPP timings Loop 5 — author tool prompt is wrong Update draft_dss_ppp tool template; review the next DRAFT
Bot repeats the same teaching mistake across many lessons Loop ↺ — pattern not yet promoted to system prompt Encode the rule in SYSTEM_PROMPT or INDEX.md after seeing it 3+ times

🚀 Getting started — pull Docker, prepare your school docs, then bring the bot online

ภาษาไทย

ส่วนนี้พาคุณจาก "เครื่องเปล่า" ไปจนถึง "บอทพร้อมสอนพรุ่งนี้" ภายในประมาณ 1 ชั่วโมง — โดยเริ่มจาก Docker, จากนั้นเตรียมเอกสารของโรงเรียน (ข้อมูลโรงเรียน · ครู · นักเรียน · ความคาดหวังในชั้นเรียน), แล้วค่อยเปิดบอท

หัวใจของการเริ่มต้นไม่ใช่เทคนิค แต่คือ เอกสารที่คุณเตรียม — บอทจะดีหรือไม่ดีขึ้นอยู่กับสิ่งที่คุณป้อนตอนแรก

English

This section walks you from "fresh machine" to "the bot can teach tomorrow" in about an hour. The order is deliberate: pull Docker first, then prepare the school documents (school profile · staff · students · lesson expectations), then bring the bot online and import everything.

The most important step is not technical — it's the documents you prepare in Step 2. The bot will be exactly as useful, considerate, and aligned with your teaching as the docs you feed it.

中文

本节带你从"全新机器"到"明天就可以上课",约需 1 小时。顺序是经过精心安排的:先拉取 Docker,然后准备学校文档(学校简介 · 教职员 · 学生 · 课堂期望),最后才让机器人上线并导入所有内容。

最关键的一步不是技术——而是你在第 2 步准备的文档。机器人会有多好用、多体贴、多贴合你的教学,完全取决于你最初喂给它的资料。

✅ Preflight — hardware check (60 seconds)

Before pulling anything, confirm your machine clears the bar:

  • Minimum: 16 GB RAM, 50 GB free disk, modern CPU (i5 8th gen / Ryzen 5 / Apple M1). Runs Qwen 7B for both day and night.
  • Recommended: 32 GB RAM, 200 GB disk, NVIDIA 12+ GB VRAM or Apple Silicon with 32+ GB unified memory. Adds Hermes 4 35B for the night worker.
  • OS: Windows 10/11, macOS 12+, or Ubuntu 22.04+. Linux is fastest.

No GPU? Skip Hermes 4 — the architecture still works with Qwen 7B doing both roles. Add a GPU later and swap HERMES_MODEL in .env.

From fresh hardware to first conversation — 8 steps, ~60 minutes 1. Pull Docker install + compose up 2. Prep docs school · staff · students 3. Bootstrap wiki INDEX.md + dirs 4. Import docs pandoc · templates 5. Pull models qwen · nomic · hermes 6. NBLM bootstrap one-shot media 7. First chat verify citations 8. Cron sever + schedule 10 min ~30 min 5 min 15 min 15 min 10 min 2 min 5 min ← longest by design — your docs shape everything downstream One-time setup. Step 2 is the keystone — spend the time there.
The full onboarding flow. Step 2 (prepare docs) gets dedicated time because everything downstream — what the bot grounds in, what LlamaIndex retrieves, what STORM generates — depends on it.

Step 1 — Pull Docker and bring up the bot stack

This is two things in one: install the Docker engine if you don't have it, then pull the bot images and start them. By the end of this step docker compose ps shows five services running and you have an idle bot waiting for documents.

1a. Install Docker (if you don't have it)

OSWhat to installGet it from
Windows 11 Docker Desktop + WSL2 backend docs.docker.com/desktop/install/windows-install
macOS (Intel or Apple Silicon) Docker Desktop docs.docker.com/desktop/install/mac-install
Ubuntu/Debian Docker Engine + Compose plugin curl -fsSL https://get.docker.com | sh
NVIDIA GPU (any OS) NVIDIA Container Toolkit (after Docker) NVIDIA docs
# Confirm Docker is healthy docker --version # expect Docker version 24+ or 25+ docker compose version # expect Compose v2.x docker run --rm hello-world # GPU users — confirm CUDA is reachable from containers docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

1b. Pull the bot stack and bring it up

git clone https://github.com/your-org/kru-eng-bot.git cd kru-eng-bot/docker cp .env.example .env # Edit .env minimum settings: # MODEL=qwen2.5:7b (use qwen2.5:3b on 16 GB RAM) # WHISPER_MODEL=small (use base on a CPU-only box) # TTS_DEVICE=cpu (cuda once GPU is wired) # TEACHER_EMAIL=you@school.ac.th # SMTP_HOST=smtp.gmail.com:587 (for the morning report) docker compose up -d --build # first build pulls torch + Coqui + Piper voice (~2 GB) # takes 5-10 minutes — go to Step 2 while it builds docker compose ps # expect: ollama, whisper, tts, orchestrator, owner-bridge # hermes-ui, owner-bridge — all "running"

1c. What just came up — the URLs you'll use

ServiceURLFor whoPurpose
🎙 Orchestratorhttp://localhost:8000StudentsThe voice chat web page
🦙 LlamaIndexhttp://localhost:42110You (debug)Search the wiki directly, see citation paths
🤖 Hermes Agent Web UIhttp://localhost:8081You (owner)Chat with Hermes, run ad-hoc tools, view night-job history
🌉 Owner bridgehttp://localhost:8082/webhookLINE / SlackWebhook receiver — wired in Step 8c/8d
🦙 Ollamahttp://localhost:11434InternalModel server — not for direct human use

💡 The Hermes Web UI is your console

The Hermes Agent Web UI at localhost:8081 is how you, the owner, talk to the same agent that runs every night. From it you can: ask Hermes to draft a Week 5 lesson on demand, inspect what tools it called last night, replay a failed night run, approve pending DRAFT pages. It's your primary control panel — and unlike students who get voice-only, you get the full agent chat with tool visibility.

💡 You can start Step 2 right now

The compose build runs in the background. Don't wait for it — open your notebook and start drafting the docs in Step 2 below. By the time you finish writing, the build will be done.

Step 2 — Prepare your school documentation

This is the most important step in the whole install. Spend 30 minutes here even if you're impatient to see the bot running. The documents you prepare now decide what the bot knows, how it talks to your students, and how it grades work. Skimp here and the bot will produce generic LLM output. Invest here and it becomes your teaching assistant.

2a. The document checklist — what to gather and what to write

DocumentWhat goes in itWill live atHave it?
🏫 School profileSchool name, type, mission, age range, languages taught, term dates, contact infowiki/school/profile.mdWrite fresh (15 min)
📋 Lesson expectations most leverageDSS-PPP timings, classroom rules, error-correction style, homework policy, feedback approach, cultural normswiki/pedagogy/expectations.mdWrite fresh (15 min)
👩‍🏫 Staff rosterPer teacher: name, role, subjects, contact, teaching style, specialtieswiki/staff/<name>.mdWrite fresh (5 min each)
🎓 Student rosterPer student: nickname, Thai name, age/grade, CEFR level, goals, strong/weak areas, parent contactswiki/students/<name>.mdFrom your records
📅 SyllabusTerm-by-term curriculum with CEFR can-do statements, weekly topicswiki/syllabus/<term>.mdFrom your existing plan
📚 Existing lesson plansPast lessons in any format (.docx, .pptx, PDF) — these become starter contentwiki/lessons/imported/Gather files
📝 Past exams + keysExam questions, answer keys, rubrics, postmortemswiki/exams/<id>.mdGather files
🎨 Pedagogical referencesFrameworks you follow (PPP, CEFR, TBL, lexical approach…)wiki/pedagogy/Write or cite

🎯 Write the Lesson Expectations doc first — it shapes everything else

If you only have 10 minutes for this step, spend them on wiki/pedagogy/expectations.md. That document tells the bot how you teach: your timing preferences, your error-correction style, your homework expectations, how strict to be, how feedback is given. Every lesson Hermes ever drafts, every reply LlamaIndex ever supports, will be informed by this file. It's the highest-leverage artifact in the wiki.

2b. Template — School profile

Copy this into wiki/school/profile.md and fill the brackets:

--- status: pin type: school updated: 2026-05-11 --- # <School name> ## Profile - **Type:** <private language school / public primary / international / corporate training> - **Location:** <city, country> - **Age range:** <e.g. 8-18> - **Languages taught:** <English, Mandarin, …> - **Term system:** <2-term / 3-term / semester> - **Typical class size:** <e.g. 8 students> - **Class duration:** <e.g. 45 minutes> ## Mission <1-2 paragraphs — what the school is for, the educational philosophy, what success looks like for a graduating student> ## Term dates (current academic year) - **Term 1:** 2026-04-01 → 2026-06-30 - **Term 2:** 2026-07-15 → 2026-10-30 - **Term 3:** 2026-11-15 → 2027-02-28 - **Holidays:** <list cultural / religious / national holidays the school observes> ## Contact - **Director:** <name, email, phone> - **Administration:** <email> - **Address:** <address>

2c. Template — Lesson Expectations (the keystone doc)

Copy this into wiki/pedagogy/expectations.md:

--- status: pin type: pedagogy updated: 2026-05-11 --- # Lesson Expectations — How <School name> teaches English ## Class structure (DSS-PPP variant) - **Lesson length:** <45 / 60 / 90 minutes> - **Warmer:** <duration · type of activity · energy goal> - **Presentation:** <duration · board-work expectations · modeling style> - **Practice:** <duration · drill format mix · error-correction frequency> - **Production:** <duration · pair vs group preference · freer activity types> - **Cooldown / wrap:** <how lessons end · homework setup · feedback moment> ## Classroom rules 1. <e.g. English-only during Practice and Production stages> 2. <e.g. Phones in a basket at the door> 3. <e.g. Mistakes are welcomed — they teach us> 4. <…> ## Error-correction style - **Pronunciation:** <delayed / immediate / record + replay> - **Grammar:** <delayed feedback after Production, with examples on board> - **Vocabulary:** <model-and-repeat, then move on> - **What you NEVER correct:** <e.g. accent, dialect features, code-switching during freer practice> ## Homework policy - **Frequency:** <one short task per lesson / weekly project> - **Format:** <written · audio recording · Anki cards · journal> - **Submission:** <when / how / late policy> - **How the bot can help students:** <e.g. the bot may explain and quiz, but never write the answer for the student> ## Feedback style - **Praise:** <specific, immediate, what they did well> - **Surfacing weaknesses:** <gently, one concrete next step at a time> - **Reports to parents:** <weekly summary email, monthly progress report, …> - **Tone:** <warm but direct / encouraging but honest> ## Cultural expectations - <Anything specific to your school / region / students> - <e.g. Buddhist holidays as reading topics, respectful address forms, Thai cultural references the bot should know and use> - <e.g. Avoid topics: alcohol, gambling, politics> ## Things the bot MUST NOT do - <e.g. Never assign homework that requires a parent's English> - <e.g. Never compare students to each other> - <e.g. Never grade work that wasn't reviewed by the teacher>

2d. Template — Staff member

--- status: pin type: staff updated: 2026-05-11 --- # <Name> — <Role> - **Email:** <email> - **Phone:** <phone> - **Subjects:** <e.g. English (B1-B2), TOEIC prep> - **Specialties:** <e.g. exam strategy, pronunciation, creative writing> - **Start date:** <date> - **Languages:** <e.g. Thai (native), English (C2), Mandarin (B1)> ## Teaching style <1-2 paragraphs — how this teacher teaches, what works for them, what students of theirs say> ## Notes <Anything else the bot should know — schedule preferences, classes this teacher covers, how to reach them in an emergency>

2e. Template — Student profile

--- status: live type: student updated: 2026-05-11 --- # <Nickname> (<Full name in Thai>) - **Level:** <A2 / B1 / B2 / C1> - **Age / grade:** <e.g. 14 / M.3> - **Started:** <2026-04-01> - **Strong at:** <listening, reading comprehension> - **Weak at:** <present perfect, prepositions of time, /θ/ pronunciation> - **Goals:** <pass IELTS 5.5 by 2026-12 / improve speaking confidence> - **Parent contact:** <name, email, phone — for weekly reports> - **Special considerations:** <dyslexia, hearing accommodation, shy in group settings> ## Background <1-2 sentences — family context, prior English exposure, motivation, hobbies that connect to English (anime, K-pop, football)> ## Session log (appended automatically by the night worker after every session)

2f. Template — Syllabus skeleton

--- status: pin type: syllabus term: 2026-term-1 updated: 2026-05-11 --- # Term 1, 2026 — B1 English Syllabus ## Term goal <What students will be able to do by end of term — CEFR can-do statement> ## Weekly topics | Week | Date | Grammar focus | Vocab theme | CEFR can-do | |------|------|---------------|-------------|-------------| | 1 | 2026-04-01 | Present simple vs continuous | Daily routines | "I can describe my daily routine in detail" | | 2 | 2026-04-08 | Past simple | Last weekend | "I can describe a past event" | | 3 | 2026-04-15 | Present perfect | Life experiences | "I can talk about my experiences" | | 4 | 2026-04-22 | Past perfect | Storytelling | "I can talk about events before another past event" | | … | | | | | | 12 | 2026-06-24 | Review + assessment | All | Term-end exam | ## Assessment - Weekly: short quiz (5 min, end of lesson) - Mid-term: 2026-05-20 (B1 mock test) - Final: 2026-06-30 (B1 mock test + speaking interview) ## Required materials - <textbook, workbook, Cambridge B1 handbook…>

💡 Don't aim for perfection here

You'll revise these docs continuously as the bot starts using them and you see what's missing. Get a v1 down in 30 minutes — even rough notes — and improve them over the first few weeks. Empty is worse than imperfect.

Step 3 — Bootstrap the Wiki directory structure

Create the seed layout inside the wiki Docker volume. This is the skeleton the documents from Step 2 slot into.

wiki/ ├── INDEX.md # the schema file — read first by every tool ├── school/ │ └── profile.md # from Step 2b ├── staff/ │ ├── _template.md │ └── <name>.md # from Step 2d, one file per teacher ├── students/ │ ├── _template.md # from Step 2e │ └── <nickname>.md # one file per student ├── pedagogy/ │ ├── expectations.md # from Step 2c — the keystone doc │ ├── dss-ppp.md # PPP framework reference │ └── cefr-b1-can-do.md ├── syllabus/ │ └── 2026-term-1.md # from Step 2f ├── lessons/ │ ├── imported/ # bulk-converted past plans (Step 4a) │ └── week-01/ │ ├── article.md # STORM-style long-form article │ ├── plan.md # the DSS-PPP plan │ └── drills.md ├── exams/ │ ├── _template.md │ └── <id>.md ├── sources/ # cited public references (textbooks, CEFR docs) │ └── _readme.md ├── audio/ # filled by Step 6 (NBLM bootstrap) ├── video/ # filled by Step 6 ├── slides/ # filled by Step 6 ├── quizzes/ # filled by Step 6 ├── flashcards/ # filled by Step 6 ├── maps/ # filled by Step 6 └── _archive/ # pruned pages land here, never deleted

The single most important file is INDEX.md. It's what LlamaIndex reads first to know what's in the wiki, and what Hermes reads to decide how to maintain it:

# INDEX.md — Kru Eng Wiki schema ## How to use this wiki 1. Read INDEX.md first (this file). 2. Always read pedagogy/expectations.md early — it defines how this school teaches. 3. When teaching, also load students/<name>.md for the student in front of you. 4. After every session, append a dated note to students/<name>.md summarizing what was practiced and where the student struggled. 5. Lesson DRAFTs live in lessons/week-NN/ with status: DRAFT frontmatter. Promote to status: live only after a teacher review. 6. Never modify files under sources/ — they are cited references, not curated notes. ## Page types - school/profile.md — about the school itself (one file) - pedagogy/expectations.md — how this school teaches (THE keystone doc) - pedagogy/<topic>.md — teaching framework references (DSS-PPP, CEFR descriptors) - staff/<name>.md — per-teacher profile - students/<name>.md — per-student notes (level, weak areas, session log) - syllabus/<term>.md — the course outline for one term - lessons/week-NN/article.md — STORM-generated long-form article on the week's topic - lessons/week-NN/plan.md — DSS-PPP lesson plan - lessons/week-NN/drills.md — practice items + answer keys - exams/<id>.md — exam questions, answer key, rubric, postmortem - sources/<name>.md — extracted text from a cited reference work - audio|video|slides|quizzes|flashcards|maps/week-NN.<ext> — pre-generated media ## Frontmatter contract Every page MUST start with YAML frontmatter: --- status: live | DRAFT | pin | archive type: school | staff | student | pedagogy | syllabus | lesson | exam | source updated: 2026-05-11 --- - status: pin — never auto-prune (school, staff, expectations, syllabus, exam keys) - status: DRAFT — bot-authored, awaiting teacher review - status: live — approved, in active use - status: archive — kept for history, not retrieved by LlamaIndex

Step 4 — Import your documents into the Wiki

Now slot the docs from Step 2 into the structure from Step 3. Three sub-tasks: bulk-convert existing files, fill in the templates, and run the roster script if you have a CSV.

4a. Bulk-convert existing files to markdown

# pandoc handles .docx, .odt, .html, and most PDFs apt install pandoc # Linux brew install pandoc # macOS # Convert one syllabus document: pandoc syllabus_2026.docx -o wiki/syllabus/2026-term-1.md # Bulk-convert a folder of past lesson plans: for f in ~/lesson_plans/*.docx; do name=$(basename "$f" .docx) pandoc "$f" -o wiki/lessons/imported/$name.md done # For exam PDFs, pdftotext often beats pandoc: pdftotext -layout midterm_b1.pdf wiki/exams/midterm-b1.md # For .pptx slide decks (your existing teacher slides): for f in ~/slides/*.pptx; do pandoc "$f" -o wiki/lessons/imported/$(basename "$f" .pptx).md done

4b. Drop the prepared docs from Step 2 into the right folders

# From Step 2 templates — save each one in place: wiki/school/profile.md # Step 2b — school profile wiki/pedagogy/expectations.md # Step 2c — KEY doc wiki/staff/<each-teacher>.md # Step 2d, one per teacher wiki/students/<each-student>.md # Step 2e, one per student wiki/syllabus/2026-term-1.md # Step 2f

4c. Bulk-import the student roster from a CSV

If you already have a class spreadsheet with nickname, Thai name, level, age — this script saves an hour of typing:

# bootstrap_students.py — run once after editing students.csv import csv from pathlib import Path from datetime import date TEMPLATE = Path("wiki/students/_template.md").read_text(encoding="utf-8") with open("students.csv", encoding="utf-8") as f: for row in csv.DictReader(f): out = Path(f"wiki/students/{row['nickname'].lower()}.md") if out.exists(): continue # never overwrite existing notes body = TEMPLATE \ .replace("<Nickname>", row["nickname"]) \ .replace("<Full name in Thai>", row["thai_name"]) \ .replace("A2 / B1 / B2 / C1", row["level"]) \ .replace("e.g. 14 / M.3", row["age_grade"]) \ .replace("2026-05-11", date.today().isoformat()) out.write_text(body, encoding="utf-8") print(f"created {out}")

Step 5 — Pull the AI models

# Day-side chat model (~4.4 GB) docker compose exec ollama ollama pull qwen2.5:7b # Embeddings model for LlamaIndex (~280 MB) docker compose exec ollama ollama pull nomic-embed-text # Night-side agent model (~22 GB Q4_K_M) — skip if no GPU yet docker compose exec ollama ollama pull hermes-4-35b-a3b:q4_k_m # Verify docker compose exec ollama ollama list

Step 6 — One-shot NotebookLM bootstrap (then never again)

NotebookLM has exactly one job in this stack and it happens once: at install time, it generates a rich starter set of media — audio overviews, video summaries, slide decks, quizzes, flashcards, and mind maps — from the public reference works in your curriculum. After this step the Docker host disconnects from the internet and never talks to NotebookLM again. You get NotebookLM's best output (especially Audio Overviews) without an ongoing privacy cost.

ArtifactNotebookLM featureLands inUsed for
🎧 Audio overviewsAudio Overview (podcast-style)wiki/audio/week-NN.mp3Listening practice between sessions
🎬 Video summariesVideo Overviewwiki/video/week-NN.mp4Visual review, homework replay
📊 Slide decksStudy Guide → markdown outlinewiki/slides/week-NN.mdIn-class presentation
📝 Quiz banksFAQ generation (~20 Q&A per topic)wiki/quizzes/week-NN.mdPractice quizzes, exam prep
🎴 Flashcard setsBriefing doc → term extractionwiki/flashcards/week-NN.jsonVocabulary drilling (Anki-importable)
🗺 Mind mapsMind Map → SVGwiki/maps/week-NN.svgConcept reference for visual learners
# bootstrap_notebooklm.py — runs ONCE during install, never again # Generates a full starter set of media from PUBLIC reference works only. from notebooklm import NotebookLM from pathlib import Path import json nb = NotebookLM(cookies_path=Path("~/.config/nblm/cookies.json").expanduser()) WEEKS = [ (1, "Present simple vs present continuous", ["sources/cambridge_b1_ch1.pdf"]), (2, "Past simple", ["sources/cambridge_b1_ch2.pdf"]), (3, "Present perfect", ["sources/cambridge_b1_ch3.pdf"]), # … 12 weeks total — one row per week of your syllabus ] for week, topic, sources in WEEKS: # Hard guard: never let student data near NotebookLM for s in sources: assert "students" not in s and "exams" not in s, \ f"REFUSED: {s} looks like private school data" notebook = nb.create_notebook(name=f"Bootstrap Week {week:02d} — {topic}") for s in sources: notebook.add_source(Path(s)) base = Path("wiki") print(f"[week {week:02d}] generating 6 artifact types…") notebook.audio_overview(focus=f"B1 Thai learners, week {week}: {topic}") \ .download_to(base / f"audio/week-{week:02d}.mp3") notebook.video_overview() \ .download_to(base / f"video/week-{week:02d}.mp4") (base / f"slides/week-{week:02d}.md").write_text( notebook.study_guide().markdown, encoding="utf-8") (base / f"quizzes/week-{week:02d}.md").write_text( notebook.faq(count=20).markdown, encoding="utf-8") cards = notebook.briefing_doc(format="vocab-cards").extract_terms() (base / f"flashcards/week-{week:02d}.json").write_text( json.dumps(cards, ensure_ascii=False, indent=2), encoding="utf-8") notebook.mind_map().save_svg(base / f"maps/week-{week:02d}.svg") print("Bootstrap complete. Now go to Step 8 and disconnect from the internet.")

⛔ The sources list MUST be public

Every file in WEEKS ends up on Google's servers. Allowed: Cambridge handbooks, OUP textbooks, ministry curricula, Council of Europe references, anything you could legally hand to a stranger. Forbidden: the student data and lesson expectations you wrote in Step 2 — those NEVER touch NotebookLM. The assert on line 16 catches the obvious mistakes, but the real check is human judgment before you list the sources.

Step 7 — First conversation + verify citations

After importing and bootstrapping, trigger LlamaIndex to build its vector store, then test:

# Build the LlamaIndex vector store (runs once on first orchestrator start) curl -X POST http://localhost:42110/api/index/update # First chat through the orchestrator: curl -X POST http://localhost:8000/chat \ -H "Content-Type: application/json" \ -d '{"message": "How does our school approach error correction?"}' # Expected: an answer citing pedagogy/expectations.md from Step 2c. # If you get a generic LLM answer with no citation, the LlamaIndex vector store didn't build — # check `docker compose logs orchestrator` for the index-build progress message.

💡 The "first citation" test

The cleanest sanity check: ask the bot about a specific student by nickname. If it answers with information that only exists in wiki/students/<name>.md, the whole pipeline (orchestrator → LlamaIndex → wiki volume → markdown file) is wired correctly. If it returns "I don't have information about Pim," LlamaIndex didn't index the file — usually a volume mount issue or RAG_ENABLED not set.

Step 8 — Disconnect from the internet, then schedule the night worker

NotebookLM's one-time job is finished. From here, the bot operates fully offline forever. Two things to do:

8a. Sever the NotebookLM credentials and outbound network

# Revoke NotebookLM credentials — there's nothing left to authenticate rm -i ~/.config/nblm/cookies.json docker compose exec hermes-night rm -f /data/secrets/nblm_cookies.json # The runtime tool list in hermes-night/run_night.py already excludes # any notebooklm tool — confirm with: grep -i notebooklm hermes-night/run_night.py # (the bootstrap script is in a separate one-shot container and is now retired) # Cut Docker off from outbound internet — recommended on Linux: docker network create --driver bridge --internal kru-eng-internal # Edit docker-compose.yml — under each service, add: # networks: [kru-eng-internal] # Then: docker compose down && docker compose up -d # Verify isolation: docker compose exec orchestrator curl -m 3 https://www.google.com # expected: curl: (28) Connection timed out — confirmed fully offline # Optional belt-and-braces — block outbound at the host firewall too: sudo iptables -I DOCKER-USER -o eth0 -j REJECT # Linux

🔒 What "fully offline" buys you

After this step, the bot literally cannot leak student data — not by accident, not by a misconfigured tool, not by an agent's bad decision. Every layer (Whisper, LlamaIndex, Ollama, Hermes, STORM, XTTS) runs against local files on local network. The only path data takes is mic → orchestrator → wiki → speaker, all on one machine. PDPA compliance becomes a property of the network topology, not a policy you have to enforce.

8b. Schedule the night worker

Host OSSchedulerSetup command
Linux cron crontab -e → add:
0 2 * * * cd /opt/kru-eng-bot/docker && docker compose run --rm hermes-night
macOS launchd or cron Same crontab line works; or create a ~/Library/LaunchAgents/ai.krueng.night.plist
Windows Task Scheduler Action: docker.exe · Args: compose -f C:\path\docker-compose.yml run --rm hermes-night · Trigger: daily at 2:00 AM

Run it manually once to confirm before scheduling:

docker compose run --rm hermes-night # expect ~5-15 minutes on first run (no transcripts yet, just indexing) # subsequent nights with real session data: 20-60 minutes

8c. Wire LINE bot — owner control from your phone (Thailand-friendly)

LINE is the default messenger for most teachers in Thailand. Wiring the owner-bridge service to a LINE bot turns your phone into a remote control for the night worker — DM commands like "draft Week 5", "show today's drafts", "approve all", or "what did Pim practice yesterday?" — and Hermes responds from inside your firewall. Only you (the owner LINE user ID) can talk to it.

StepWhereWhat to do
1. Create LINE channeldevelopers.line.biz/console"Create a new provider" → "Messaging API channel" — name it "Kru Eng Owner"
2. Grab credentialsChannel settingsCopy Channel access token and Channel secret
3. Get your LINE user IDLINE appAdd the bot as friend, send "myid" — bot DMs your user ID back (use it as OWNER_LINE_ID)
4. Expose the webhookYour router / Cloudflare TunnelMake localhost:8082/webhook/line reachable from the public internet (Cloudflare Tunnel is free + no port-forward)
5. Set the webhook URLLINE Developers Console"Webhook URL" → https://<your-tunnel>/webhook/line
# .env additions for the owner-bridge service: LINE_CHANNEL_ACCESS_TOKEN=<token from step 2> LINE_CHANNEL_SECRET=<secret from step 2> OWNER_LINE_ID=<your user ID from step 3> # Restart the bridge to pick up new env docker compose restart owner-bridge # Test from LINE app: DM the bot "status" # Expected reply within a few seconds: "✅ Kru Eng online. 18 students. Last night job: success at 02:31."

Commands the owner-bridge understands by default (extendable in owner-bridge/commands.py):

DM the botWhat it does
statusHealth check: how many students, last night-job result, model status
draft week 5Trigger Hermes to draft Week 5 lesson packet on demand
draftsList all status: DRAFT pages awaiting review, with summaries
approve lessons/week-04/plan.mdFlip a DRAFT page to status: live
student PimShow Pim's latest session log + level + weak areas
quiz week 3Generate a fresh quiz on Week 3's topic using STORM
summaryReplay the latest morning report
helpList all available commands

⚠️ The "outbound-blocked" rule still applies

You set up --internal network isolation in Step 8a. LINE webhooks come inbound through the Cloudflare Tunnel — that's allowed. But the owner-bridge service itself should NOT make outbound calls to LINE's API; instead it uses the LINE reply-token mechanism (replies are scoped to the inbound webhook and don't need outbound). Confirm by reviewing owner-bridge/main.py — it should only call httpx against LINE in response to a webhook, never on its own.

8d. Or: wire Slack bot — for international schools / multi-teacher setups

If your school runs on Slack instead of LINE, the owner-bridge supports both. Same architecture, different webhook path.

StepWhereWhat to do
1. Create Slack appapi.slack.com/apps"Create New App" → "From scratch" — name it "Kru Eng"
2. Enable Events APIApp settingsSubscribe to app_mention and message.im events
3. Get signing secret + bot token"Basic Information" + "OAuth"Install to workspace, copy Bot User OAuth Token (xoxb-…) and Signing Secret
4. Expose webhookCloudflare Tunnel (same as LINE)Map https://<tunnel>/webhook/slack
5. Configure Event URLApp settings"Event Subscriptions" → Request URL: https://<tunnel>/webhook/slack
# .env additions: SLACK_BOT_TOKEN=xoxb-<token> SLACK_SIGNING_SECRET=<secret> OWNER_SLACK_USER_ID=<U01234567 — your Slack user ID, find with /shrug profile> docker compose restart owner-bridge # Test: @KruEng status # Expected: same status reply as LINE

💡 You can wire both at once

The owner-bridge service routes by webhook path (/webhook/line vs /webhook/slack) and supports a single owner across both channels. Use LINE for personal phone notifications and Slack when you're working at the school. The same commands work in both. Email + LINE + Slack are all simultaneous output channels for the morning report.

8e. Owner control architecture diagram

How the owner controls the bot remotely — webhook-in, never poll-out 📱 LINE app (phone) teacher DMs commands 💬 Slack workspace @KruEng mentions 📧 Email inbox morning reports only (out) (outbound: report only) ☁ Cloudflare Tunnel inbound webhooks only no port-forward · TLS terminated 🌉 owner-bridge webhook router · auth · commands localhost:8082 🤖 Hermes Agent + Web UI :8081 runs tools, returns text 📚 wiki/ + 🦙 LlamaIndex + 🌪 STORM all local · all offline · no internet Hermes calls these to satisfy commands
LINE and Slack send webhooks inbound through a Cloudflare Tunnel. The bot's responses ride back on that same connection. The Docker host never initiates outbound calls except for the SMTP morning report.

🔒 Why this design preserves the "offline" property

Step 8a cut outbound internet from Docker. LINE and Slack control still works because webhooks are inbound — Cloudflare Tunnel accepts an incoming HTTPS connection from LINE's servers, forwards it to owner-bridge over the tunnel's reverse connection. The reply travels back on the same socket. No outbound DNS, no outbound TCP. The only deliberate exception is the morning SMTP email, which is one allowlisted outbound rule. Everything else stays sealed.

🎯 You're done. What happens next?

  1. Tomorrow's first session — student talks to the bot. LlamaIndex grounds the answer in the school profile, lesson expectations, syllabus, and any imported lesson plans you prepared in Step 2. The orchestrator writes the transcript to transcripts/.
  2. Tomorrow night at 2 AM — Hermes wakes up. Reads the transcript. Appends a session note to that student's page. Re-embeds the vector index. Emails you a summary.
  3. The day after — when that student returns, the bot already knows what they practiced yesterday and still respects every rule in pedagogy/expectations.md. The Wiki has compounded by one session.

From here, every interaction enriches the Wiki, every night Hermes maintains it, and once a week you spend 15 minutes reviewing DRAFTs Hermes proposed. That's the steady state.

🎓 Picking what to build first

Your situationStart withAdd next
Single teacher, ~20 students, stable syllabus📚 Wiki + 🦙 LlamaIndex (Getting Started 1–4)Add Hermes nightly worker when ready
Several classes, want exam grading / lesson generation📚 Wiki + 🤖 Hermes + STORMNotebookLM one-shot bootstrap for media
Whole-school deployment, hundreds of docs🦙 LlamaIndex (or Danswer for multi-source)Wiki for live curation, Hermes nightly
Need rich starter media (audio/video/slides)📓 NotebookLM install bootstrap (Step 6)Then disconnect — never call NotebookLM again
Want the bot to author new lessons🤖 Hermes + 🌪 STORM + draft toolsWiki as the staging area for review
PDPA-sensitive student data🦙 AnythingLLM or 🦙 LlamaIndex (in-orchestrator)Skip cloud tools entirely

🎯 The shortest path to value

If you only have an afternoon, follow Getting Started Steps 1–4 (pull Docker, prepare docs, bootstrap wiki, import) and skip everything after. You'll have a bot that grounds answers in your school's actual documents within an hour — even without Hermes, STORM, or the NotebookLM bootstrap. Add the nightly worker (Steps 5 + 8) the next week, and the NotebookLM media pack (Step 6) once you're sure about your public source list. The Wiki + LlamaIndex alone is already most of the value.