🦙 Ollama — Run AI Models on Your Own PC

🤔 What is Ollama?

🇹🇭 ภาษาไทย

Ollama คือซอฟต์แวร์ที่ให้คุณรันโมเดล AI ภาษา (เช่น Llama, Qwen, Mistral, Gemma) บนเครื่องคอมพิวเตอร์ของคุณเอง โดยไม่ต้องส่งข้อมูลขึ้นคลาวด์

มันทำหน้าที่เป็น "เครื่องยนต์" — ดาวน์โหลดโมเดลจากอินเทอร์เน็ตครั้งเดียว แล้วเก็บไว้ในเครื่อง หลังจากนั้นคุณคุยกับมันได้แม้ไม่มีอินเทอร์เน็ต

ใช้งานผ่านบรรทัดคำสั่ง (ollama run qwen2.5) หรือ REST API (POST /api/chat) ฟรีและเป็นโอเพนซอร์ส

🇬🇧 English

Ollama is software that lets you run large language models (Llama, Qwen, Mistral, Gemma, and many others) entirely on your own computer. No data goes to the cloud.

Think of it as the engine: it downloads the model files once, stores them locally, and then serves them from your own machine, even without an internet connection.

Use it from the command line (ollama run qwen2.5) or through its REST API (POST /api/chat). Ollama is free and open source.

🇨🇳 中文

Ollama 是一款让你在自己的电脑上运行开源大语言模型（Llama、Qwen、Mistral、Gemma 等）的软件。所有数据都不会上传到云端。

可以把它想象成"引擎"：从网上下载一次模型文件，保存到本地，之后即使没有网络也能继续用它。

通过命令行（ollama run qwen2.5）或 REST API（POST /api/chat）使用。免费且开源。

🦙 Analogy / เปรียบเทียบ / 类比

Ollama is to AI models what Docker is to web services. Docker doesn't make web apps — it makes them easy to run. Ollama doesn't train AI models — it makes them easy to run on any laptop with a single command.

Ollama เปรียบเหมือน Docker สำหรับโมเดล AI — Docker ไม่ได้สร้างแอป แต่ทำให้รันแอปง่ายขึ้น Ollama ไม่ได้ฝึกโมเดล แต่ทำให้รันโมเดลที่คนอื่นฝึกไว้แล้วง่ายมาก แค่คำสั่งเดียว

Ollama 之于 AI 模型，就像 Docker 之于 Web 应用。Docker 不创造应用，它让应用容易运行。Ollama 不训练模型，它让训练好的模型在任何电脑上一行命令就能跑起来。

Visualization 1: Cloud LLMs send every prompt across the internet. Ollama keeps every byte on your machine.
ภาพประกอบ 1: Cloud LLM ส่งทุก prompt ออกไปทางอินเทอร์เน็ต. Ollama เก็บทุกอย่างไว้ในเครื่อง
图 1：云端 LLM 把每个 prompt 都送到互联网。Ollama 把所有数据留在本机。

⚙️ How Ollama works

🇹🇭 ภาษาไทย

Ollama มี 3 ชั้น ทำงานร่วมกัน:

1. ส่วนติดต่อ — CLI (ollama run) หรือ REST API (พอร์ต 11434) สำหรับแอปอื่นเรียกใช้

2. รันไทม์ — โหลดโมเดลเข้า RAM/VRAM, จัดคิวคำขอ, ใช้ GPU ถ้ามี (รองรับ NVIDIA, Apple Silicon, AMD)

3. ไฟล์โมเดล — รูปแบบ GGUF (โมเดลที่ "อัด" แล้ว) เก็บอยู่ที่ ~/.ollama/models/

🇬🇧 English

Ollama has three layers working together:

1. Interface — CLI (ollama run) and a REST API on port 11434 that any app can call.

2. Runtime — loads the chosen model into RAM (and VRAM if a GPU is present), queues requests, manages batching. Supports NVIDIA, Apple Silicon, and AMD.

3. Model files — quantized models in GGUF format, stored under ~/.ollama/models/. A 3B-parameter model is around 2 GB on disk.

🇨🇳 中文

Ollama 由三层协同工作：

1. 接口层 — 命令行（ollama run）和 11434 端口的 REST API，供其他应用调用。

2. 运行时 — 把所选模型加载到 RAM（如有 GPU 则用 VRAM），处理请求队列，支持 NVIDIA、Apple Silicon 和 AMD。

3. 模型文件 — 量化后的 GGUF 格式，存放在 ~/.ollama/models/。一个 3B 参数模型约 2 GB。

Visualization 2: Top: how you call Ollama. Middle: the runtime that does the work. Bottom: model files sitting on disk.
ภาพประกอบ 2: บน: วิธีเรียกใช้. กลาง: รันไทม์ที่ทำงานจริง. ล่าง: ไฟล์โมเดลที่อยู่ในดิสก์
图 2：顶层：你怎么调用 Ollama。中间层：实际工作的运行时。底层：磁盘上的模型文件。
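
As a rough illustration of the bottom layer, the sketch below adds up the size of everything stored under the model directory named above. It assumes the default ~/.ollama/models/ location from the text and makes no assumption about how Ollama arranges files inside that directory. The names DEFAULT_MODELS_DIR and dir_size_gb are hypothetical helpers, not part of Ollama.

```python
import os
from pathlib import Path

# Default location given in the text; a given installation may store models elsewhere.
DEFAULT_MODELS_DIR = Path(os.path.expanduser("~/.ollama/models"))


def dir_size_gb(root: Path) -> float:
    """Return the total size of all regular files under root, in GB (10**9 bytes)."""
    if not root.exists():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / 1e9


if __name__ == "__main__":
    print(f"{DEFAULT_MODELS_DIR}: {dir_size_gb(DEFAULT_MODELS_DIR):.2f} GB")
```

Running this before and after pulling a model should show roughly how much disk space that model takes.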

🔄 The inference flow — what happens per request

🇹🇭 ภาษาไทย

เมื่อคุณส่งคำถาม Ollama จะทำงานตามลำดับนี้:

1. รับคำขอผ่าน HTTP (พอร์ต 11434)
2. โหลดโมเดลเข้า RAM (เฉพาะครั้งแรก, ครั้งต่อ ๆ ไปอยู่ในแคช)
3. แปลงข้อความเป็น token
4. สร้าง token ทีละตัว ส่งกลับเป็น stream ทันที
5. หยุดเมื่อถึง token สิ้นสุด หรือถึงขีดจำกัด

🇬🇧 English

When you send a prompt, Ollama goes through these steps in order:

1. Receives the request over HTTP (port 11434)
2. Loads the model into RAM (only on the first request; it stays loaded afterwards)
3. Tokenizes your prompt into integer IDs
4. Generates one token at a time, streaming each one back as it is produced
5. Stops at an end-of-sequence token or when the configured length limit is reached

🇨🇳 中文

当你发送一个 prompt，Ollama 按这些步骤依次执行：

1. 通过 HTTP（11434 端口）接收请求
2. 把模型加载到 RAM（只在第一次加载，之后保持热缓存）
3. 把你的 prompt 切成 token（整数 ID）
4. 一次生成一个 token，边生成边流式返回
5. 遇到结束 token 或达到长度上限时停止

Visualization 3: One request, traced left to right. Streaming means the user sees the first words within ~100 ms instead of waiting for the whole answer.
ภาพประกอบ 3: หนึ่งคำขอ ติดตามจากซ้ายไปขวา. การสตรีมทำให้ผู้ใช้เห็นคำแรกใน ~100 ms แทนที่จะต้องรอคำตอบทั้งหมด
图 3：一次请求，从左到右追踪。流式输出让用户在约 100 毫秒内看到首个词，不用等整段回答。
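
The streaming step can be sketched from the client side as follows. This assumes that, with streaming enabled, /api/chat returns one JSON object per line, each carrying a fragment of the reply in message.content, with a done flag set on the last one. The sample lines are hand-written to match that assumed shape and are not captured server output; iter_chunks is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield reply fragments from newline-delimited JSON until a chunk reports done."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        # Assumed shape: {"message": {"content": "..."}, "done": false}
        yield chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break


# Hand-written sample lines in the assumed format (not real server output).
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo"}, "done": false}',
    '{"message": {"role": "assistant", "content": "!"}, "done": true}',
    '{"message": {"role": "assistant", "content": "ignored"}, "done": false}',
]

reply = "".join(iter_chunks(sample))
print(reply)  # Hello!
```

Because the function is a generator, a real client could print each fragment as soon as it arrives rather than joining them at the end.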

🎯 Models you can run — and what hardware you need

🇹🇭 ภาษาไทย

โมเดลมีหลายขนาด ยิ่งใหญ่ยิ่งฉลาดแต่ยิ่งกินทรัพยากร กฎคร่าว ๆ: ต้องการ RAM เท่ากับขนาดไฟล์โมเดล + เผื่ออีก ~30%

ขนาดยอดนิยม:

1B (เช่น Llama 3.2 1B) — รันได้บนโทรศัพท์ แต่คำตอบไม่ค่อยดี
3B (Qwen 2.5, Llama 3.2) — จุดเริ่มต้นที่ดี ใช้ RAM ~4 GB
7-8B (Hermes 3, Llama 3.1) — คุณภาพดีขึ้นชัด ใช้ RAM ~8 GB
70B+ — คุณภาพระดับ ChatGPT แต่ต้อง VRAM 48+ GB

🇬🇧 English

Models come in a range of sizes. Larger models generally give better answers but need more memory and run more slowly. Rule of thumb: you need roughly as much RAM as the model file is large, plus ~30% headroom.

Common sizes:

1B (e.g. Llama 3.2 1B) — runs on a phone, output quality is modest
3B (Qwen 2.5, Llama 3.2) — sweet spot on a laptop, ~4 GB RAM, useful answers
7-8B (Hermes 3, Llama 3.1) — clearly better quality, ~8 GB RAM, slower on CPU
70B+ — ChatGPT-class answers, but needs 48+ GB of VRAM (workstation/server territory)

🇨🇳 中文

模型有多种大小。越大越聪明，但越占资源。经验法则：所需 RAM ≈ 模型文件大小 + 30% 余量。

常见规格：

1B（如 Llama 3.2 1B）— 手机能跑，回答质量一般
3B（Qwen 2.5、Llama 3.2）— 笔记本上的甜蜜点，约 4 GB RAM，回答足够实用
7-8B（Hermes 3、Llama 3.1）— 质量明显提升，约 8 GB RAM，纯 CPU 较慢
70B+ — 接近 ChatGPT 水平，但需要 48+ GB VRAM（工作站 / 服务器级别）

Visualization 4: On a typical laptop (8-16 GB RAM), the 3B–7B band is the realistic working range.
ภาพประกอบ 4: บนแล็ปท็อปทั่วไป (RAM 8-16 GB), ช่วง 3B-7B คือขนาดที่ใช้งานได้จริง
图 4：普通笔记本（8-16 GB RAM）上，3B–7B 是现实可用的区间。
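
The rule of thumb above (model file size plus ~30% headroom) is simple enough to write down as arithmetic. The sketch below is only that rule, not a measurement; estimate_ram_gb and fits_in_ram are hypothetical helper names.

```python
def estimate_ram_gb(model_file_gb: float, headroom: float = 0.30) -> float:
    """Rule of thumb from the text: model file size plus ~30% headroom."""
    if model_file_gb < 0 or headroom < 0:
        raise ValueError("size and headroom must be non-negative")
    return model_file_gb * (1 + headroom)


def fits_in_ram(model_file_gb: float, available_ram_gb: float) -> bool:
    """True if the estimated requirement is within the RAM you can spare."""
    return estimate_ram_gb(model_file_gb) <= available_ram_gb


# The text puts a 3B-parameter model at roughly 2 GB on disk.
print(round(estimate_ram_gb(2.0), 2))  # 2.6
print(fits_in_ram(2.0, 8.0))           # True
```

The estimate covers the model alone. The operating system and other applications need memory too, so the RAM you can actually spare is usually less than the total installed.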

🔐 Why run AI locally?

☁️ Cloud LLM concerns / ความเสี่ยงของ Cloud LLM / 云端 LLM 的隐患

Data leaves your network (PDPA / GDPR exposure)
Per-token billing: costs scale with use
Vendor outage = your tool is down
The vendor can change pricing or deprecate models at any time
Some vendors train on your prompts unless you opt out

ข้อมูลออกจากเครือข่ายของคุณ (เสี่ยง PDPA / GDPR)
คิดเงินตาม token: ค่าใช้จ่ายเพิ่มตามการใช้งาน
เซิร์ฟเวอร์ผู้ให้บริการล่ม = เครื่องมือคุณใช้ไม่ได้
ผู้ให้บริการเปลี่ยนราคา / ยกเลิกโมเดลได้ทุกเมื่อ
บางรายเอา prompt ของคุณไปฝึกโมเดล (ถ้าไม่ได้ปิดไว้)

数据离开你的网络（PDPA / GDPR 风险）
按 token 计费：用得越多越贵
厂商宕机 = 你的工具也宕了
厂商随时可能涨价或停用模型
有些厂商默认会用你的 prompt 训练（除非手动关闭）

🏠 Local Ollama benefits / ข้อดีของ Ollama ในเครื่อง / 本地 Ollama 的优势

Prompts and replies never leave the machine
$0 per request after the hardware is paid for
Works offline: usable in classrooms with unreliable Wi-Fi
You decide when to upgrade or change models
No vendor lock-in: open-weight files you keep on your own disk

Prompt และคำตอบไม่ออกจากเครื่อง
เสียค่าใช้จ่ายฮาร์ดแวร์ครั้งเดียว ใช้ฟรีตลอด
ใช้ได้แม้ไม่มีอินเทอร์เน็ต: เหมาะกับห้องเรียนที่ Wi-Fi ไม่นิ่ง
คุณตัดสินใจเองว่าจะอัปเกรดหรือเปลี่ยนโมเดลเมื่อไหร่
ไม่ผูกขาดผู้ให้บริการ: ไฟล์โมเดลเป็นของคุณ

prompt 和回复完全留在本机
硬件买完之后，每次调用都 $0
离线可用：适合 Wi-Fi 不稳定的教室
你自己决定何时升级或切换模型
不被厂商锁定：开源权重文件归你所有

Visualization 5: The numbers are rough and vary widely with usage. The shape is the point: cloud cost keeps growing, local cost flattens after the upfront hardware purchase.
ภาพประกอบ 5: ตัวเลขเป็นค่าประมาณ. ประเด็นคือรูปทรง: คลาวด์เพิ่มขึ้นเรื่อย ๆ, ส่วน local จะคงที่หลังจากจ่ายค่าฮาร์ดแวร์เริ่มต้นแล้ว
图 5：数字仅供参考（实际差异很大）。重点是形状：云端线性增长，本地一次投入后保持平稳。
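
The shape described in the caption (cloud cost keeps growing, local cost stays flat after the upfront purchase) implies a break-even point, which the sketch below computes. Every figure in it is hypothetical and chosen only to show the calculation. It also assumes a constant monthly cloud bill and ignores electricity and maintenance; break_even_months is a hypothetical helper name.

```python
import math


def break_even_months(hardware_cost: float, cloud_cost_per_month: float) -> int:
    """First whole month in which cumulative cloud spend reaches the one-off hardware cost."""
    if hardware_cost < 0 or cloud_cost_per_month <= 0:
        raise ValueError("hardware_cost must be >= 0 and cloud_cost_per_month > 0")
    return math.ceil(hardware_cost / cloud_cost_per_month)


# Hypothetical figures, chosen only to show the calculation.
print(break_even_months(hardware_cost=600, cloud_cost_per_month=40))  # 15
```

With real numbers the result depends heavily on how much you use the model, which is why the chart is best read for its shape rather than its values.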

🚀 Quick start — three commands

🇹🇭 ภาษาไทย

ติดตั้งและรันโมเดลแรกใน 5 นาที:

1. ดาวน์โหลดและติดตั้งจาก ollama.com (Mac / Windows / Linux)
2. เปิด Terminal แล้วรัน ollama run qwen2.5:3b (ครั้งแรกจะดาวน์โหลด ~2 GB)
3. พิมพ์คำถามได้เลย ตอบกลับเป็น stream ทันที

🇬🇧 English

From zero to first reply in about 5 minutes:

1. Download and install from ollama.com (Mac / Windows / Linux)
2. Open a terminal and run: ollama run qwen2.5:3b (the first run downloads ~2 GB)
3. Type a question. The reply streams back token by token.

🇨🇳 中文

从零到第一次回复，约 5 分钟：

1. 从 ollama.com 下载安装（Mac / Windows / Linux）
2. 打开终端，运行 ollama run qwen2.5:3b（第一次会下载约 2 GB）
3. 直接输入问题，回复会以流式逐字返回

# Install (Mac) — installs the GUI app + CLI
brew install --cask ollama

# Pull a model (one-time download, lives in ~/.ollama/models/)
ollama pull qwen2.5:3b

# Chat interactively
ollama run qwen2.5:3b

# Or call the REST API from any language
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:3b",
  "messages": [{"role":"user","content":"Hello from Chiang Mai!"}],
  "stream": false
}'
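
For readers who would rather call the API from a program than from curl, here is a minimal Python sketch of the same non-streaming request, using only the standard library. It assumes Ollama is listening on localhost:11434 as in the curl call above, and that the reply text is found under message.content in the JSON response. The names build_chat_request and chat are hypothetical helpers, not part of any Ollama client library.

```python
import json
import urllib.request

# Endpoint used by the curl call above.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request with the same JSON body as the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the reply text (needs a running local Ollama)."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        reply = json.load(resp)
    # Assumed response shape: {"message": {"content": "..."}, ...}
    return reply["message"]["content"]
```

With Ollama running and the model pulled, calling chat("qwen2.5:3b", "Hello from Chiang Mai!") should return the model's reply as a string. Building the request is kept separate from sending it so the payload can be inspected without a server.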

💡 Tip: pair with the kru-eng-classroom stack / เคล็ดลับ: ใช้คู่กับ kru-eng-classroom / 小提示：与 kru-eng-classroom 配合使用

Our open-source classroom stack at github.com/ddtraveller/agentTeacher uses Ollama under the hood — chat, lesson, exam, whiteboard, and calendar features all served from one Docker container that talks to your local Ollama.

โปรเจกต์ห้องเรียนโอเพนซอร์สของเราที่ github.com/ddtraveller/agentTeacher ใช้ Ollama เป็นฐาน — ฟีเจอร์แชท บทเรียน ข้อสอบ กระดานวาด และปฏิทินทั้งหมดเสิร์ฟจาก Docker container เดียวที่คุยกับ Ollama ในเครื่อง

我们的开源教室项目 github.com/ddtraveller/agentTeacher 底层就是 Ollama — 聊天、课程、试题、白板和日历功能都由一个 Docker 容器提供，并与本机 Ollama 通信。