Extract Structured Data From Unstructured Text
Pull names, dates, amounts, and entities out of messy text into clean JSON — with explicit handling for ambiguous cases.
When to use this
When you have a pile of receipts, emails, contracts, or other text and need clean fields you can put in a spreadsheet.
The prompt
You are an information extraction system. Pull structured data from the text below into JSON.
Source text:
```
[paste the unstructured text — could be one item or many]
```
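Output format (one object per record, always wrapped in an array):
```json
[
  {
    "field_name_1": "value or null",
    "field_name_2": "value or null",
    "ambiguities": [],
    "_source_snippet": "5–15 words copied verbatim from the source"
  }
]
```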
Specifically, extract:
- [field 1, with type and any format constraints — e.g., "date_iso: YYYY-MM-DD"]
- [field 2]
- [...]
Rules:
1. **If a field isn't in the text, use null.** Don't guess.
2. **If a field is ambiguous** (e.g., date "5/4" could be May 4 or April 5), use null AND flag it in that record's "ambiguities" array, naming the field and the possible readings.
3. **Numbers as numbers**, not strings. Money as a number plus a separate "currency" field.
4. **Original text in `_source_snippet`** for each record: a verbatim 5–15 word slice showing where you pulled the data from, so I can verify it.
5. **Output a JSON array** even if there's only one record.
Don't paraphrase the input. Extract, don't summarize.
What you'll get back
A JSON array of records with the requested fields, null for missing data, an ambiguities array per record, and source snippets for verification.
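Before loading the results into a spreadsheet, you may want to check mechanically that the reply actually follows the rules. The sketch below is one way to do that in Python. It assumes the reply is bare JSON (no surrounding prose or code fence) and that you pass in the field names you requested; `check_records` and its parameters are hypothetical names for illustration, not part of any library.

```python
import json


def check_records(reply: str, fields: list[str], numeric_fields: set[str]) -> list[str]:
    """Return rule violations found in the model's JSON reply (hypothetical helper).

    Assumes each record should contain every requested field plus
    "ambiguities" and "_source_snippet", as the prompt asks.
    """
    problems: list[str] = []
    data = json.loads(reply)  # raises json.JSONDecodeError if the reply isn't valid JSON
    # Rule 5: the top level must be an array, even for a single record.
    if not isinstance(data, list):
        return ["top level is not a JSON array"]
    for i, rec in enumerate(data):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        # Rule 1: every requested field must be present (null is allowed).
        for name in fields:
            if name not in rec:
                problems.append(f"record {i}: missing field {name!r}")
        # Rule 3: numeric fields must be JSON numbers or null, not strings.
        # bool is excluded explicitly because Python treats it as an int subclass.
        for name in numeric_fields:
            value = rec.get(name)
            if value is not None and (isinstance(value, bool) or not isinstance(value, (int, float))):
                problems.append(f"record {i}: {name!r} is not a number")
        # Rule 2: the ambiguities array must exist (it may be empty).
        if not isinstance(rec.get("ambiguities"), list):
            problems.append(f"record {i}: 'ambiguities' is missing or not an array")
        # Rule 4: the source snippet should be 5 to 15 words long.
        snippet = rec.get("_source_snippet")
        if not isinstance(snippet, str) or not 5 <= len(snippet.split()) <= 15:
            problems.append(f"record {i}: '_source_snippet' missing or not 5-15 words")
    return problems


# Example: the model returned the total as a string instead of a number.
reply = '[{"vendor": "Acme", "total": "42.50", "currency": "USD", "ambiguities": [], "_source_snippet": "Acme Hardware receipt total due $42.50"}]'
print(check_records(reply, ["vendor", "total", "currency"], {"total"}))
# prints ["record 0: 'total' is not a number"]
```

An empty list means the reply passed every check; anything else is worth a second look (or a retry) before the data goes anywhere.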
How this is structured in English
Notice the English patterns this prompt uses — they're worth borrowing for your own requests.
- Extract, don't summarize. A paired imperative that contrasts two operations: extraction copies facts verbatim from the source, while summarization interprets and condenses them. The two are easy to confuse, so naming both makes the boundary explicit.
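The extract/summarize line can also be checked after the fact: a snippet that was truly extracted should appear word for word in the source, while a paraphrase usually won't. A minimal sketch, assuming records shaped like the prompt's output; `unverified_snippets` is a hypothetical helper name. Whitespace is collapsed before comparing, on the assumption that line breaks in the source may not survive intact in the JSON.

```python
import re


def _normalize(text: str) -> str:
    # Collapse runs of whitespace (including newlines) to single spaces.
    return re.sub(r"\s+", " ", text).strip()


def unverified_snippets(source: str, records: list[dict]) -> list[int]:
    """Return indices of records whose _source_snippet is not a verbatim slice of source."""
    haystack = _normalize(source)
    bad = []
    for i, rec in enumerate(records):
        snippet = rec.get("_source_snippet")
        # A missing snippet, or one not found verbatim, is flagged for review.
        if not isinstance(snippet, str) or _normalize(snippet) not in haystack:
            bad.append(i)
    return bad


source = "Invoice #881 from Acme Hardware.\nTotal due: $42.50 by the end of the month."
records = [
    {"_source_snippet": "Acme Hardware.\nTotal due: $42.50"},  # verbatim slice: passes
    {"_source_snippet": "Acme charged about 42 dollars"},       # paraphrase: flagged
]
print(unverified_snippets(source, records))  # [1]
```

The check is deliberately strict: it treats any rewording as a paraphrase, which is exactly what "Extract, don't summarize" asks the model to avoid.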