▦ Data & Analysis

Extract Structured Data From Unstructured Text

Pull names, dates, amounts, and entities out of messy text into clean JSON — with explicit handling for ambiguous cases.

When to use this

When you have a pile of receipts, emails, contracts, or other text and need clean fields you can put in a spreadsheet.

The prompt

You are an information extraction system. Pull structured data from the text below into JSON.

Source text:
```
[paste the unstructured text — could be one item or many]
```

Output format — extract these fields per record:
```json
{
  "field_name_1": "value or null",
  "field_name_2": "value or null"
}
```

Specifically, extract:
- [field 1, with type and any format constraints — e.g., "date_iso: YYYY-MM-DD"]
- [field 2]
- [...]

Rules:

1. **If a field isn't in the text, use null.** Don't guess.
2. **If a field is ambiguous** (e.g., date "5/4" could be May 4 or April 5), use null AND flag it in that record's "ambiguities" array.
3. **Numbers as numbers**, not strings. Money as a number plus a separate "currency" field.
4. **Original text in `_source_snippet`** for each record — a 5–15 word slice showing where you pulled the data from. Helps me verify.
5. **Output a JSON array** even if there's only one record.

Don't paraphrase the input. Extract, don't summarize.

What you'll get back

A JSON array of records with the requested fields, null for missing data, an ambiguities array per record, and source snippets for verification.
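Before the records go into a spreadsheet, it can help to check the reply against the prompt's rules mechanically. The following is a minimal sketch in Python, not part of the prompt itself. The function name `check_records`, the field names (`vendor`, `date_iso`, `amount`, `currency`), and the sample invoice text are all illustrative assumptions; swap in whatever fields you actually requested.

```python
import json


def check_records(raw_response: str, source_text: str, required_fields: list[str]) -> list[str]:
    """Return a list of problems found in the model's JSON reply (empty if none).

    Hypothetical helper: it assumes a money field called "amount" and the
    "_source_snippet" / "ambiguities" keys named in the prompt's rules.
    """
    records = json.loads(raw_response)
    if not isinstance(records, list):
        return ["top level is not a JSON array (rule 5)"]

    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: not a JSON object")
            continue

        # Rule 1: every requested field is present; missing data is null, not omitted.
        for field in required_fields:
            if field not in record:
                problems.append(f"record {i}: missing key {field!r}")

        # Rule 2: ambiguities, when present, should be a list of notes.
        if not isinstance(record.get("ambiguities", []), list):
            problems.append(f"record {i}: ambiguities is not an array")

        # Rule 3: numbers as numbers (bool is excluded because it subclasses int).
        amount = record.get("amount")
        if amount is not None and (isinstance(amount, bool) or not isinstance(amount, (int, float))):
            problems.append(f"record {i}: amount is not a number")

        # Rule 4: the snippet should appear verbatim in the source text.
        snippet = record.get("_source_snippet")
        if not isinstance(snippet, str) or snippet not in source_text:
            problems.append(f"record {i}: _source_snippet not found in source")

    return problems


# Illustrative input and reply (made up for this sketch).
source = "Invoice from Acme Corp dated 5/4 for 120.50 USD, due on receipt."
reply = '''[
  {
    "vendor": "Acme Corp",
    "date_iso": null,
    "amount": 120.5,
    "currency": "USD",
    "ambiguities": ["date 5/4 could be May 4 or April 5"],
    "_source_snippet": "Invoice from Acme Corp dated 5/4 for 120.50 USD"
  }
]'''

print(check_records(reply, source, ["vendor", "date_iso", "amount", "currency"]))
```

An empty list means the reply passed these checks. The snippet check is deliberately strict: a snippet that was paraphrased rather than copied will fail it, which is usually the signal you want.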

How this is structured in English

Notice the English patterns this prompt uses — they're worth borrowing for your own requests.

"Extract, don't summarize." This paired imperative separates two operations that are easily confused: extraction copies facts verbatim from the text, while summarization interprets them. Naming both tells the model which one you want.
