Data & Analysis

Plan a Data Cleaning Pass for a Messy Dataset

Diagnose the issues in a messy dataset and produce a step-by-step cleaning plan in priority order.

When to use this

When you've opened a dataset and it's a mess of inconsistent formats, missing values, and weird outliers — and you need a plan, not just vibes.

The prompt

You are a data analyst who's cleaned a lot of real-world data.

Source:
```
[paste a sample of the data — first 20 rows is enough]
```

Context:
- What this data represents: [...]
- The question I'm trying to answer with it: [...]
- The format I want it in by the end (long / wide, what columns I need): [...]

Diagnose and plan:

1. **The issues you spot**: list every cleaning issue in the sample. For each: column, what's wrong, severity (1–3, where 3 is most severe), and a representative example.
2. **Priority order**: which to fix FIRST so later fixes are easier. Cleaning order matters (for example, trim whitespace and standardize case before deduplicating, or near-duplicates slip through).
3. **For each issue, the cleaning move**: what specifically to do (drop, impute, standardize, parse, split a column). Be concrete.
4. **What you'd ask before deleting any row or column**: a reason data is "wrong" can be a real signal. Don't lose information without thinking.
5. **Final shape**: what the cleaned dataset should look like for my downstream question.
6. **Things to double-check**: record counts at each step, distributions of key columns, before/after spot checks.

Don't recommend ML imputation when median imputation is fine. Don't recommend complexity I don't need.
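This is not part of the prompt to paste. It's a minimal sketch of the "median imputation is fine" move, assuming your rows are a list of Python dicts with `None` marking a missing value. The function name `impute_median` and the sample rows are hypothetical. It also adds a `<column>_was_missing` flag, so filling the gap doesn't erase the fact that it was a gap.

```python
from statistics import median


def impute_median(rows, column):
    """Hypothetical helper: fill missing values in `column` with the median of
    the observed values, and flag which rows were filled."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)  # raises StatisticsError if every value is missing
    cleaned = []
    for r in rows:
        new = dict(r)  # copy so the original rows stay untouched
        new[f"{column}_was_missing"] = r[column] is None
        if r[column] is None:
            new[column] = fill
        cleaned.append(new)
    return cleaned, fill


# Made-up sample rows for illustration only.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 29},
    {"id": 4, "age": 41},
]
cleaned, fill = impute_median(rows, "age")
print(fill)  # 34
```

The median of 29, 34, and 41 is 34, so row 2 gets `age = 34` and `age_was_missing = True`. No model, no extra dependencies.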

What you'll get back

A prioritized cleaning plan with each issue diagnosed, a concrete move per issue, a "before you delete" caution, the final shape, and validation checks.
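To make "priority order" and "record counts at each step" concrete, here's a rough sketch of how you might run such a plan. It assumes rows are Python dicts. The helpers `run_plan`, `strip_and_lower_city`, and `drop_exact_duplicates`, along with the sample data, are hypothetical. The idea is to apply steps in order and log the row count after each one, so an unexpected drop points at the step that caused it.

```python
def run_plan(rows, steps):
    """Hypothetical runner: apply (name, function) steps in order and record
    the row count after each, starting from the raw count."""
    log = [("start", len(rows))]
    for name, step in steps:
        rows = step(rows)
        log.append((name, len(rows)))
    return rows, log


def strip_and_lower_city(rows):
    # Standardize first so " Paris" and "paris" compare equal later.
    return [{**r, "city": r["city"].strip().lower()} for r in rows]


def drop_exact_duplicates(rows):
    # Keep the first occurrence of each identical row.
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


# Made-up sample rows for illustration only.
rows = [
    {"id": 1, "city": " Paris"},
    {"id": 1, "city": "paris"},
    {"id": 2, "city": "Lyon "},
]
cleaned, log = run_plan(rows, [
    ("standardize city", strip_and_lower_city),
    ("drop duplicates", drop_exact_duplicates),
])
for name, n in log:
    print(f"{name}: {n} rows")
```

Order is doing the work here. Standardizing first lets the duplicate check collapse " Paris" and "paris" into one row. Swap the two steps and nothing gets dropped, leaving a duplicate behind.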

How this is structured in English

Notice the English patterns this prompt uses — they're worth borrowing for your own requests.

"A reason data is 'wrong' can be a real signal." This is a counterintuitive idea: missingness or weirdness often carries information. Saying so helps the AI avoid scrubbing away the interesting parts.
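One way to act on this before deleting anything is to check whether rows with a missing value behave differently on the outcome you care about. Here's a minimal sketch, assuming dict rows, `None` for missing, and a numeric outcome column. The function `missingness_report` and the sample columns `income` and `churned` are hypothetical.

```python
def missingness_report(rows, column, outcome):
    """Hypothetical helper: for rows where `column` is missing vs present,
    return (row count, mean of `outcome`) for each group."""
    groups = {"missing": [], "present": []}
    for r in rows:
        key = "missing" if r[column] is None else "present"
        groups[key].append(r[outcome])
    # Mean is None for an empty group instead of dividing by zero.
    return {k: (len(v), sum(v) / len(v) if v else None) for k, v in groups.items()}


# Made-up sample rows for illustration only.
rows = [
    {"income": None, "churned": 1},
    {"income": None, "churned": 1},
    {"income": 52000, "churned": 0},
    {"income": 61000, "churned": 0},
    {"income": 48000, "churned": 1},
]
report = missingness_report(rows, "income", "churned")
print(report)
```

In this toy sample, every row with missing income churned, compared with one in three of the others. Blanket row deletion would erase exactly that kind of signal.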
