Documentation

Everything you need to know about using Simple Data Cleaner to clean and validate your UK data.

Getting Started

Simple Data Cleaner helps you validate and clean UK data formats including phone numbers, National Insurance numbers, postcodes, and bank sort codes. The entire process happens in your browser - your data never leaves your device.

Quick Start:

  1. Upload your CSV, Excel (.xlsx/.xls), or JSON file
  2. Select which columns contain UK data to clean (or use Auto-Select)
  3. Click "Clean My Data" to process
  4. Review results in the Full Preview tab
  5. Customise export options (format, duplicates, whitespace)
  6. Download your cleaned file

Pro Tip: Always use the Full Preview feature before downloading to ensure the cleaning results match your expectations.

Interactive Example

Want to try Simple Data Cleaner without uploading your own file? Visit our Interactive Example page to explore the tool with pre-loaded sample data.

What's Included in the Example:

The example dataset includes 23 rows with various UK data formats, including:

  • Names with titles: "Mr. John Smith", "Mrs. Jane Doe", "Dr. Bob Johnson" - perfect for testing name splitting
  • Phone numbers: Various formats (international, UK mobile, landline) - some valid, some invalid
  • Postcodes: Different formats (lowercase, no spaces, with spaces) that will be cleaned to standard UK format
  • Sort codes: Various formats (no dashes, with spaces, with dots) that will be cleaned to XX-XX-XX format
  • Bank account numbers: 8-digit account numbers that will be validated and formatted
  • National Insurance numbers: Valid and invalid examples
  • Duplicate rows: Rows 21-23 are duplicates of rows 1-3, perfect for testing duplicate detection

How to Use the Example Page:

  1. Load Example Data: Click the "Load Example Data" button to automatically load the sample dataset
  2. Select Columns: Use "Auto-Select" to automatically detect cleanable columns, or manually select which fields to clean
  3. Choose Phone Format: Select whether you want phone numbers formatted as International (+44) or UK (0) format
  4. Clean the Data: Click "Clean My Data" to process the example file
  5. Explore Results:
    • View the Summary tab to see statistics
    • Check the Full Preview tab to see all cleaned data with green highlighting for modified cells
    • Open the Full Preview Report for a detailed HTML view
    • View the Cleaned tab to see what was fixed (postcodes, sort codes, names split, etc.)
    • Check the Issues tab to see validation errors
  6. Export Options: Customise your download format (CSV, Excel, JSON) and options (remove duplicates, trim whitespace, export only rows with issues)
  7. Download: Download the cleaned file to see the results
  8. Reset Demo: Click "Reset Demo" to clear all data and start over - perfect for trying different cleaning options

Perfect for Learning: The example data is specifically designed to demonstrate all features. You'll see postcodes cleaned from lowercase to uppercase, sort codes formatted from various formats to XX-XX-XX, names split into title/first_name/last_name, and duplicate rows highlighted.

Reset Demo Button: The "Reset Demo" button clears all loaded data and results, allowing you to start fresh and try different cleaning options. This is especially useful for experimenting with different field selections, phone number formats, and export options without having to reload the page.

Supported File Formats

CSV

Comma Separated Values

  • Extension: .csv
  • Universal compatibility
  • Best for spreadsheets

Excel

Microsoft Excel Format

  • Extensions: .xlsx, .xls
  • Native Excel format
  • Preserves formatting

JSON

JavaScript Object Notation

  • Extension: .json
  • Perfect for developers
  • Structured data format

🔄 Format Conversion: Upload in any format, download in any format. For example, upload a CSV file and download as Excel or JSON!

Upload Process

File Size Limits

File size limits depend on your browser and device memory:

  • Free users: Files up to several hundred MB
  • Premium users: Larger files supported (browser-dependent)
  • Processing time increases with file size

File Requirements

  • File must have a header row (column names)
  • CSV files should be properly formatted with consistent delimiters
  • Excel files: Only the first sheet is processed
  • JSON files should contain an array of objects with consistent keys
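If you want to check a JSON file before uploading it, the last requirement can be sketched in Python. This is an illustration only, not the tool's own code (which runs as JavaScript in your browser), and the function name `is_uniform_record_array` is hypothetical.

```python
import json


def is_uniform_record_array(text: str) -> bool:
    """Return True if text is a JSON array of objects that all share the same keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    # The top level must be a non-empty array.
    if not isinstance(data, list) or not data:
        return False
    # Every element must be an object (a dict in Python).
    if not all(isinstance(record, dict) for record in data):
        return False
    # Every object must have exactly the keys of the first one.
    expected_keys = set(data[0])
    return all(set(record) == expected_keys for record in data)
```

A file that fails this check may still load, but missing keys are likely to show up as missing values in the results.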

Privacy Guarantee

100% Private: Your file is processed entirely in your browser using JavaScript. No data is transmitted to our servers, stored in any database, or accessed by third parties. We never see your data.

Field Selection

Supported UK Data Types

Phone Numbers

UK mobile and landline numbers

Accepts:

  • 07123456789
  • +44 7123 456789
  • 0207 123 4567
  • (020) 7123 4567

National Insurance

UK NI numbers

Accepts:

  • AB123456C
  • AB 12 34 56 C
  • ab-123456-c

Postcodes

UK postal codes

Accepts:

  • SW1A 1AA
  • sw1a1aa
  • M1 1AA
  • EC1A 1BB

Sort Codes

UK bank sort codes

Accepts:

  • 12-34-56
  • 123456
  • 12 34 56

Full Names (Split)

Split full names into separate columns

Creates:

  • title (Mr, Mrs, Dr, etc.)
  • first_name
  • last_name

Example:

  • "Mr. John Smith" → title: "Mr", first_name: "John", last_name: "Smith"

Auto-Select Feature

Click the "Auto-Select" button to automatically detect and select all cleanable columns based on:

  • Column names: Matches keywords like "phone", "mobile", "postcode", "ni_number", "sort_code", "name", "full_name"
  • Data content: Analyses sample values to identify UK data patterns and name formats
  • Smart suggestions: Recommends the appropriate data type for each column, including name splitting for full name columns
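As a rough illustration of the column-name part of Auto-Select, the sketch below maps the keywords listed above to a suggested data type. It is written in Python for readability and is not the tool's actual logic: substring matching, the keyword order, and the type labels (such as "Name Split") are assumptions, and the content-based detection is not modelled at all.

```python
from typing import Optional

# (keyword, suggested type) pairs. More specific keywords come first so that
# "full_name" is tried before the shorter "name". Labels are hypothetical.
KEYWORD_TYPES = [
    ("ni_number", "NI Number"),
    ("sort_code", "Sort Code"),
    ("postcode", "Postcode"),
    ("phone", "Phone"),
    ("mobile", "Phone"),
    ("full_name", "Name Split"),
    ("name", "Name Split"),
]


def suggest_type(column_name: str) -> Optional[str]:
    """Suggest a data type from the column name alone, or None if nothing matches."""
    normalised = column_name.strip().lower().replace(" ", "_")
    for keyword, data_type in KEYWORD_TYPES:
        if keyword in normalised:
            return data_type
    return None
```

For example, `suggest_type("Customer Postcode")` returns "Postcode", while a column called "email" gets no suggestion.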

Manual Selection

You can manually select columns and choose the validation type:

  1. Check the box next to each column you want to clean
  2. Select the data type from the dropdown (Phone, NI Number, Postcode, Sort Code)
  3. For name columns, check the "Split into first_name and last_name" option to split full names
  4. The system will validate and clean based on your selection

Name Splitting: When you select a name column for splitting, it creates three new columns: title (extracted titles like Mr, Mrs, Dr), first_name, and last_name. The original name column is preserved unless you choose to remove it during export.
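The splitting described here can be sketched as follows. This is a minimal Python illustration, not the tool's implementation: the list of recognised titles and the rule that everything after the first name becomes last_name are assumptions.

```python
# Titles recognised by this sketch (an assumed list, compared without the trailing dot).
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}


def split_full_name(full_name: str) -> dict:
    """Split a full name into title, first_name and last_name."""
    parts = full_name.split()
    title = ""
    # Treat the first word as a title if it matches the list, ignoring a trailing ".".
    if parts and parts[0].rstrip(".").lower() in TITLES:
        title = parts.pop(0).rstrip(".")
    first_name = parts[0] if parts else ""
    # Everything after the first name is kept together as the last name.
    last_name = " ".join(parts[1:])
    return {"title": title, "first_name": first_name, "last_name": last_name}
```

With this sketch, "Mr. John Smith" gives title "Mr", first_name "John" and last_name "Smith", and a name without a title simply gets an empty title.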

Protected Columns

Protected columns are automatically detected and never modified during the cleaning process. This ensures your data relationships and unique identifiers remain intact.

What Gets Protected?

Columns matching these patterns are automatically protected:

id
*_id
*_key
pk
*_number
reference

Examples: customer_id, order_number, transaction_key, account_id, reference_code
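To see how patterns like these behave, here is a small Python sketch using wildcard matching. It is an approximation under stated assumptions, not the tool's code: matching is done on the lowercased column name, and "reference" is treated as a prefix ("reference*") so that the reference_code example matches.

```python
from fnmatch import fnmatchcase

# Patterns from the list above. The trailing "*" on "reference" is an assumption.
PROTECTED_PATTERNS = ["id", "*_id", "*_key", "pk", "*_number", "reference*"]


def is_protected(column_name: str) -> bool:
    """Return True if the column name matches any protected pattern."""
    name = column_name.strip().lower()
    return any(fnmatchcase(name, pattern) for pattern in PROTECTED_PATTERNS)
```

Read literally, `*_number` would also match a column named ni_number, so the real detection presumably applies extra rules beyond plain pattern matching. If a column you expected to clean shows a lock icon, check whether its name fits one of these patterns.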

Visual Indicators

  • Protected columns show a lock icon in the field selection area
  • Lock icon appears in the Full Preview table headers
  • Listed in the download summary under "Protected Columns"

Why Protected? ID and key columns often contain unique identifiers, order numbers, or reference codes that have specific meaning in your systems. Modifying these could break data relationships or cause integration issues.

Data Profiling

After processing your file, you'll see a Data Profiling section with key insights about your data quality:

Missing Values

Count of empty or null cells across all columns

Duplicate Rows

Count of exact duplicate rows found in your data

Unique Rows

Count of unique rows (total minus duplicates)

Missing Values by Column

If your file has missing values, you'll see a detailed breakdown table showing:

  • Which columns have missing data
  • Count of missing values per column
  • Percentage of rows affected

This helps you identify data quality issues and decide if you need to fill in missing values before using the data.
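The breakdown can be reproduced outside the tool with a few lines of Python. This sketch assumes each row is a dictionary keyed by column name and that a cell counts as missing when it is None or contains only whitespace; the tool's exact definition of "missing" may differ.

```python
from typing import Dict, List, Tuple


def missing_by_column(rows: List[dict]) -> Dict[str, Tuple[int, float]]:
    """Return {column: (missing count, percentage of rows affected)}."""
    total = len(rows)
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            # Count None and blank strings as missing.
            if value is None or str(value).strip() == "":
                counts[column] = counts.get(column, 0) + 1
    # Only columns with at least one missing value appear in the result.
    return {col: (n, round(100 * n / total, 1)) for col, n in counts.items()}
```

For four rows where one name and two phone numbers are blank, the result is 1 missing (25.0%) for name and 2 missing (50.0%) for phone.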

Full Preview Feature

The Full Preview tab shows you exactly what your downloaded file will contain. This gives you complete confidence before downloading.

What You'll See:

  • All rows and columns from your original file
  • Cleaned values applied to validated fields
  • Colour coding to highlight changes and issues
  • Protected columns marked with lock icons
  • Row and column counts at the top

Preview Benefits: The Full Preview eliminates surprises. You can verify the cleaning results, check for any issues, and ensure the output matches your expectations before committing to a download.

Full Preview HTML Report

In addition to viewing the preview in the tab, you can open a dedicated HTML report page for a better viewing experience:

  1. After processing your file, go to the "Full Preview" tab
  2. Click the "View Full Preview Report" button
  3. A comprehensive HTML report opens in a new page with the complete cleaned dataset
  4. The report includes highlighting for duplicate rows and modified cells
  5. You can print or bookmark this report page for easy reference

Colour Coding System

The Full Preview uses a colour coding system to help you quickly identify different types of data and changes:

Blue Rows

Original rows that have duplicates elsewhere in the file

Badge: "HAS DUPLICATES"

Meaning: This is the first occurrence - it will be kept even if "Remove duplicates" is checked

Yellow Rows

Duplicate rows (exact copies of earlier rows)

Badge: "DUPLICATE"

Meaning: This row will be removed if you check "Remove duplicate rows"

Green Cells

Individual cells that were cleaned, validated, or fixed

Meaning: This specific value was modified by the validation process (e.g., phone number formatted to +44 format)

White/Grey Rows

Normal rows with no duplicates (alternating white and light grey for readability)

Meaning: Standard row with no special status

Example Preview:

HAS DUPLICATES Row 1: John Doe | +44 7123 456789 ← Blue (has duplicate at row 4), phone cleaned (green)
Row 2: Jane Smith | [empty] ← Normal row
HAS DUPLICATES Row 3: Bob Johnson | +44 7987 654321 ← Blue (has duplicate at row 6)
DUPLICATE Row 4: John Doe | +44 7123 456789 ← Yellow (duplicate of row 1)
Row 5: Alice Brown | +44 7111 111111 ← Normal row
DUPLICATE Row 6: Bob Johnson | +44 7987 654321 ← Yellow (duplicate of row 3)

Duplicate Row Handling

How Duplicates Are Detected

The system identifies exact duplicate rows by comparing all column values:

  • Two rows are duplicates if all column values match exactly
  • The first occurrence is considered the "original"
  • Subsequent identical rows are marked as "duplicates"
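  • Comparison is case-sensitive and, because duplicates are detected after cleaning, it uses the cleaned values (so "sw1a1aa" and "SW1A 1AA" match once both have been cleaned)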
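The detection rules above can be sketched in Python as follows. This is an illustration rather than the tool's implementation: rows are assumed to be lists of already-cleaned cell values, and the labels reuse the badge text shown in the Full Preview.

```python
from typing import Dict, List


def label_duplicates(rows: List[list]) -> List[str]:
    """Label each row "", "HAS DUPLICATES" (first occurrence) or "DUPLICATE"."""
    first_seen: Dict[tuple, int] = {}
    labels = [""] * len(rows)
    for index, row in enumerate(rows):
        key = tuple(row)  # all column values must match exactly
        if key in first_seen:
            labels[index] = "DUPLICATE"
            labels[first_seen[key]] = "HAS DUPLICATES"
        else:
            first_seen[key] = index
    return labels


def remove_duplicates(rows: List[list]) -> List[list]:
    """Keep the first occurrence of each row and preserve the original order."""
    labels = label_duplicates(rows)
    return [row for row, label in zip(rows, labels) if label != "DUPLICATE"]
```

Because the first occurrence is never labelled "DUPLICATE", removing duplicates always keeps the original row and leaves the remaining rows in their original order.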

Removal Options

In the Export Options section, you'll find a checkbox: "Remove Duplicate Rows (Keep First Occurrence)"

✓ Checked (Remove)

  • Only first occurrence kept
  • Duplicate rows excluded from download
  • Row count reduced
  • Data relationships preserved

○ Unchecked (Keep All)

  • All rows included in download
  • Both originals and duplicates kept
  • Original row count maintained
  • Useful for audit purposes

Visual Preview

Before removing duplicates, always check the Full Preview tab to verify:

  • Blue rows show which records will be kept (originals)
  • Yellow rows show which records will be removed (duplicates)
  • Duplicate count shown in the Export Options section
  • Total affected rows count displayed (originals + duplicates)

Important: Duplicate removal is applied AFTER data cleaning. This means the system detects duplicates based on the cleaned, normalised values - not the original raw data.

Export Options

Download Format Selection

Choose your preferred download format:

CSV

  • Universal compatibility
  • Opens in any spreadsheet
  • Smallest file size

Excel (.xlsx)

  • Native Excel format
  • Ready to use in Excel
  • Preserves data types

JSON

  • Developer-friendly
  • Structured format
  • Easy to parse

Data Cleaning Options

Remove Duplicate Rows

Automatically removes exact duplicate rows, keeping only the first occurrence. Duplicates are highlighted in the preview so you can review before removing.

Clean Whitespace in All Cells

Trims leading/trailing spaces and normalises multiple spaces to single spaces across all cells. This ensures consistent formatting. Enabled by default.
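In Python terms, the whitespace option behaves roughly like the sketch below. It follows the description literally (spaces only) and is an assumption about the behaviour, not the tool's code; tabs and line breaks inside a cell are left untouched here.

```python
import re


def clean_whitespace(value: str) -> str:
    """Trim leading/trailing whitespace and collapse runs of spaces to one space."""
    # Trim the ends first, then replace any run of two or more spaces.
    return re.sub(r" {2,}", " ", value.strip())
```

For example, a cell containing "  John   Smith " becomes "John Smith".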

Additional Export Options

Include "Issues" Column

Adds an extra column listing any validation issues found in each row. Helps you quickly identify which rows may need manual review.

Export Only Rows With Issues

Downloads ONLY the rows that have validation problems. Perfect for creating a focused review file that you can fix manually and merge back into your main dataset.

Detailed Issues Report

Get comprehensive, actionable explanations for every validation issue. No more guessing what's wrong with your data - we tell you exactly what the problem is and how to fix it.

How It Works

  1. After processing your file, go to the "Issues" tab
  2. Click the "View Issues Report" button
  3. A comprehensive HTML report opens in a new page with detailed explanations for each issue
  4. Each issue shows: the invalid value, specific problem, explanation, and actionable guidance

Report Features:

  • Dedicated HTML Page: Opens in a separate page for better viewing and printing
  • Persistent Storage: Reports are saved in your browser's localStorage, so they remain available after a page refresh.
  • Print & Download: Use the print button to print the report, or download it as an HTML file for permanent storage and sharing.
  • Navigation Sidebar: Quick links to jump to specific data type sections (NI Numbers, Phone Numbers, Postcodes, etc.)
  • Detailed Explanations: Each issue includes specific problem identification, why it's invalid, and actionable steps to fix it

What You Get

  • Specific problem identification
  • Clear explanations of why values are invalid
  • Actionable guidance on how to fix issues
  • Professional HTML report format

Example Explanations

  • NI Numbers: Which letter is invalid and why
  • Phone Numbers: Format issues and corrections
  • Postcodes: Missing spaces or wrong format
  • Sort Codes: Wrong digit count or letters

Pro Tip: The detailed issues report is perfect for sharing with your team or documenting data quality issues. Each explanation includes the specific problem, why it's invalid according to UK standards, and clear steps to fix it.

📌 Important: The report is stored in your browser's localStorage, so it is restored automatically if you refresh the page. You can also print the report directly from the browser or download it as an HTML file for permanent storage.

Cleaned Data Report

View a comprehensive report of all values that were automatically cleaned during processing. See the original values alongside their cleaned versions, organised by data type.

How It Works

  1. After processing your file, go to the "Cleaned" tab
  2. Click the "View Cleaned Report" button
  3. A comprehensive HTML report opens in a new page showing all cleaned values
  4. Values are organised by type (Phone Numbers, Postcodes, NI Numbers, Sort Codes, Name Splits, etc.)
  5. Each entry shows the original value and the cleaned/fixed value side by side

Report Features:

  • Dedicated HTML Page: Opens in a separate page for better viewing and printing
  • Organised by Type: Values are grouped by data type (Phone Numbers, Postcodes, etc.) for easy navigation
  • Before & After: See original values alongside their cleaned versions
  • Navigation Sidebar: Quick links to jump to specific data type sections
  • Persistent Storage: Reports survive page refreshes via browser localStorage
  • Print & Share: Perfect for documenting your data cleaning process and sharing with your team

Pro Tip: The cleaned data report is excellent for auditing your data cleaning process, tracking what changes were made, and documenting your data quality improvements for stakeholders.

Important: Your cleaned data only exists in your browser session. Once you close the tab or refresh the page, it's gone. Make sure to download and save your file immediately after processing!

Validation Rules

Phone Number Validation

Accepts:

  • UK mobile numbers (07xxx xxxxxx)
  • UK landline numbers (01xxx / 02xxx / 03xxx)
  • International format (+44)
  • Various spacing and formatting styles

Cleaning Process:

  1. Removes all non-digit characters (spaces, hyphens, parentheses)
  2. Validates length (10-11 digits in national format, counting the leading 0)
  3. Converts to +44 format
  4. Adds spacing for readability (+44 7123 456789)

Examples:

07123456789 → +44 7123 456789
+447123456789 → +44 7123 456789
(020) 7123 4567 → +44 20 7123 4567
0207-123-4567 → +44 20 7123 4567
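The cleaning steps can be sketched in Python as below. This is an illustration under assumptions, not the tool's implementation: the number is first normalised to national format (leading 0) before the length check, only 01, 02, 03 and 07 numbers are accepted, and the spacing rule is simplified to "2 + 4 + 4" for 02 numbers and "4 + rest" for everything else.

```python
import re
from typing import Optional


def clean_uk_phone(raw: str) -> Optional[str]:
    """Return the number in spaced +44 format, or None if it looks invalid."""
    digits = re.sub(r"\D", "", raw)  # step 1: keep digits only
    # Drop an international prefix (0044 or 44), then any leading zeros.
    if digits.startswith("0044"):
        digits = digits[4:]
    elif digits.startswith("44"):
        digits = digits[2:]
    significant = digits.lstrip("0")
    national = "0" + significant
    # Step 2: 10-11 digits including the leading 0, starting 01, 02, 03 or 07.
    if len(national) not in (10, 11) or significant[:1] not in ("1", "2", "3", "7"):
        return None
    # Steps 3-4: convert to +44 and add spacing (simplified grouping).
    if significant.startswith("2"):
        return f"+44 {significant[:2]} {significant[2:6]} {significant[6:]}"
    return f"+44 {significant[:4]} {significant[4:]}"
```

Real UK area codes vary in length, so the grouping for 01 and 03 numbers in this sketch may not match the tool's output exactly.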

National Insurance Number Validation

Format:

Two letters, six digits, one letter (e.g., AB123456C)

Cleaning Process:

  1. Removes spaces, hyphens, and special characters
  2. Converts to uppercase
  3. Validates prefix (excludes invalid prefixes like BG, GB, NK, etc.)
  4. Formats with spaces (AB 12 34 56 C)

Examples:

ab123456c → AB 12 34 56 C
AB-123456-C → AB 12 34 56 C
ab 12 34 56 c → AB 12 34 56 C
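A minimal Python sketch of these steps is shown below. The prefix and letter rules follow the published UK National Insurance number format; whether the tool applies exactly the same list is an assumption, as is the restriction of the final letter to A-D.

```python
import re
from typing import Optional

# Prefixes that are never issued.
INVALID_PREFIXES = {"BG", "GB", "KN", "NK", "NT", "TN", "ZZ"}


def clean_ni_number(raw: str) -> Optional[str]:
    """Return the NI number as "AB 12 34 56 C", or None if it is invalid."""
    # Steps 1-2: strip everything except letters and digits, then uppercase.
    value = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    match = re.fullmatch(r"([A-Z]{2})(\d{6})([A-Z])", value)
    if not match:
        return None
    prefix, digits, suffix = match.groups()
    # Step 3: validate the prefix and the individual letters.
    if prefix in INVALID_PREFIXES:
        return None
    if prefix[0] in "DFIQUV" or prefix[1] in "DFIOQUV":
        return None
    if suffix not in "ABCD":
        return None
    # Step 4: format with spaces.
    return f"{prefix} {digits[0:2]} {digits[2:4]} {digits[4:6]} {suffix}"
```

Values such as "GB123456A" are rejected because of the prefix, which is the kind of problem the Issues report explains in more detail.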

Postcode Validation

Accepts:

  • All UK postcode formats
  • With or without spaces
  • Mixed case

Cleaning Process:

  1. Converts to uppercase
  2. Removes extra spaces
  3. Adds proper spacing (outward code + space + inward code)
  4. Validates format against UK postcode patterns

Examples:

sw1a1aa → SW1A 1AA
m11aa → M1 1AA
EC1A1BB → EC1A 1BB
w1a 1aa → W1A 1AA
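The postcode steps can be sketched in Python like this. The regular expression is a simplified general pattern (one or two letters, a digit, an optional letter or digit, then digit plus two letters) and is an assumption; the tool's actual patterns may be stricter.

```python
import re
from typing import Optional

# Simplified shape of a UK postcode: outward code, space, inward code.
POSTCODE_PATTERN = re.compile(r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}")


def clean_postcode(raw: str) -> Optional[str]:
    """Return the postcode in standard format, or None if it does not fit the pattern."""
    # Steps 1-2: uppercase and remove all whitespace.
    compact = re.sub(r"\s+", "", raw).upper()
    if len(compact) < 5:
        return None
    # Step 3: the inward code is always the last three characters.
    candidate = f"{compact[:-3]} {compact[-3:]}"
    # Step 4: validate the overall shape.
    return candidate if POSTCODE_PATTERN.fullmatch(candidate) else None
```

Because the inward code is always three characters long, the space can be restored reliably even when the input has none.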

Sort Code Validation

Format:

Six digits formatted as XX-XX-XX

Cleaning Process:

  1. Removes all non-digit characters
  2. Validates length (must be exactly 6 digits)
  3. Formats with hyphens (XX-XX-XX)

Examples:

123456 → 12-34-56
12 34 56 → 12-34-56
12-34-56 → 12-34-56
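The three steps translate almost directly into code. The Python sketch below is an illustration, not the tool's implementation, and it follows the steps exactly as listed; how the tool treats letters inside a sort code is not modelled.

```python
import re
from typing import Optional


def clean_sort_code(raw: str) -> Optional[str]:
    """Return the sort code as XX-XX-XX, or None if it does not have six digits."""
    digits = re.sub(r"\D", "", raw)  # step 1: keep digits only
    if len(digits) != 6:             # step 2: must be exactly six digits
        return None
    # Step 3: join the three pairs with hyphens.
    return "-".join(digits[i:i + 2] for i in (0, 2, 4))
```

An input with too few or too many digits, such as "12345", returns None and would be reported as an issue.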

Troubleshooting

File won't upload or processing fails

  • Check file size - very large files may exceed browser memory
  • Ensure file has a valid header row
  • For CSV files, check for consistent delimiters (commas)
  • For Excel files, ensure data is in the first sheet
  • Try a different browser (Chrome recommended)

Values not being cleaned as expected

  • Verify you selected the correct data type (Phone vs. Postcode, etc.)
  • Check if the column is protected (has lock icon) - protected columns are never modified
  • Ensure data is in a supported format (UK phone numbers, with or without +44; non-UK numbers are not supported)
  • Review the validation rules for your data type

Download not starting or file is empty

  • Check browser popup blocker settings
  • Ensure you clicked "Clean My Data" before downloading
  • Try a different download format (CSV if Excel fails)
  • Clear browser cache and try again

Duplicate detection seems incorrect

  • Duplicates are detected AFTER cleaning - compare cleaned values, not original
  • Check the Full Preview to see which rows are marked as duplicates
  • Blue rows = originals, Yellow rows = duplicates
  • Duplicates must match in ALL columns, not just some

Preview not showing or loads slowly

  • Large files (10,000+ rows) may take time to render
  • Try closing other browser tabs to free memory
  • Use a modern browser with good JavaScript performance
  • Consider splitting very large files into smaller chunks

Frequently Asked Questions

Q: Is my data really private?

A: Yes, 100%. All processing happens in your browser using JavaScript. Your file is never uploaded to our servers, never stored in any database, and never transmitted over the network. We physically cannot see your data - it never leaves your device.

Q: What happens to my data after I close the browser?

A: It's gone forever. Since everything happens in your browser's memory and nothing is stored on servers, closing the tab or refreshing the page will permanently delete the cleaned data. Always download your file immediately after processing.

Q: Can I clean non-UK data?

A: Currently, Simple Data Cleaner specialises in UK data formats only (UK phone numbers, NI numbers, postcodes, and sort codes). International data formats are not supported at this time.

Q: Why can't I select certain columns for cleaning?

A: If a column shows a lock icon, it's been automatically detected as a protected column (ID, key, or reference field). Protected columns are never modified to preserve data relationships and prevent breaking system integrations.

Q: What does "Keep First Occurrence" mean for duplicates?

A: When removing duplicates, the system keeps the first row it encounters and removes all subsequent identical rows. This preserves the original order of your data and ensures the "original" record (blue row) is always kept.

Q: Can I undo changes after downloading?

A: No, downloads are final. However, your original file remains unchanged on your computer. Always keep a backup of your original file before processing. You can also use the Full Preview feature to verify results before downloading.

Q: Why are some phone numbers marked as invalid?

A: The validator checks for proper UK phone number format and length. Common issues: incorrect area codes, too few/many digits, or non-UK numbers. Review the validation rules section for specific format requirements.

Q: Can I clean multiple files at once?

A: No, the app processes one file at a time. To clean multiple files, upload and process them individually. You can keep multiple browser tabs open to work on different files simultaneously.

Q: What's the difference between "Export Only Rows With Issues" and normal download?

A: Normal download gives you the complete file with all rows (cleaned values applied). "Export Only Rows With Issues" creates a filtered file containing ONLY the rows that have validation problems - useful for creating a focused review file that you can fix manually and merge back later.

Q: How do I know if cleaning was successful?

A: Check the Results Summary cards showing valid vs. invalid counts, review the Full Preview tab to see green-highlighted cleaned cells, and review the Data Profiling section for overall data quality insights. Green cells = successfully cleaned values.

Still Have Questions?

Can't find what you're looking for? Our support team is here to help.

Contact Support