Documentation
Everything you need to know about using Simple Data Cleaner to clean and validate your UK data.
Getting Started
Simple Data Cleaner helps you validate and clean UK data formats including phone numbers, National Insurance numbers, postcodes, and bank sort codes. The entire process happens in your browser - your data never leaves your device.
Quick Start:
- Upload your CSV, Excel (.xlsx/.xls), or JSON file
- Select which columns contain UK data to clean (or use Auto-Select)
- Click "Clean My Data" to process
- Review results in the Full Preview tab
- Customise export options (format, duplicates, whitespace)
- Download your cleaned file
Pro Tip: Always use the Full Preview feature before downloading to ensure the cleaning results match your expectations.
Interactive Example
Want to try Simple Data Cleaner without uploading your own file? Visit our Interactive Example page to explore the tool with pre-loaded sample data.
What's Included in the Example:
The example dataset includes 23 rows with various UK data formats, including:
- Names with titles: "Mr. John Smith", "Mrs. Jane Doe", "Dr. Bob Johnson" - perfect for testing name splitting
- Phone numbers: Various formats (international, UK mobile, landline) - some valid, some invalid
- Postcodes: Different formats (lowercase, no spaces, with spaces) that will be cleaned to standard UK format
- Sort codes: Various formats (no dashes, with spaces, with dots) that will be cleaned to XX-XX-XX format
- Bank account numbers: 8-digit account numbers that will be validated and formatted
- National Insurance numbers: Valid and invalid examples
- Duplicate rows: Rows 21-23 are duplicates of rows 1-3, perfect for testing duplicate detection
How to Use the Example Page:
- Load Example Data: Click the "Load Example Data" button to automatically load the sample dataset
- Select Columns: Use "Auto-Select" to automatically detect cleanable columns, or manually select which fields to clean
- Choose Phone Format: Select whether you want phone numbers formatted as International (+44) or UK (0) format
- Clean the Data: Click "Clean My Data" to process the example file
- Explore Results:
- View the Summary tab to see statistics
- Check the Full Preview tab to see all cleaned data with green highlighting for modified cells
- Open the Full Preview Report for a detailed HTML view
- View the Cleaned tab to see what was fixed (postcodes, sort codes, names split, etc.)
- Check the Issues tab to see validation errors
- Export Options: Customise your download format (CSV, Excel, JSON) and options (remove duplicates, trim whitespace, export only rows with issues)
- Download: Download the cleaned file to see the results
- Reset Demo: Click "Reset Demo" to clear all data and start over - perfect for trying different cleaning options
Perfect for Learning: The example data is specifically designed to demonstrate all features. You'll see postcodes cleaned from lowercase to uppercase, sort codes formatted from various formats to XX-XX-XX, names split into title/first_name/last_name, and duplicate rows highlighted.
Reset Demo Button: The "Reset Demo" button clears all loaded data and results, allowing you to start fresh and try different cleaning options. This is especially useful for experimenting with different field selections, phone number formats, and export options without having to reload the page.
Supported File Formats
CSV
Comma Separated Values
- Extension: .csv
- Universal compatibility
- Best for spreadsheets
Excel
Microsoft Excel Format
- Extensions: .xlsx, .xls
- Native Excel format
- Preserves formatting
JSON
JavaScript Object Notation
- Extension: .json
- Perfect for developers
- Structured data format
🔄 Format Conversion: Upload in any format, download in any format. For example, upload a CSV file and download as Excel or JSON!
Upload Process
File Size Limits
File size limits depend on your browser and device memory:
- Free users: Files up to several hundred MB
- Premium users: Larger files supported (browser-dependent)
- Processing time increases with file size
File Requirements
- File must have a header row (column names)
- CSV files should be properly formatted with consistent delimiters
- Excel files: Only the first sheet is processed
- JSON files should contain an array of objects with consistent keys
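If you want to check a JSON file against the last requirement before uploading, a short script can do it. This is a minimal sketch in Python, not the tool's own code; the function name `check_json_records` and the wording of the messages are made up for illustration.

```python
import json


def check_json_records(text: str) -> list[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list) or not data:
        return ["top level must be a non-empty array"]
    if not all(isinstance(row, dict) for row in data):
        return ["every array element must be an object"]
    # Use the first record's keys as the reference set for all other records.
    expected = set(data[0])
    problems = []
    for index, row in enumerate(data[1:], start=2):
        if set(row) != expected:
            problems.append(f"record {index} has different keys from record 1")
    return problems
```

An empty result means the file is an array of objects that all share the same keys.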
Privacy Guarantee
100% Private: Your file is processed entirely in your browser using JavaScript. No data is transmitted to our servers, stored in any database, or accessed by third parties. We never see your data.
Field Selection
Supported UK Data Types
Phone Numbers
UK mobile and landline numbers
Accepts:
- 07123456789
- +44 7123 456789
- 0207 123 4567
- (020) 7123 4567
National Insurance
UK NI numbers
Accepts:
- AB123456C
- AB 12 34 56 C
- ab-123456-c
Postcodes
UK postal codes
Accepts:
- SW1A 1AA
- sw1a1aa
- M1 1AA
- EC1A 1BB
Sort Codes
UK bank sort codes
Accepts:
- 12-34-56
- 123456
- 12 34 56
Full Names (Split)
Split full names into separate columns
Creates:
- title (Mr, Mrs, Dr, etc.)
- first_name
- last_name
Example:
- "Mr. John Smith" → title: "Mr", first_name: "John", last_name: "Smith"
Auto-Select Feature
Click the "Auto-Select" button to automatically detect and select all cleanable columns based on:
- Column names: Matches keywords like "phone", "mobile", "postcode", "ni_number", "sort_code", "name", "full_name"
- Data content: Analyses sample values to identify UK data patterns and name formats
- Smart suggestions: Recommends the appropriate data type for each column, including name splitting for full name columns
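The column-name part of Auto-Select can be pictured as a simple keyword lookup. The sketch below is an illustration only: the keywords come from the list above, but the mapping to data types, the substring matching, and the name `suggest_type` are assumptions, and the content-based detection is not covered.

```python
from typing import Optional

# Keyword -> suggested data type. The keywords are the ones listed above;
# the pairing with types is an assumption made for this sketch.
KEYWORD_TYPES = [
    ("phone", "Phone"),
    ("mobile", "Phone"),
    ("postcode", "Postcode"),
    ("ni_number", "NI Number"),
    ("sort_code", "Sort Code"),
    ("full_name", "Full Name (Split)"),
    ("name", "Full Name (Split)"),
]


def suggest_type(column_name: str) -> Optional[str]:
    """Suggest a data type from the column name alone, or None if no keyword fits."""
    normalised = column_name.strip().lower().replace(" ", "_")
    for keyword, data_type in KEYWORD_TYPES:
        if keyword in normalised:
            return data_type
    return None
```

Plain substring matching is deliberately naive: a column called "first_name" would also be suggested for splitting, which is one reason to review the suggestions before cleaning.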
Manual Selection
You can manually select columns and choose the validation type:
- Check the box next to each column you want to clean
- Select the data type from the dropdown (Phone, NI Number, Postcode, Sort Code)
- For name columns, check the "Split into first_name and last_name" option to split full names
- The system will validate and clean based on your selection
Name Splitting: When you select a name column for splitting, it creates three new columns: title (extracted titles like Mr, Mrs, Dr), first_name, and last_name. The original name column is preserved unless you choose to remove it during export.
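To make the splitting behaviour concrete, here is a rough Python sketch. It is not the tool's implementation: the title list is a guess, and putting every word after the first name into last_name is an assumption about how middle names are handled.

```python
# Assumed title list; the tool may recognise more (or fewer) titles.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def split_name(full_name: str) -> dict[str, str]:
    """Split a full name into title, first_name and last_name."""
    parts = full_name.split()
    title = ""
    # A leading word counts as a title if it matches the list, with or without a full stop.
    if parts and parts[0].rstrip(".").lower() in TITLES:
        title = parts.pop(0).rstrip(".")
    first_name = parts[0] if parts else ""
    last_name = " ".join(parts[1:])  # assumption: everything else is the last name
    return {"title": title, "first_name": first_name, "last_name": last_name}
```

For example, `split_name("Mr. John Smith")` gives title "Mr", first_name "John" and last_name "Smith", matching the example in this section.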
Protected Columns
Protected columns are automatically detected and never modified during the cleaning process. This ensures your data relationships and unique identifiers remain intact.
What Gets Protected?
Columns matching these patterns are automatically protected:
- id
- *_id
- *_key
- pk
- *_number
- reference
Examples: customer_id, order_number, transaction_key, account_id, reference_code
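The patterns above behave like shell-style wildcards, so they can be sketched with Python's `fnmatch` module. This is an illustration under assumptions, not the tool's code: matching is assumed to be case-insensitive, and `reference` is treated as a prefix (`reference*`) so that the `reference_code` example matches.

```python
from fnmatch import fnmatchcase

# Patterns from the list above. "reference*" is an assumption made so that
# "reference_code" is covered; the tool may match "reference" differently.
PROTECTED_PATTERNS = ["id", "*_id", "*_key", "pk", "*_number", "reference*"]


def is_protected(column_name: str) -> bool:
    """Return True if the column name matches any protected pattern."""
    name = column_name.strip().lower()
    return any(fnmatchcase(name, pattern) for pattern in PROTECTED_PATTERNS)
```

With these patterns, `customer_id`, `order_number` and `transaction_key` are protected, while `phone` and `postcode` are not.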
Visual Indicators
- Protected columns show a lock icon in the field selection area
- Lock icon appears in the Full Preview table headers
- Listed in the download summary under "Protected Columns"
Why Protected? ID and key columns often contain unique identifiers, order numbers, or reference codes that have specific meaning in your systems. Modifying these could break data relationships or cause integration issues.
Data Profiling
After processing your file, you'll see a Data Profiling section with key insights about your data quality:
Missing Values
Count of empty or null cells across all columns
Duplicate Rows
Count of exact duplicate rows found in your data
Unique Rows
Count of unique rows (total minus duplicates)
Missing Values by Column
If your file has missing values, you'll see a detailed breakdown table showing:
- Which columns have missing data
- Count of missing values per column
- Percentage of rows affected
This helps you identify data quality issues and decide if you need to fill in missing values before using the data.
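The per-column breakdown is straightforward to reproduce if you want to check the figures yourself. The sketch below assumes rows are held as dictionaries and that "missing" means null or whitespace-only; both are assumptions, as is the name `missing_by_column`.

```python
def missing_by_column(rows: list[dict[str, object]]) -> dict[str, tuple[int, float]]:
    """Map each column with missing data to (missing count, percentage of rows)."""
    counts: dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            # Treat None and empty or whitespace-only strings as missing.
            if value is None or str(value).strip() == "":
                counts[column] = counts.get(column, 0) + 1
    total = len(rows)
    return {col: (n, round(100 * n / total, 1)) for col, n in counts.items()}
```

Columns with no missing values are left out of the result, mirroring a table that lists only the affected columns.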
Full Preview Feature
The Full Preview tab shows you exactly what your downloaded file will contain. This gives you complete confidence before downloading.
What You'll See:
- All rows and columns from your original file
- Cleaned values applied to validated fields
- Colour coding to highlight changes and issues
- Protected columns marked with lock icons
- Row and column counts at the top
Preview Benefits: The Full Preview eliminates surprises. You can verify the cleaning results, check for any issues, and ensure the output matches your expectations before committing to a download.
Full Preview HTML Report
In addition to viewing the preview in the tab, you can open a dedicated HTML report page for a better viewing experience:
- After processing your file, go to the "Full Preview" tab
- Click the "View Full Preview Report" button
- A comprehensive HTML report opens in a new page with the complete cleaned dataset
- The report includes highlighting for duplicate rows and modified cells
- You can print or bookmark this report page for easy reference
Colour Coding System
The Full Preview uses a colour coding system to help you quickly identify different types of data and changes:
Blue Rows
Original rows that have duplicates elsewhere in the file
Meaning: This is the first occurrence - it will be kept even if "Remove duplicates" is checked
Yellow Rows
Duplicate rows (exact copies of earlier rows)
Meaning: This row will be removed if you check "Remove duplicate rows"
Green Cells
Individual cells that were cleaned, validated, or fixed
Meaning: This specific value was modified by the validation process (e.g., phone number formatted to +44 format)
White/Gray Rows
Normal rows with no duplicates (alternating white and light gray for readability)
Meaning: Standard row with no special status
Duplicate Row Handling
How Duplicates Are Detected
The system identifies exact duplicate rows by comparing all column values:
- Two rows are duplicates if all column values match exactly
- The first occurrence is considered the "original"
- Subsequent identical rows are marked as "duplicates"
- Comparison is case-sensitive and uses the cleaned values, so rows that differed only in formatting before cleaning can match afterwards
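The detection rules above can be sketched in a few lines of Python. This is an illustration, not the tool's code: rows are assumed to be dictionaries of already-cleaned string values, and the labels "original", "duplicate" and "unique" are names chosen here for the blue, yellow and plain rows.

```python
def label_duplicates(rows: list[dict[str, str]]) -> list[str]:
    """Label each row 'original', 'duplicate' or 'unique'."""
    first_seen: dict[tuple, int] = {}
    labels = ["unique"] * len(rows)
    for index, row in enumerate(rows):
        # Every column takes part in the comparison, and case matters.
        key = tuple(sorted(row.items()))
        if key in first_seen:
            labels[index] = "duplicate"
            labels[first_seen[key]] = "original"  # the first occurrence
        else:
            first_seen[key] = index
    return labels
```

Removing duplicates then amounts to dropping every row labelled "duplicate", which keeps the first occurrence and the original row order.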
Removal Options
In the Export Options section, you'll find a checkbox: "Remove Duplicate Rows (Keep First Occurrence)"
✓ Checked (Remove)
- Only first occurrence kept
- Duplicate rows excluded from download
- Row count reduced
- Data relationships preserved
○ Unchecked (Keep All)
- All rows included in download
- Both originals and duplicates kept
- Original row count maintained
- Useful for audit purposes
Visual Preview
Before removing duplicates, always check the Full Preview tab to verify:
- Blue rows show which records will be kept (originals)
- Yellow rows show which records will be removed (duplicates)
- Duplicate count shown in the Export Options section
- Total affected rows count displayed (originals + duplicates)
Important: Duplicate removal is applied AFTER data cleaning. This means the system detects duplicates based on the cleaned, normalised values, not the original raw data.
Export Options
Download Format Selection
Choose your preferred download format:
CSV
- Universal compatibility
- Opens in any spreadsheet
- Smallest file size
Excel (.xlsx)
- Native Excel format
- Ready to use in Excel
- Preserves data types
JSON
- Developer-friendly
- Structured format
- Easy to parse
Data Cleaning Options
Remove Duplicate Rows
Automatically removes exact duplicate rows, keeping only the first occurrence. Duplicates are highlighted in the preview so you can review before removing.
Clean Whitespace in All Cells
Trims leading/trailing spaces and normalises multiple spaces to single spaces across all cells. This ensures consistent formatting. Enabled by default.
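As a rough picture of what this option does to a single cell, here is a short Python sketch. The function name `clean_whitespace` is made up, and how the tool treats tabs or line breaks inside a cell is not stated, so only runs of ordinary spaces are collapsed here.

```python
import re


def clean_whitespace(value: str) -> str:
    """Trim the ends of a cell and collapse runs of spaces to a single space."""
    return re.sub(r" {2,}", " ", value.strip())
```

For example, a cell containing "  SW1A   1AA " comes out as "SW1A 1AA".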
Additional Export Options
Include "Issues" Column
Adds an extra column listing any validation issues found in each row. Helps you quickly identify which rows may need manual review.
Export Only Rows With Issues
Downloads ONLY the rows that have validation problems. Perfect for creating a focused review file that you can fix manually and merge back into your main dataset.
Detailed Issues Report
Get comprehensive, actionable explanations for every validation issue. No more guessing what's wrong with your data - we tell you exactly what the problem is and how to fix it.
How It Works
- After processing your file, go to the "Issues" tab
- Click the "View Issues Report" button
- A comprehensive HTML report opens in a new page with detailed explanations for each issue
- Each issue shows: the invalid value, specific problem, explanation, and actionable guidance
Report Features:
- Dedicated HTML Page: Opens in a separate page for better viewing and printing
- Persistent Storage: Reports survive page refreshes. Your browser's localStorage keeps the report available even after refreshing.
- Print & Download: Use the print button to print the report, or download it as an HTML file for permanent storage and sharing.
- Navigation Sidebar: Quick links to jump to specific data type sections (NI Numbers, Phone Numbers, Postcodes, etc.)
- Detailed Explanations: Each issue includes specific problem identification, why it's invalid, and actionable steps to fix it
What You Get
- Specific problem identification
- Clear explanations of why values are invalid
- Actionable guidance on how to fix issues
- Professional HTML report format
Example Explanations
- NI Numbers: Which letter is invalid and why
- Phone Numbers: Format issues and corrections
- Postcodes: Missing spaces or wrong format
- Sort Codes: Wrong digit count or letters
Pro Tip: The detailed issues report is perfect for sharing with your team or documenting data quality issues. Each explanation includes the specific problem, why it's invalid according to UK standards, and clear steps to fix it.
📌 Important: The report is stored in your browser's localStorage, so if you refresh the page, it is restored automatically. You can also print the report directly from the browser or download it as an HTML file for permanent storage.
Cleaned Data Report
View a comprehensive report of all values that were automatically cleaned during processing. See the original values alongside their cleaned versions, organised by data type.
How It Works
- After processing your file, go to the "Cleaned" tab
- Click the "View Cleaned Report" button
- A comprehensive HTML report opens in a new page showing all cleaned values
- Values are organised by type (Phone Numbers, Postcodes, NI Numbers, Sort Codes, Name Splits, etc.)
- Each entry shows the original value and the cleaned/fixed value side by side
Report Features:
- Dedicated HTML Page: Opens in a separate page for better viewing and printing
- Organised by Type: Values are grouped by data type (Phone Numbers, Postcodes, etc.) for easy navigation
- Before & After: See original values alongside their cleaned versions
- Navigation Sidebar: Quick links to jump to specific data type sections
- Persistent Storage: Reports survive page refreshes via browser localStorage
- Print & Share: Perfect for documenting your data cleaning process and sharing with your team
Pro Tip: The cleaned data report is excellent for auditing your data cleaning process, tracking what changes were made, and documenting your data quality improvements for stakeholders.
Important: Your cleaned dataset exists only in your current browser session. Once you close the tab or refresh the page, the downloadable file is gone, even though the reports themselves are restored after a refresh. Make sure to download and save your file immediately after processing!
Validation Rules
Phone Number Validation
Accepts:
- UK mobile numbers (07xxx xxxxxx)
- UK landline numbers (01xxx / 02xxx / 03xxx)
- International format (+44)
- Various spacing and formatting styles
Cleaning Process:
1. Removes all non-digit characters (spaces, hyphens, parentheses)
2. Validates length (10-11 digits for UK numbers, counted in national format with the leading 0)
3. Converts to +44 format
4. Adds spacing for readability (+44 7123 456789)
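The four steps above can be sketched in Python as follows. Treat this as an illustration rather than the tool's actual logic: the function name `clean_uk_phone` is made up, the 4+6 digit grouping is copied from the mobile example and applied to every number, and area-code checks are left out entirely.

```python
import re
from typing import Optional


def clean_uk_phone(raw: str) -> Optional[str]:
    """Return the number in '+44 xxxx xxxxxx' form, or None if it fails the length check."""
    digits = re.sub(r"\D", "", raw)  # step 1: keep digits only
    if digits.startswith("44"):
        # Rewrite +44... (including "+44 (0)...") to national form with a leading 0.
        digits = "0" + digits[2:].lstrip("0")
    # Step 2: national numbers start with 0 and have 10 or 11 digits.
    if not digits.startswith("0") or len(digits) not in (10, 11):
        return None
    national = digits[1:]  # step 3: drop the 0, prefix +44
    return f"+44 {national[:4]} {national[4:]}"  # step 4: assumed grouping
```

Both "07123456789" and "+44 7123 456789" come out as "+44 7123 456789", while a value such as "12345" is rejected.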
National Insurance Number Validation
Format:
Two letters, six digits, one letter (e.g., AB123456C)
Cleaning Process:
1. Removes spaces, hyphens, and special characters
2. Converts to uppercase
3. Validates prefix (excludes invalid prefixes like BG, GB, NK, etc.)
4. Formats with spaces (AB 12 34 56 C)
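Here is a minimal Python sketch of these steps. It is not the tool's implementation: the function name is hypothetical, and the excluded prefixes beyond BG, GB and NK (KN, TN, NT, ZZ) are taken from HMRC's published format rules rather than from this page, so the tool's own list may differ. Further letter restrictions that HMRC defines are left out for brevity.

```python
import re
from typing import Optional

# BG, GB and NK are named above; KN, TN, NT and ZZ are added from HMRC's
# published rules and are an assumption about what "etc." covers.
INVALID_PREFIXES = {"BG", "GB", "NK", "KN", "TN", "NT", "ZZ"}


def clean_ni_number(raw: str) -> Optional[str]:
    """Return the number as 'AB 12 34 56 C', or None if it is invalid."""
    value = re.sub(r"[^A-Za-z0-9]", "", raw).upper()  # steps 1 and 2
    # Shape check: two letters, six digits, one letter.
    if not re.fullmatch(r"[A-Z]{2}\d{6}[A-Z]", value):
        return None
    if value[:2] in INVALID_PREFIXES:  # step 3
        return None
    return f"{value[:2]} {value[2:4]} {value[4:6]} {value[6:8]} {value[8]}"  # step 4
```

For example, "ab-123456-c" becomes "AB 12 34 56 C", and "GB123456A" is rejected because of its prefix.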
Postcode Validation
Accepts:
- All UK postcode formats
- With or without spaces
- Mixed case
Cleaning Process:
1. Converts to uppercase
2. Removes extra spaces
3. Adds proper spacing (outward code + space + inward code)
4. Validates format against UK postcode patterns
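The postcode steps can be approximated with a short Python sketch. The pattern below is a simplified shape check chosen for illustration, not the tool's actual validation pattern, and the function name is made up. The sketch also checks the shape before inserting the space, which gives the same result as the order listed above.

```python
import re
from typing import Optional

# Simplified shape: 1-2 letters, a digit, an optional digit or letter (outward code),
# then one digit and two letters (inward code). Real validation may be stricter.
POSTCODE_SHAPE = re.compile(r"[A-Z]{1,2}\d[A-Z\d]?\d[A-Z]{2}")


def clean_postcode(raw: str) -> Optional[str]:
    """Return the postcode as 'OUTWARD INWARD' in uppercase, or None if the shape is wrong."""
    compact = re.sub(r"\s+", "", raw).upper()  # uppercase, all spaces removed
    if not POSTCODE_SHAPE.fullmatch(compact):
        return None
    # The inward code is always the last three characters.
    return f"{compact[:-3]} {compact[-3:]}"
```

For example, "sw1a1aa" becomes "SW1A 1AA" and "M1 1AA" is returned unchanged.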
Sort Code Validation
Format:
Six digits formatted as XX-XX-XX
Cleaning Process:
1. Removes all non-digit characters
2. Validates length (must be exactly 6 digits)
3. Formats with hyphens (XX-XX-XX)
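These three steps translate almost directly into code. The Python sketch below follows them as written; the function name `clean_sort_code` is made up, and the tool may apply extra checks (for example, flagging letters) that are not shown here.

```python
import re
from typing import Optional


def clean_sort_code(raw: str) -> Optional[str]:
    """Return the sort code as 'XX-XX-XX', or None if it does not have six digits."""
    digits = re.sub(r"\D", "", raw)  # step 1: keep digits only
    if len(digits) != 6:  # step 2: exactly six digits
        return None
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"  # step 3: add hyphens
```

For example, "123456", "12 34 56" and "12.34.56" all become "12-34-56".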
Troubleshooting
File won't upload or processing fails
- Check file size - very large files may exceed browser memory
- Ensure file has a valid header row
- For CSV files, check for consistent delimiters (commas)
- For Excel files, ensure data is in the first sheet
- Try a different browser (Chrome recommended)
Values not being cleaned as expected
- Verify you selected the correct data type (Phone vs. Postcode, etc.)
- Check if the column is protected (has lock icon) - protected columns are never modified
- Ensure data is in a supported UK format (phone numbers from other countries are not supported; UK numbers written with +44 are fine)
- Review the validation rules for your data type
Download not starting or file is empty
- Check browser popup blocker settings
- Ensure you clicked "Clean My Data" before downloading
- Try a different download format (CSV if Excel fails)
- Clear browser cache and try again
Duplicate detection seems incorrect
- Duplicates are detected AFTER cleaning - compare cleaned values, not original
- Check the Full Preview to see which rows are marked as duplicates
- Blue rows = originals, Yellow rows = duplicates
- Duplicates must match in ALL columns, not just some
Preview not showing or loads slowly
- Large files (10,000+ rows) may take time to render
- Try closing other browser tabs to free memory
- Use a modern browser with good JavaScript performance
- Consider splitting very large files into smaller chunks
Frequently Asked Questions
Q: Is my data really private?
A: Yes, 100%. All processing happens in your browser using JavaScript. Your file is never uploaded to our servers, never stored in any database, and never transmitted over the network. We physically cannot see your data - it never leaves your device.
Q: What happens to my data after I close the browser?
A: It's gone forever. Since everything happens in your browser's memory and nothing is stored on servers, closing the tab or refreshing the page will permanently delete the cleaned data. Always download your file immediately after processing.
Q: Can I clean non-UK data?
A: Currently, Simple Data Cleaner specialises in UK data formats only (UK phone numbers, NI numbers, postcodes, and sort codes). International data formats are not supported at this time.
Q: Why can't I select certain columns for cleaning?
A: If a column shows a lock icon, it's been automatically detected as a protected column (ID, key, or reference field). Protected columns are never modified to preserve data relationships and prevent breaking system integrations.
Q: What does "Keep First Occurrence" mean for duplicates?
A: When removing duplicates, the system keeps the first row it encounters and removes all subsequent identical rows. This preserves the original order of your data and ensures the "original" record (blue row) is always kept.
Q: Can I undo changes after downloading?
A: No, downloads are final. However, your original file remains unchanged on your computer. Always keep a backup of your original file before processing. You can also use the Full Preview feature to verify results before downloading.
Q: Why are some phone numbers marked as invalid?
A: The validator checks for proper UK phone number format and length. Common issues: incorrect area codes, too few/many digits, or non-UK numbers. Review the validation rules section for specific format requirements.
Q: Can I clean multiple files at once?
A: No, the app processes one file at a time. To clean multiple files, upload and process them individually. You can keep multiple browser tabs open to work on different files simultaneously.
Q: What's the difference between "Export Only Rows With Issues" and normal download?
A: Normal download gives you the complete file with all rows (cleaned values applied). "Export Only Rows With Issues" creates a filtered file containing ONLY the rows that have validation problems - useful for creating a focused review file that you can fix manually and merge back later.
Q: How do I know if cleaning was successful?
A: Check the Results Summary cards showing valid vs. invalid counts, review the Full Preview tab to see green-highlighted cleaned cells, and review the Data Profiling section for overall data quality insights. Green cells = successfully cleaned values.
Still Have Questions?
Can't find what you're looking for? Our support team is here to help.
Contact Support