
Your Data Cleaning Tool Sees Everything

Google Sheets, Flatfile, OpenRefine, Excel Online — every popular data cleaning tool reads your data in plaintext. Here is what that means for your security, your compliance, and your customers.

March 2026·10 min read

The Uncomfortable Reality

Every time you paste data into Google Sheets, upload a CSV to Flatfile, or run OpenRefine on a customer export, those tools see every row, every cell, every Social Security number. This is not a bug. It is how they are designed. Traditional data tools must read your data in plaintext to process it. There is no way around this limitation in their architecture, and most of them do not even acknowledge it as a problem.

Most teams do not think twice about this. You have a messy CSV that needs cleaning before a campaign launch. You open it in Google Sheets, run some formulas, maybe paste it into a third-party cleaning tool for deduplication, and move on. The entire process takes twenty minutes and feels completely routine. But during those twenty minutes, your customers' personal information — names, phone numbers, email addresses, and potentially much more sensitive data — has been read, processed, cached, and stored by one or more third-party vendors in plaintext.

What Each Tool Actually Does With Your Data

Google Sheets

When you paste data into Google Sheets, it is stored on Google's servers. Google encrypts it at rest and in transit, but by default Google holds the keys, so the content remains readable by Google's systems. Google says employee access is restricted and logged, but that is a policy control, not a cryptographic one. Your data is also subject to legal process: Google complies with valid law enforcement requests and publishes a transparency report on the requests it receives. On HIPAA, the account type matters. Google offers a Business Associate Agreement (BAA) for paid Google Workspace accounts, and Sheets is among the covered services, but free consumer Google accounts are not covered. If you are cleaning patient data in Google Sheets on a personal account, or on a Workspace account where no BAA has been accepted, you are already out of compliance.

Flatfile

Flatfile is designed specifically for data import and cleaning. When you upload a file to Flatfile, your data passes through their servers in plaintext. Their platform parses every field to apply validation rules and transformations, which means Flatfile's infrastructure has full access to your data during processing. They hold a SOC 2 report, which attests to their internal controls, but SOC 2 does not change the fundamental fact that your data exists in readable form on their servers. If Flatfile is breached, your customers' data is exposed.

OpenRefine

OpenRefine runs locally on your machine, which is better than cloud tools from a data exposure standpoint. But "local" does not mean "secure." OpenRefine provides no encryption at rest of its own: your data sits in a local workspace directory in plaintext unless the disk itself is encrypted. Its undo/redo history records the operations applied to a project, but there is no access log showing who opened the data. If your laptop is stolen, lost, or compromised by malware, every project still in that workspace is accessible. And OpenRefine offers no PII detection, no automated compliance features, and no way to clean data without exposing it.

Microsoft Excel Online

Excel Online stores your data on Microsoft's servers. As with Google, files stored in OneDrive and SharePoint are readable by Microsoft's systems. Microsoft does offer a HIPAA BAA to Microsoft 365 business and enterprise customers through its standard licensing terms, not only to E5 tenants, which makes Excel Online technically usable for HIPAA-regulated data. That holds only if you are on an in-scope commercial plan, the BAA applies to your organization, and you have configured the required security controls. Many teams using Excel Online for data cleaning have not checked any of this. They are pasting patient data into a spreadsheet on Microsoft's servers, sometimes from a personal account, without confirming that a BAA is in place.

The Compliance Problem

A vendor that processes personal data on your behalf is a data processor under GDPR, and a vendor that creates, receives, maintains, or transmits protected health information on your behalf is a business associate under HIPAA. When that vendor can read the data in plaintext, there is no room to argue otherwise. This is not optional. It is a legal consequence of the data flow. Every tool that reads your data in plaintext during processing becomes part of your compliance surface area.

Under HIPAA, you need a signed BAA with every business associate that handles PHI. Under GDPR, you need a Data Processing Agreement with every processor. Each of these agreements creates ongoing obligations: breach notification timelines, data retention limits, audit rights, and more. Every additional vendor that touches your data in plaintext multiplies your compliance burden. Review our guide to auditing PII in your spreadsheets to understand where your sensitive data is flowing today.

The Breach Problem

Every vendor that holds your data in plaintext is a breach risk. Not a theoretical risk, but a practical, actuarial one that insurance companies price into your cyber liability premiums. IBM's 2024 Cost of a Data Breach report put the global average cost of a breach at $4.88 million. If your data cleaning vendor gets breached and your customers' SSNs are exposed, you may share liability for that breach because you chose to send unencrypted PII to a third party.

You cannot control the security posture of your vendors. You cannot audit their patch management, their employee access controls, their incident response procedures, or their backup encryption practices. You are trusting that they are doing everything right, all the time, forever. History has shown repeatedly that this trust is misplaced. Major SaaS vendors, cloud providers, and enterprise software companies have all suffered breaches that exposed customer data.

The Trust Chain Problem

Your customers trusted you with their data. They did not agree to have their data shared with Google, or Flatfile, or any other vendor in your technology stack. When you paste customer data into Google Sheets, you are extending the trust chain without your customers' knowledge or consent. You are asking them to implicitly trust every tool you use, every vendor you pay, every sub-processor in the chain.

This is not a hypothetical concern. GDPR requires you to tell data subjects who receives their data, either the recipients themselves or the categories of recipients, and your processors need your authorization before engaging sub-processors. CCPA gives consumers the right to know the categories of third parties to whom their data has been disclosed. If a customer asks "who has seen my data?" you need to be able to answer that question honestly. If the answer includes your data cleaning tool, your spreadsheet provider, and their respective cloud infrastructure providers, the chain is longer than most customers would expect or accept.

How NoSheet Breaks the Chain

NoSheet is architecturally different from every tool listed above. We do not read your data in plaintext. We cannot read your data in plaintext. This is not a policy — it is a technical constraint enforced by cryptography.

When you import data into NoSheet, PII columns are encrypted at the cell level with per-tenant keys before any cleaning operations run. Deduplication, formatting, validation, and standardization all operate on encrypted data. NoSheet is never a data processor of your plaintext PII, because we never possess your plaintext PII. This fundamentally changes your compliance posture. For a detailed explanation of the underlying architecture, read our guide on zero-knowledge data cleaning.
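To make the idea of operating on protected values concrete, here is a minimal sketch of one well-known technique: keyed deterministic tokens (HMAC-SHA256) that let a service find duplicates without ever seeing the underlying values. This is an illustration under stated assumptions, not NoSheet's actual implementation; the function names, the sample key, and the sample data are all hypothetical. Note also that deterministic tokens reveal which cells are equal, which is exactly what deduplication needs but is still a form of leakage.

```python
import hashlib
import hmac


def blind_token(value: str, tenant_key: bytes) -> str:
    """Return a keyed, deterministic token for a cell value.

    Normalization (trim + lowercase) happens before tokenizing, so
    equivalent values produce the same token. Hypothetical helper.
    """
    normalized = value.strip().lower()
    return hmac.new(tenant_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def dedupe_tokens(tokens: list[str]) -> list[int]:
    """Return the row indexes to keep, comparing tokens only."""
    seen: set[str] = set()
    keep: list[int] = []
    for index, token in enumerate(tokens):
        if token not in seen:
            seen.add(token)
            keep.append(index)
    return keep


# Client side: the key stays with the tenant (assumed, illustrative value).
tenant_key = b"hypothetical-per-tenant-secret"
emails = ["Ana@example.com", "bob@example.com", " ana@example.com "]
tokens = [blind_token(email, tenant_key) for email in emails]

# Service side: only the tokens are visible here, never the emails.
kept_rows = dedupe_tokens(tokens)
print(kept_rows)
```

The first and third rows normalize to the same address, so only rows 0 and 1 are kept. Because the token depends on the tenant key, the same email under a different key produces an unrelated token.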

Side-by-Side Comparison

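Feature                          | Google Sheets           | Flatfile | OpenRefine             | NoSheet
---------------------------------+-------------------------+----------+------------------------+---------------
Vendor sees plaintext data       | Yes                     | Yes      | Local only             | Never
Cell-level encryption            | No                      | No       | No                     | Yes
HIPAA compliant                  | Only with Workspace BAA | With BAA | No                     | Yes
Encrypted during processing      | No                      | No       | No                     | Yes
PII auto-detection               | No                      | Limited  | No                     | Yes
Audit trail                      | Version history         | Yes      | Operation history only | Yes
Government subpoena exposes data | Yes                     | Yes      | N/A (local)            | No (encrypted)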

What You Should Do Next

Start by auditing where your sensitive data actually goes during cleaning. Make a list of every tool in your workflow that touches customer PII. For each tool, ask: "Can this vendor read my data in plaintext?" If the answer is yes, that vendor is part of your attack surface and your compliance burden. If you are not sure where PII lives in your spreadsheets, our guide to Google Sheets alternatives for data cleaning walks through the specific risks and how to mitigate them.
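As a starting point for that audit, the sketch below scans a CSV export locally and reports which columns appear to contain PII. It is a rough heuristic, not a compliance tool: the regular expressions cover only US-style SSNs, emails, and North American phone numbers, and the pattern set, function name, and sample data are assumptions made for illustration.

```python
import csv
import io
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}


def scan_csv_for_pii(csv_text: str) -> dict[str, list[str]]:
    """Map each column name to the sorted PII types found in its cells."""
    hits: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, cell in row.items():
            # Skip missing cells and overflow fields from ragged rows.
            if not isinstance(cell, str) or not cell:
                continue
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(cell):
                    hits.setdefault(column, set()).add(label)
    return {column: sorted(labels) for column, labels in hits.items()}


sample = (
    "name,contact,tax_id\n"
    "Ana,ana@example.com,123-45-6789\n"
    "Bob,(555) 123-4567,\n"
)
report = scan_csv_for_pii(sample)
print(report)
```

Running this on your own machine keeps the file out of any third-party tool while you work out which columns need protection. For the sample above, the contact column is flagged for both emails and phone numbers, and the tax_id column for SSNs.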

Then evaluate whether you actually need to expose that data. In most cases, the answer is no. Formatting phone numbers does not require a vendor to read the phone numbers. Deduplicating email addresses does not require a vendor to see the email addresses. Standardizing dates does not require a vendor to access the dates. All of these operations can be performed on encrypted data, and NoSheet proves it every day for thousands of customers.

The tools you use to clean data should not be the reason your data is at risk. Cleaning should make your data better, not less secure. That is the principle NoSheet is built on, and it is why we engineered an architecture where we never see your data — not because we promise not to look, but because it is cryptographically impossible for us to look.

Your Data Cleaning Tool Should Not Be a Liability

NoSheet cleans your data without ever seeing it. No plaintext exposure. No compliance risk. No vendor trust problem.
