Does the Remove Duplicates tool delete the original data permanently?

Yes, the Remove Duplicates tool permanently deletes duplicate rows from your sheet. It does not create a backup or preserve deleted rows. To be safe, make a copy of your sheet before using this tool, or use the UNIQUE function instead if you want to preserve your original data while viewing the cleaned version.

Can I remove duplicates based on only one column when I have multiple columns?

Yes. In the Remove Duplicates dialog, uncheck all columns except the one you want to check. The tool will then remove rows where that column value matches another row, even if other columns differ. For formulas, use COUNTIF with just the single column you care about.

What's the difference between UNIQUE and the Remove Duplicates tool?

The Remove Duplicates tool modifies your existing data permanently by deleting duplicate rows. UNIQUE creates a new filtered list of unique values without touching your original data. Choose Remove Duplicates for one-time cleanup, and UNIQUE when you want to preserve your original data and work with a clean copy.

Can I undo the Remove Duplicates action if I make a mistake?

Yes, use Ctrl+Z (or Cmd+Z on Mac) to undo the Remove Duplicates action immediately after running it. However, if you close your sheet and reopen it, the undo history is lost and the deletion becomes permanent. Always make a backup before removing duplicates from important data.

How do I remove duplicates across multiple Google Sheets files?

Copy all data from each sheet into one master sheet, then apply your duplicate removal method. Alternatively, use IMPORTRANGE formulas to pull data from multiple sheets into one location, then clean it there. Another approach is to process each sheet individually using the Remove Duplicates tool or UNIQUE function.

How to Remove Duplicates in Google Sheets • Visual Media Factory

How to Remove Duplicates in Google Sheets

By 572, Uncategorized 0 Comments

Why Duplicates Are a Problem in Data

If you work with spreadsheets regularly, you’ve probably encountered duplicate entries at some point. They slip into your data in countless ways: copy-paste accidents, multiple uploads of the same file, manual data entry errors, or system glitches that create unwanted duplicates. Whether you’re managing a customer list, tracking inventory, or analyzing survey responses, duplicates skew your results and waste time sorting through bad data.

Google Sheets makes it surprisingly easy to clean up your data and remove these unwanted duplicates. You have several methods to choose from, each suited to different scenarios. Some approaches work best for a quick cleanup, while others let you inspect duplicates first before deciding what to keep. Understanding these options helps you pick the right tool for your situation.

This guide walks you through every method for removing duplicates in Google Sheets, from the built-in cleanup tool to formulas that give you fine control over which entries survive the process. By the end, you’ll know exactly how to handle duplicates in your sheets, whether you’re working with small lists or massive datasets with thousands of rows.

The Remove Duplicates Tool: The Fastest Way

Google Sheets includes a dedicated data cleanup feature that removes duplicates in seconds. It’s the simplest approach for most people, and it’s built right into the menu.

Start by selecting all the data you want to check. Click the header row to include column names in your selection, then drag down to your last row of data. If you have a huge dataset, click on a cell in your data range, then use the keyboard shortcut Ctrl+A (or Cmd+A on Mac) to select all your data at once.

Once your data is selected, navigate to the Data menu at the top of the sheet. You’ll see an option labeled “Data cleanup” with a small arrow next to it. Click on that arrow to expand the submenu. Inside, you’ll find “Remove duplicates” as your first option.

A dialog box appears asking which columns to check for duplicates. By default, all columns are selected. This means Google Sheets will consider a row a duplicate only if every single column matches another row. If you want to check duplicates based on specific columns instead, uncheck the ones you don’t need. For example, if you’re checking a customer list and only want to remove duplicates based on email address (ignoring different phone numbers), you’d uncheck the phone column.

After you’ve selected your columns, click the “Remove duplicates” button at the bottom of the dialog. Google Sheets instantly scans your entire range, identifies matching rows, and deletes the duplicates. It keeps the first occurrence of each unique row and removes all subsequent ones. You’ll see a summary message showing how many duplicates were found and removed.

This method is unbeatable for speed and simplicity. It requires no formulas and takes just a few clicks. The trade-off is that you have limited control over which duplicate it keeps. If you need to inspect duplicates before deleting them or keep the last occurrence instead of the first, you’ll want to try one of the other methods below.

The UNIQUE Function: Clean Data Without Deleting Anything

The UNIQUE function is one of Google Sheets’ most powerful tools for handling duplicates. Instead of deleting rows, it creates a filtered list of unique values in a new location. This means your original data stays intact while you work with a clean copy.

The syntax is straightforward: =UNIQUE(range). That’s it in its simplest form. You can also add optional arguments to fine-tune the behavior.

Here’s a practical example. Suppose you have a list of customer names in cells A2 through A100. You want to see only the unique names without deleting any of your original data. Click on an empty cell, maybe D2, and type =UNIQUE(A2:A100). Press Enter, and Google Sheets instantly populates column D with every unique name from your original list, removing all duplicates.

The UNIQUE function becomes even more useful when working with multiple columns. If your data spans columns A through C (name, email, phone), you can use =UNIQUE(A2:C100) to extract rows where the combination of all three columns is unique. This works across entire datasets without touching your original information.

One advanced feature is the by_column argument. By default, UNIQUE checks for duplicate rows. Setting this to TRUE makes it check for duplicate columns instead: =UNIQUE(A2:C100, TRUE). This is useful for datasets where your structure is sideways rather than traditional rows and columns.

Another powerful option is the occurs_only argument. When set to TRUE, UNIQUE returns only values that appear more than once in your range. This is perfect if you want to focus specifically on problem duplicates: =UNIQUE(A2:A100, FALSE, TRUE). Note the FALSE in the middle maintains the default row-checking behavior.

The beauty of UNIQUE is flexibility. You can experiment with the formula in a helper column without risk. If you don’t like the results, just delete the column and try again. Once you’re satisfied with your cleaned data, you can copy the unique results and paste them back over your original data as values, completely replacing the duplicates with clean information.

Highlighting Duplicates with Conditional Formatting

Sometimes you don’t want to remove duplicates at all. Maybe you need to review them first, understand why they exist, or keep them for reporting purposes. Highlighting duplicates lets you spot them visually without deleting anything.

Start by selecting your data range. Open the Format menu and click “Conditional formatting.” A panel appears on the right side of your screen with formatting rule options.

Under “Format rules,” choose “Custom formula is” from the dropdown. This lets you write a formula that identifies duplicates. The formula you need is =COUNTIF($A$2:$A$100,A2)>1. This checks if each value in your range appears more than once. If it does, the cell gets highlighted.

To make this work, remember to use absolute references for the entire range (the dollar signs) while keeping the individual cell reference relative (no dollar signs). This way, the formula adapts as it checks each row.

For multi-column duplicates, the approach is slightly different. If you’re checking whether an entire row is duplicated based on columns A through C, you’d use =COUNTIFS($A$2:$A$100,A2,$B$2:$B$100,B2,$C$2:$C$100,C2)>1. This formula counts how many rows match the current row across all three columns.

Once your formula is entered, choose your formatting. You can change the fill color to red, yellow, or whatever stands out to you. Text color and bold formatting work too. Click “Done,” and Google Sheets highlights every duplicate cell in your selected range.

This visual approach is perfect for smaller datasets where you want human review. You can see exactly which entries are duplicated, understand the context, and decide manually whether to keep or remove them. For very large sheets, filtering after highlighting (which you’ll learn about next) helps you focus on just the problematic rows.

Flagging Duplicates with COUNTIF and Filtering

This method combines the COUNTIF formula with filtering to give you precise control. It works by adding a helper column that counts how many times each value appears in your dataset.

Let’s say your data is in columns A through C, with headers in row 1 and data starting in row 2. Click on cell D1 and add a header like “Duplicate Count.” Then, in cell D2, enter the formula =COUNTIF($A$2:$A$100,A2). This counts how many times the value in A2 appears in the entire range A2:A100.

Press Enter, then copy this formula down to all your data rows. Now column D shows a number for each row: 1 means that entry is unique, 2 means it appears twice, 3 means it appears three times, and so on.

Next, apply a filter to your data. Select your entire data range including headers, then click Data in the menu and choose “Create a filter.” Filter buttons appear in your header row. Click the filter button in your “Duplicate Count” column, uncheck “1,” and click OK. Now you see only the rows that are actually duplicated.

From here, you can manually review these duplicates and delete the rows you don’t want. This approach is slower than the automatic Remove Duplicates tool, but it gives you complete visibility and control. You might discover that some “duplicates” are actually legitimate entries with the same name but different contact information, and you’ll want to keep both.

After you’ve removed the duplicates you want to delete, you can remove the filter to see your cleaned dataset. Then delete the helper column D if you don’t need it anymore. Your data is now clean and verified by human review.

Single Column vs. Multiple Column Duplicates

Understanding the difference between checking a single column for duplicates and checking multiple columns is important because it changes what Google Sheets considers “duplicate.”

A single-column duplicate check looks for matching values in just one column. If you’re checking names in column A, it finds every instance where two rows have the same name, regardless of whether the email, phone, or other information matches. This is useful when you know a particular column should have unique values, like customer IDs or email addresses.

A multi-column duplicate check is stricter. It only flags rows as duplicates if they match across all selected columns. If you’re checking names and emails together, a row is considered a duplicate only if another row has the exact same name AND the exact same email. This is useful for customer lists where the same person might be entered with slightly different information, and you want to keep both versions.

The Remove Duplicates tool checks multiple columns by default (all of them), but you can uncheck specific columns in the dialog to switch to single-column mode. The UNIQUE function works across all columns in your selected range unless you specify otherwise. The COUNTIF approach requires you to decide upfront whether you’re counting one column or multiple columns in your formula.

Most of the time, you’ll want to check multiple columns together. A customer with the same name but a different email address is probably not a true duplicate. But if you’re tracking product SKUs or invoice numbers, checking a single column makes sense because those IDs should be completely unique.

Keeping the First vs. Last Occurrence

By default, Google Sheets keeps the first occurrence of each duplicate and removes subsequent ones. But what if you want to keep the last occurrence instead? This matters when your data is chronologically ordered and the most recent entry is the one you trust.

The Remove Duplicates tool always keeps the first occurrence, so if you need the last one, you’ll need a different approach. Sort your data in reverse order (newest first), then use the Remove Duplicates tool. This makes the newest entry the “first” one for the purposes of the tool, and it gets kept while older duplicates are removed.

Another approach uses a helper column. Add a column with row numbers using =ROW(), then sort by your key column (ascending) and by row number (descending). Now the last occurrence of each duplicate has the highest row number. Use COUNTIF to identify duplicates and manually delete all rows except those with the highest row numbers.

The UNIQUE function doesn’t have a built-in way to keep last occurrences either, but you can work around it. Copy your data in reverse order to a new sheet, apply UNIQUE to remove duplicates (keeping the first occurrence of the reversed data), then reverse the results back again.

For simpler datasets, sometimes the easiest approach is to manually sort, review the duplicates, and delete the ones you don’t want to keep. When the goal is to keep the most recent or most accurate version of duplicated entries, human judgment often beats automated rules anyway.

Using Data Validation to Prevent Duplicates

The best duplicate is one that never enters your data in the first place. Google Sheets allows you to set up data validation rules that prevent duplicate entries from being added to your sheet.

Select the range where you want to enforce unique values. Open the Data menu and click “Data validation.” A dialog box opens with validation options.

Under “Criteria,” choose “Custom formula is” and enter =COUNTIF($A$2:$A,A2)=1. This formula ensures that each value in your range appears only once. If someone tries to enter a value that already exists, the validation rule rejects it.

Set your error message to something helpful like “This value already exists. Please enter a different one.” Now, whenever someone tries to add a duplicate entry, they’ll see your custom message and the cell won’t accept the input.

This approach works best for smaller reference lists or specific columns where uniqueness is critical. For large datasets or optional uniqueness, it might be too restrictive. But for important fields like email addresses in a signup sheet or product SKUs in an inventory list, data validation is worth the setup time.

Handling Duplicates in Large Datasets

When you’re working with sheets containing tens of thousands of rows, performance becomes a concern. Formulas like UNIQUE or COUNTIF might slow down as Google Sheets processes more and more data.

For massive datasets, the Remove Duplicates tool usually performs best because it’s optimized by Google. It processes large ranges faster than formula-based approaches. If you’re working with 50,000+ rows, go straight to Data > Data cleanup > Remove duplicates rather than experimenting with formulas.

If you must use formulas on large datasets, consider working with smaller chunks. Break your data into sections of 5,000 to 10,000 rows, process each section separately, then combine the results. This approach takes longer overall but avoids the performance slowdown that comes with processing massive ranges in a single formula.

Another option is to use filters to show only a subset of your data, process that subset, then move to the next chunk. This keeps your active range manageable and your sheet responsive.

Also remember that every formula in a large sheet recalculates whenever you make a change. If you’ve used UNIQUE or COUNTIF to flag duplicates, consider converting those formulas to static values once your analysis is done. Copy the cells, paste them as values using Paste special, then delete the formulas. This dramatically speeds up your sheet and prevents accidental formula changes.

Frequently Asked Questions

Removing duplicates from your Google Sheets is one of the most important data cleaning skills you can master. Whether you use the built-in tool for speed, formulas like UNIQUE for flexibility, or conditional formatting for visibility, you now have the knowledge to handle duplicates confidently. Start with the method that matches your situation, and remember that the best duplicate is one you prevent from entering your data in the first place. For more Google Sheets tips, check out our guides on wrap text in Google Sheets, sort by date in Google Sheets, and conditional formatting in Google Sheets.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

How to Remove Duplicates in Google Sheets

How to Remove Duplicates in Google Sheets

Why Duplicates Are a Problem in Data

The Remove Duplicates Tool: The Fastest Way

The UNIQUE Function: Clean Data Without Deleting Anything

Highlighting Duplicates with Conditional Formatting

Flagging Duplicates with COUNTIF and Filtering

Single Column vs. Multiple Column Duplicates

Keeping the First vs. Last Occurrence

Using Data Validation to Prevent Duplicates

Handling Duplicates in Large Datasets

Frequently Asked Questions

Author

Leave a Reply Cancel reply

How to Remove Duplicates in Google Sheets

How to Remove Duplicates in Google Sheets

Why Duplicates Are a Problem in Data

The Remove Duplicates Tool: The Fastest Way

The UNIQUE Function: Clean Data Without Deleting Anything

Highlighting Duplicates with Conditional Formatting

Flagging Duplicates with COUNTIF and Filtering

Single Column vs. Multiple Column Duplicates

Keeping the First vs. Last Occurrence

Using Data Validation to Prevent Duplicates

Handling Duplicates in Large Datasets

Frequently Asked Questions

Share this post

Author

Leave a Reply Cancel reply

Related Posts