identify duplicate data in worksheet

  • 28 June 2024
  • 2 replies



I need some help with logic to identify data that has duplicate IDs. First spreadsheet has various data and I need to extract the data that has the same name with the same ID and save it in another spreadsheet. I am having trouble visualizing this loop to extract the duplicate data.

2 replies

Userlevel 1
Badge +3

Hi @MarFeli,


You can achieve this using VB Script. Refer below script which requires source excel and destination excel file paths as input. It works for ID and Name cols, where ID placed in first column and Name placed in second column if you want to modify you can do it based on the script comments.


Input with duplicate rows for ID and Name:


 VB Script code: Create a list variable in AA and add source file path in index ‘0’ and destination file path in index ‘1’. Pass this list variable as input parameter for script.

Function CopyDuplicateRows(param)

Option Explicit

Dim xlApp, xlBook, xlSheet, xlCopyBook, xlCopySheet, SourceFile, DestFile
Dim lastRow, i, j, k
Dim dict

SourceFile = param(0) ' your main file path with duplicates

DestFile = param(1) ' Replace with your copy/output file path

' Create an Excel Application object
Set xlApp = CreateObject("Excel.Application")
xlApp.Visible = False ' Set to True if you want Excel to be visible

' Open the main Excel workbook
Set xlBook = xlApp.Workbooks.Open(SourceFile)
Set xlSheet = xlBook.Sheets("Sheet1") ' Replace "Sheet1" with your sheet name

' Open the workbook where duplicates will be copied
Set xlCopyBook = xlApp.Workbooks.Open(DestFile)

Set xlCopySheet = xlCopyBook.Sheets("Sheet1") ' Replace "Sheet1" with your sheet name in the copy workbook

' Get the last row number with data in column A (assuming Name is in column A and ID is in column B) in main workbook
lastRow = xlSheet.Cells(xlSheet.Rows.Count, "A").End(-4162).Row ' -4162 is xlUp constant in VBA

' Create a dictionary object to store concatenated values
Set dict = CreateObject("Scripting.Dictionary")

' Loop through each row in the specified range in main workbook
For i = 2 To lastRow ' Assuming data starts from row 2 (adjust if needed)
    ' Concatenate Name and ID from main workbook
    Dim concatenated
    concatenated = xlSheet.Cells(i, 1).Value & "-" & xlSheet.Cells(i, 2).Value ' Adjust column indexes if needed for Name and ID
    ' Check if concatenated value already exists in dictionary
    If dict.Exists(concatenated) Then
        ' Highlight current cell and corresponding row in main workbook
        xlSheet.Cells(i, 1).Interior.Color = RGB(255, 0, 0) ' Red color
        xlSheet.Cells(i, 2).Interior.Color = RGB(255, 0, 0) ' Red color
        xlSheet.Rows(i).Interior.Color = RGB(255, 255, 0) ' Yellow color (for highlighting entire row)
        ' Copy the entire row to the copy workbook
        Dim copyLastRow
        copyLastRow = xlCopySheet.Cells(xlCopySheet.Rows.Count, "A").End(-4162).Row + 1 ' Find the last row in the copy workbook
        xlSheet.Rows(i).Copy xlCopySheet.Rows(copyLastRow) ' Copy entire row to the copy workbook
        ' Add concatenated value to dictionary
        dict.Add concatenated, 1
    End If

' Save and close both workbooks

' Clean up
Set dict = Nothing
Set xlSheet = Nothing
Set xlBook = Nothing
Set xlCopySheet = Nothing
Set xlCopyBook = Nothing
Set xlApp = Nothing

End Function


Output Excel contains below rows:


Please give it a try and let me know if you have any queries.



Hemantha Pindra

Userlevel 3
Badge +6

While there isn’t an Excel Advanced package that does this for you, there is a Data Table: Remove duplicate rows. That might be useful for a Action only process to do find the non duplicate rows in memory, then use that in a loop to delete the rows that are in the Spreadsheet table.

In the end, the VB script that @HemanthaPindra provided is a similar algorithm, you could also use it as a reference to build that in with Packages and Actions.