Home > Back-end >  How do I pick unique values comparing two different excel columns?
How do I pick unique values comparing two different excel columns?

Time:09-17

I am currently exporting CSV files from a Windows directory, comparing an endpoint with the server files.

I have on the CSV files the FullPath, FileName, Size, and MD5 Hashes of the files.

We know the directory structure is almost the same, with minor differences.

What I'm trying to achieve on excel, is to get the MD5 values that share the same name, but different MD5.

My current roadblock is that I can't find a way to tell excel to go through a table and pick only the MD5 hashes that are Unique on both tables (server and endpoint).

For simplicity, this table is sorted by name, but in excel I don't have the names coincide,
Say:

ServerName ServerHash - EndName EndHash
NameA ABCD1234 - NameA ABCD1234
NameB DCBA4321 - NameB DIFFHASH
NameC HASHFINE - NameC HASHFINE
NameD ABCD1234 -

Results in, after filtering:

ServerName ServerHash - EndName EndHash
NameB DCBA4321 - NameB DIFFHASH
NameD ABCD1234 -

I tried to "remove duplicates" but I think the tool only looks up in the same column and doesn't cross compare in between ServerHash and EndHash.

Is there an easier way to do this?

Thanks

CodePudding user response:

Because you are importing a CSV file anyways, lets use Power Query.

  • From the Data tab in the ribbon, select From Text/CSV. Navigate to your file, Import, and click Transform Data from the resulting dialog.

enter image description here

  • If necessary, select Use First Row as Headers from the Home tab.

enter image description here

  • Select the ServerName and ServerHash columns > right-click > Merge Columns, use a | delimiter. Do the same thing with the EndName and EndHash columns.

  • You should now only have two columns. Merge those together with a different delimiter, e.g. colon

  • Right Click, Split Column by Delimiter. Choose the colon delimiter, click Advanced, choose Rows instead of columns.

  • Split that resulting column by the | delimiter.

  • Select both columns, choose Group By from the Transform tab, and do a Count operation. Now you can filter out anything with a count of 2 or greater.

  • If you need to keep track of Server and End hash types, you can create two custom columns prior to doing any merging, indicate the hash type, and include those in the initial column merges.

CodePudding user response:

You may benefit from COUNTIFS:

enter image description here

My formula in column A is:

=COUNTIFS($E$3:$E$6;B3;$F$3:$F$6;C3)

The rows that return a 0 value mean those are uniques.

  • Related