Home > Mobile >  Removing duplicate/opposite entries in Google Sheets
Removing duplicate/opposite entries in Google Sheets

Time:10-12

I have a sheet containing the following data: URL A, URL B, similar score in percent.

If URL A is 98% similar to URL B, it means that URL B is 98% similar to URL A, and listed as well.

I want to find and eliminate these duplicates/reversed entries. For now, I have tried having two extra columns concatenating URL A URL B in one, and URL B URL A in one. This way I have unique identifiers.

After this I'm kinda stuck, because I'm dealing with a lot of variables, as data is in two different rows, and two different columns. I might be looking into a script, taking the A B value, iterating through the B A value until it finds a match, and somehow marks this (or simply just deletes it), since my knowledge of formulas for highlighting these duplicates are falling short.

This sheet shows the concept - the first 100 rows (it's about 11K in total): https://docs.google.com/spreadsheets/d/1YKsguAn1lYjV4FlP_6_TlKGvFcpFAEzn7bpAyOEmozQ/edit?usp=sharing

Any suggestions for what I should look into?

CodePudding user response:

Try the filter(match()) pattern to find duplicate values, like this:

=unique( 
  flatten( 
    filter( 
      A2:B, 
      match(A2:A & B2:B, B2:B & A2:A, 0), 
      C2:C >= 90 
    ) 
  ) 
)

CodePudding user response:

I ended up with a solution where I sorted by URL A and implemented this formula:

=IF(A2<B2,A2&B2,B2&A2)

This way I had the concatenation the same way for the real one and the opposite. I didn't know you could use "<" on strings.

After this, I could delete duplicated values in the column with the formula above.

  • Related