In Excel I want to delete duplicate rows but I want to have some preferences for what duplicate occurrence to keep. As in Excel while removing duplicates it by default keeps the first occurrence. Is there any way to keep the 2nd or 3rd occurrence?
For example:
a Apple
a Banana
a Cherry
b Apple
b Banana
b Melon
c Apple
c Cherry
c Melon
By default if I remove duplicate, it would be like this:
a Apple
b Apple
c Apple
But is there a way to achieve this
a Banana
b Banana
c ?
As the C doesn't have banana, we can keep this blank, or any default value.
CodePudding user response:
To always keep the second occurrence,
- SORT the data,
- MATCH the elements among itself to get the first occurrence,
- Subtract the matched row number from actual row number and if it's not 1, filter it out.
Input:
Duplicates | Helper column |
---|---|
a | Apple |
a | Cherry |
b | Apple |
c | Apple |
c | Cherry |
b | Banana |
b | Melon |
a | Banana |
c | Melon |
Formula:
=FILTER(SORT(A1:B9,1,1),ROW(A1:A9)-MATCH(SORT(A1:A9),SORT(A1:A9),0)=1)
Output:
Second occurrence | Helper column |
---|---|
a | Cherry |
b | Banana |
c | Cherry |
CodePudding user response:
see:
=FILTER(A1:B, COUNTIFS(A1:A, A1:A, ROW(A1:A), "<="&ROW(A1:A))=2)
for 3rd occurrence change 2
to 3