Home > front end >  Replace with empty string duplicate values selected columns csv
Replace with empty string duplicate values selected columns csv

Time:07-15

I have come across a perhaps very specific problem, I have the following data:

Input

id name phone1 phone2 phone3
1 John 3222430085 9983146924 1234567890
2 Vanessa 3222430085 9934254243
3 Maria 7861628239 7861628239 7861628239
4 Oscar 7861157888 3222430085
4 Paul 7151087912 5555555555
5 Paul 7151087912 5555555555

Expected output

id name phone1 phone2 phone3
1 John 3222430085 9983146924 1234567890
2 Vanessa 9934254243
3 Maria 7861628239
4 Oscar 7861157888
4 Paul 7151087912 5555555555
5 Paul

I pose the problem basically based on a couple of selected columns to eliminate the duplicate records that are found within them.

In this case selected phone1, phone2 and phone3

I have managed with pandas to remove entire rows that are exactly the same or by selecting columns but with this case I have not been able to solve it, if anyone has any idea if it is possible to do this I will be very grateful

CodePudding user response:

Filter the phone like columns then stack to reshape into a multiindex series, then drop duplicates in the series then unstack and assign the result back to original dataframe

df.assign(**df.filter(like='phone').stack().drop_duplicates().unstack())

  id     name      phone1      phone2      phone3
0  1     John  3222430085  9983146924  1234567890
1  2  Vanessa         NaN  9934254243         NaN
2  3    Maria  7861628239         NaN         NaN
3  4    Oscar  7861157888         NaN         NaN
4  4     Paul  7151087912  5555555555         NaN
5  5     Paul         NaN         NaN         NaN
  • Related