Home > Software design >  Drop rows with duplicate column values in a dataframe and create a new column with secondary column
Drop rows with duplicate column values in a dataframe and create a new column with secondary column

Time:08-31

I have a dataframe with the following structure:

id  a   b
1   X   A
2   X   B
3   X   C
3   Y   D
4   Y   E
5   Y   F
5   X   G
6   X   H
7   Y   I

I'd like to remove the duplicate ids (3, 5) but make new columns (c, d) that contain the data in the 'a' and 'b' columns being deleted within the remaining id. e.g.:

id  a   b   c   d
1   X   A       
2   X   B       
3   X   C   Y   D
4   Y   E       
5   Y   F   X   G
6   X   H       
7   Y   I       

I'm not what you'd call 'good at programming,' and this has gotten above my ability to search for a solution. An important point is that the logic doesn't need to be first/last/max/min for column a or b because these are uncorrelated strings and the sequence of occurrence doesn't matter — the goal is to import the data from rows being removed to the remaining row across the df. Any hints are appreciated!

CodePudding user response:

this is the way you can follow:

res = (df.groupby('id').
       apply(lambda x: x[['a','b']].values.ravel()).
       apply(pd.Series))

print(res)
'''
    0  1    2    3
id                
1   X  A  NaN  NaN
2   X  B  NaN  NaN
3   X  C    Y    D
4   Y  E  NaN  NaN
5   Y  F    X    G
6   X  H  NaN  NaN
7   Y  I  NaN  NaN
  • Related