Here is an extract of my dataframe :
ID | LU | MA | ME | JE | VE | SA | DI |
---|---|---|---|---|---|---|---|
200 | B | B | B | ||||
201 | C | C | C | C | C | ||
211 | A | ||||||
211 | D | D | D | ||||
211 | B | ||||||
213 | A | A | |||||
216 | K | K | K | K | |||
216 | K | ||||||
217 | B | B | B | B | B |
I have some rows with same ID
and want to "merge" them into only one row while completing them.
Here is an example of what I want to have as a result :
ID | LU | MA | ME | JE | VE | SA | DI |
---|---|---|---|---|---|---|---|
200 | B | B | B | ||||
201 | C | C | C | C | C | ||
211 | A | D | D | D | B | ||
213 | A | A | |||||
216 | K | K | K | K | K | ||
217 | B | B | B | B | B |
I'm new to pandas dataframes and have try to use drop_duplicates
method but I need something different because of the restriction on keep
parameters.
Also the dataframe is sorted by ID.
CodePudding user response:
If there is only one non empty value per groups use:
df = df.replace('',np.nan).groupby('ID', as_index=False).first().fillna('')
If possible multiple values and need unique values in original order use lambda function:
print (df)
ID LU MA ME JE VE SA DI
0 201 B C B
1 201 C C C B C
f = lambda x: ','.join(dict.fromkeys(x.dropna()).keys())
df = df.replace('',np.nan).groupby('ID', as_index=False).agg(f)
print (df)
ID LU MA ME JE VE SA DI
0 201 B,C C C B C
CodePudding user response:
This could be treated as a pivot. You'd need to melt the df first then pivot:
(df.melt(id_vars='ID')
.dropna()
.pivot(index='ID',columns='variable',values='value')
.fillna('')
.rename_axis(None, axis=1)
.reset_index()
)