I have the following table below:
I would like to collapse Code_1 and Code_2 columns based on ID and Date. Based on what I have found online, I have tried the below snippet of code but it does not seem to be working.
df= df.groupby(['ID','Date']).agg(''.join)
DF:
ID | Date | Count_Code1 | Count_Code2 | Code_1 | Code_2 |
---|---|---|---|---|---|
A1 | 2022-02-02 | 90 | 0 | AAAA | NaN |
A1 | 2022-02-02 | 0 | 50 | NaN | BBBB |
A1 | 2022-03-14 | 34 | 0 | AAAA | NaN |
C1 | 2022-04-20 | 0 | 13 | NaN | BBBB |
C1 | 2022-04-20 | 15 | 0 | AAAA | NaN |
Desired output:
ID | Date | Count_Code1 | Count_Code2 | Code_1 | Code_2 |
---|---|---|---|---|---|
A1 | 2022-02-02 | 90 | 50 | AAAA | BBBB |
A1 | 2022-03-14 | 34 | 0 | AAAA | NaN |
C1 | 2022-04-20 | 15 | 13 | AAAA | BBBB |
CodePudding user response:
# groupby and take the group's max
df.groupby(['ID','Date'], as_index=False).max()
ID Date Code_1 Code_2
0 A1 2022-02-02 AAAA BBBB
1 A1 2022-03-14 AAAA 0
2 C1 2022-04-20 AAAA BBBB
Alternate Solution
# replace 0 with null, and ffill, bfill null values on grouped rows
# groupby drops the grouped on keys, so, concat them back
# finally drop the duplicates
pd.concat([df.iloc[:,:2],
df.replace(0, np.nan).groupby(['ID', 'Date'], as_index=False ).ffill().bfill()],
axis=1).drop_duplicates()
ID Date Count_Code1 Count_Code2 Code_1 Code_2
0 A1 2022-02-02 90.0 50.0 AAAA BBBB
2 A1 2022-03-14 34.0 13.0 AAAA BBBB
3 C1 2022-04-20 15.0 13.0 AAAA BBBB