I am new to Python, so apologies if I do not explain this well or can only provide a partial attempt.
I have a dataframe like the one below: a key, some dates (spread across rows), and many other columns whose values repeat for each key.
Key | Date 1 | Date 2 | Date 3 | Column X | Column Y |
---|---|---|---|---|---|
Key 1 | 2022-01-01 | | | X11111111 | Y11111111 |
Key 1 | | 2022-01-02 | | X11111111 | Y11111111 |
Key 1 | | | 2022-01-03 | X11111111 | Y11111111 |
Key 2 | 2022-12-01 | | | X22222222 | Y22222222 |
Key 2 | | 2022-12-02 | | X22222222 | Y22222222 |
Key 2 | | | 2022-12-03 | X22222222 | Y22222222 |
And I want to aggregate them as below, so that the dates are collapsed into one row per key while the other columns keep their values:
Key | Date 1 | Date 2 | Date 3 | Column X | Column Y |
---|---|---|---|---|---|
Key 1 | 2022-01-01 | 2022-01-02 | 2022-01-03 | X11111111 | Y11111111 |
Key 2 | 2022-12-01 | 2022-12-02 | 2022-12-03 | X22222222 | Y22222222 |
What would be the most efficient way of doing this? Thank you.
I have tried a normal pivot and aggregation, but they did not work the way I wanted.
CodePudding user response:
Assuming the empty cells are NaN, use groupby.first, which takes the first non-NaN value per column within each group:
out = df.groupby('Key', as_index=False).first()
NB. If the empty cells are empty strings, convert them to NaN first: df.replace('', float('nan')).groupby('Key', as_index=False).first().
Output:
Key Date 1 Date 2 Date 3 Column X Column Y
0 Key 1 2022-01-01 2022-01-02 2022-01-03 X11111111 Y11111111
1 Key 2 2022-12-01 2022-12-02 2022-12-03 X22222222 Y22222222
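A minimal runnable sketch of this answer, assuming the question's example frame with NaN in the blank cells (the frame below is reconstructed from the tables above):

```python
import pandas as pd
import numpy as np

# Rebuild the example: each row fills only one of the three date columns,
# while Column X / Column Y repeat within each key.
df = pd.DataFrame({
    'Key': ['Key 1'] * 3 + ['Key 2'] * 3,
    'Date 1': ['2022-01-01', np.nan, np.nan, '2022-12-01', np.nan, np.nan],
    'Date 2': [np.nan, '2022-01-02', np.nan, np.nan, '2022-12-02', np.nan],
    'Date 3': [np.nan, np.nan, '2022-01-03', np.nan, np.nan, '2022-12-03'],
    'Column X': ['X11111111'] * 3 + ['X22222222'] * 3,
    'Column Y': ['Y11111111'] * 3 + ['Y22222222'] * 3,
})

# first() picks the first non-NaN value per column within each group,
# which collapses the three sparse rows into one dense row per key.
out = df.groupby('Key', as_index=False).first()
print(out)
```

Note that `first()` skips NaN by default; if the blanks were empty strings instead, they would be treated as ordinary values and the result would differ, hence the `replace('', float('nan'))` step mentioned above.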
CodePudding user response:
Another possible solution:
(df.groupby('Key', group_keys=True)
 .apply(lambda g: g.ffill().bfill())  # fill the gaps within each group
 .drop_duplicates()                   # the filled rows are now identical
 .reset_index(drop=True))
Output:
Key Date 1 Date 2 Date 3 Column X Column Y
0 Key 1 2022-01-01 2022-01-02 2022-01-03 X11111111 Y11111111
1 Key 2 2022-12-01 2022-12-02 2022-12-03 X22222222 Y22222222
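A runnable sketch of this second approach, again assuming NaN blanks (the frame is reconstructed from the question; `group_keys=False` is used here so the result keeps a flat index without a separate reset of the group level):

```python
import pandas as pd
import numpy as np

# Reconstructed example frame with NaN in the blank date cells
df = pd.DataFrame({
    'Key': ['Key 1'] * 3 + ['Key 2'] * 3,
    'Date 1': ['2022-01-01', np.nan, np.nan, '2022-12-01', np.nan, np.nan],
    'Date 2': [np.nan, '2022-01-02', np.nan, np.nan, '2022-12-02', np.nan],
    'Date 3': [np.nan, np.nan, '2022-01-03', np.nan, np.nan, '2022-12-03'],
    'Column X': ['X11111111'] * 3 + ['X22222222'] * 3,
    'Column Y': ['Y11111111'] * 3 + ['Y22222222'] * 3,
})

# Per group: forward-fill then back-fill so every row carries the complete
# set of values, then keep a single copy of the now-identical rows.
out = (df.groupby('Key', group_keys=False)
         .apply(lambda g: g.ffill().bfill())
         .drop_duplicates()
         .reset_index(drop=True))
print(out)
```

The ffill/bfill pair matters: forward-fill alone would leave NaN in the earlier rows of each group, so only the combination makes all rows within a group identical before `drop_duplicates`.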