Home > Back-end >  How to combine diagonal data?
How to combine diagonal data?

Time:11-18

I am new to python, apologies if I do not explain well or provide partial solutions yet...

I have a dataframe as below: a key, some dates (distributed in rows), and many other columns (same key, same value)

Key Date 1 Date 2 Date 3 Column X Column Y
Key 1 2022-01-01 X11111111 Y11111111
Key 1 2022-01-02 X11111111 Y11111111
Key 1 2022-01-03 X11111111 Y11111111
Key 2 2022-12-01 X22222222 Y22222222
Key 2 2022-12-02 X22222222 Y22222222
Key 2 2022-12-03 X22222222 Y22222222

And I want to aggregate them like below, where the dates are aggregate, other columns keep the same

Key Date 1 Date 2 Date 3 Column X Column Y
Key 1 2022-01-01 2022-01-02 2022-01-03 X11111111 Y11111111
Key 2 2022-12-01 2022-12-02 2022-12-03 X22222222 Y22222222

What would be the most efficient way of doing it? Thank you.

I have tried normal pivot and aggregation but did not work as I want ...

CodePudding user response:

Assuming the empty cells are NaN, use groupby.first:

out = df.groupby('Key', as_index=False).first()

NB. If the empty cells are empty strings, use df.replace('', float('nan')).groupby('Key', as_index=False).first().

Output:

     Key      Date 1      Date 2      Date 3   Column X   Column Y
0  Key 1  2022-01-01  2022-01-02  2022-01-03  X11111111  Y11111111
1  Key 2  2022-12-01  2022-12-02  2022-12-03  X22222222  Y22222222

CodePudding user response:

Another possible solution:

(df.groupby('Key', group_keys=True)
 .apply(lambda g: g.ffill().bfill())
 .drop_duplicates()
 .reset_index(drop=True))

Output:

     Key      Date 1      Date 2      Date 3   Column X   Column Y
0  Key 1  2022-01-01  2022-01-02  2022-01-03  X11111111  Y11111111
1  Key 2  2022-12-01  2022-12-02  2022-12-03  X22222222  Y22222222
  • Related