I have a dataframe:
col1 col2 col3 val col4
a1 b1 c1 10 dd
a1 b1 c1 15 kk
a2 b2 c2 20 ff
a2 b2 c2 35 mm
a3 b3 c3 9 sd
I want to put each value in column "val" from each group col1, col2, col3 into each column. So desired result is:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd NA NA
How to do that? Is there any function to turn those rows into columns within group?
CodePudding user response:
If there exist at most two groups (see comments), then you can make use of the first
and last
functions in combination with a groupby
statement. You just should define your own last
function, that returns you the last element of a group (for your example it would be the second value) if it exists. If it does not exist, meaning the group length equals 1
, it would return you nan
.
Code:
df = pd.DataFrame(
{"col1": ["a1", "a1", "a2", "a2", "a3"],
"col2": ["b1", "b1", "b2", "b2", "b3"],
"col3": ["c1", "c1", "c2", "c2", "c3"],
"val": [10, 15, 20, 35, 9],
"col4": ["dd", "kk", "ff", "mm", "sd"]}
)
last_check = lambda x: pd.NA if len(x) == 1 else x.iloc[-1]
df.groupby(["col1", "col2", "col3"], as_index=False)\
.agg(val_1=("val", "first"),
col4_1=("col4", "first"),
val_2=("val", last_check),
col4_2=("col4", last_check))
Output:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd <NA> <NA>