First I have this data frame:
ID | Age | name | time | |
---|---|---|---|---|
0 | 1 | 12 | r | y |
1 | 1 | 13 | c | y |
2 | 1 | 14 | n | y |
3 | 1 | 15 | m | y |
4 | 2 | 11 | l | N |
5 | 2 | 22 | k | N |
6 | 2 | 33 | r | N |
7 | 2 | 55 | l | N |
First I want to groupby ID ( so I will have group 1 & 2)
Then from the [Age] column in the grouped by ID dataframe, I only need the first 2 rows and last row from each gorup .So for group (1) from the [Age] column, I need the first row which is = 12, the second row which is = 13 and the last row which is = 15. Surely, I need to do the same for gorup 2 as well
and for the rest of the columns which are [name] & [time] in the goruped by data frame, I only need the last row, so for group(1), from the [name] column I need last row, which is = m, and from the [time] column I need last row which is = y.
by the end I will have one row only for each ID
this is my expected/desired output:
ID | Age 1 | Age 2 | Age 3 | name | time | |
---|---|---|---|---|---|---|
0 | 1 | 12 | 13 | 15 | m | Y |
1 | 2 | 11 | 22 | 55 | l | N |
CodePudding user response:
Try with groupby
and pivot
:
#keep only the needed data
grouped = df.groupby("ID", as_index=False).agg({"Age": lambda x: x.tolist()[:2] [x.iat[-1]], "name": "last", "time": "last"}).explode("Age")
#get the count for the age columns
grouped["idx"] = grouped.groupby("ID").cumcount().add(1)
#pivot to get the required structure
output = grouped.pivot(["ID","name","time"],"idx","Age").add_prefix("Age").reset_index().rename_axis(None, axis=1)
>>> output
ID name time Age1 Age2 Age3
0 1 m y 12 13 15
1 2 l N 11 22 55
CodePudding user response:
df1 = df.groupby('ID').agg({'Age':lambda x:list(np.r_[x.head(2),x.tail(1)])})
df1[['name', 'time']] = df.groupby('ID')[['name', 'time']].last()
df1[['Age1', 'Age2', 'Age3']] = pd.DataFrame(df1['Age'].to_list(), index = df1.index)
df1.drop('Age', axis = 1).reset_index()
ID name time Age1 Age2 Age3
0 1 m y 12 13 15
1 2 l N 11 22 55