For each group in the data frame df_task that contains three rows, I would like to modify the second row of the column Task.
import pandas as pd
df_task = pd.DataFrame({'Days': [5, 5, 5, 20, 20, 20, 10, 10],
                        'Task': ['Programing', 'Presentation', 'Training', 'Development',
                                 'Presentation', 'Workshop', 'Coding', 'Communication']})
df_task.groupby(["Days"])
This is the expected output: if a group contains three rows, the Task value from the first row is appended to the Task value of the second row, as shown in the new column New_Task; if a group has two rows, nothing is modified:
   Days           Task                  New_Task
0     5     Programing                Programing
1     5   Presentation   Presentation,Programing
2     5       Training                  Training
3    20    Development               Development
4    20   Presentation  Presentation,Development
5    20       Workshop                  Workshop
6    10         Coding                    Coding
7    10  Communication             Communication
CodePudding user response:
Your requirement is pretty straightforward. Try:
import numpy as np

groups = df_task.groupby('Days')
# enumeration of the rows within groups (0 = first row, 1 = second row, ...)
enums = groups.cumcount()
# sizes of the groups broadcast to each row
sizes = groups['Task'].transform('size')
# update only the second row of groups with more than two rows:
# append the previous row's Task (shifted within the group) after a comma
df_task['New_Task'] = np.where(enums.eq(1) & sizes.gt(2),
                               df_task['Task'] + ',' + groups['Task'].shift(fill_value=''),
                               df_task['Task'])
print(df_task)
Output:
   Days           Task                  New_Task
0     5     Programing                Programing
1     5   Presentation   Presentation,Programing
2     5       Training                  Training
3    20    Development               Development
4    20   Presentation  Presentation,Development
5    20       Workshop                  Workshop
6    10         Coding                    Coding
7    10  Communication             Communication
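To see why only rows 1 and 4 change, it may help to look at the intermediate values the answer relies on. The following is a minimal sketch that rebuilds the question's data frame and prints the per-row position, group size, and previous Task side by side. The column names pos, group_size and prev_task are hypothetical and used only for inspection.

```python
import pandas as pd

df_task = pd.DataFrame({
    'Days': [5, 5, 5, 20, 20, 20, 10, 10],
    'Task': ['Programing', 'Presentation', 'Training', 'Development',
             'Presentation', 'Workshop', 'Coding', 'Communication'],
})

groups = df_task.groupby('Days')
enums = groups.cumcount()                         # 0-based position of each row in its group
sizes = groups['Task'].transform('size')          # number of rows in the row's group
prev_task = groups['Task'].shift(fill_value='')   # Task of the previous row in the same group

# Hypothetical helper columns, added to a copy only for inspection
check = df_task.assign(pos=enums, group_size=sizes, prev_task=prev_task)
print(check)

# Rows that get modified: second row (pos == 1) of groups with more than two rows
mask = enums.eq(1) & sizes.gt(2)
print(df_task.index[mask].tolist())
```

Note that sizes.gt(2) also matches groups with four or more rows. If only groups of exactly three rows should be changed, sizes.eq(3) would be the stricter condition.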