How to change only second row in multiple groups of a dataframe

Time:09-16

For each group in the data frame df_task that contains three rows, I would like to modify the second row of the column Task.

import pandas as pd

df_task = pd.DataFrame({'Days': [5, 5, 5, 20, 20, 20, 10, 10],
                        'Task': ['Programing', 'Presentation', 'Training', 'Development', 'Presentation', 'Workshop', 'Coding', 'Communication']})
df_task.groupby(["Days"])

This is the expected output. If a group contains three rows, the Task value from its first row is appended to the Task value of its second row, as shown in the new column New_Task. If a group has only two rows, nothing is modified:

   Days           Task                  New_Task
0     5     Programing                Programing
1     5   Presentation   Presentation,Programing
2     5       Training                  Training
3    20    Development               Development
4    20   Presentation  Presentation,Development
5    20       Workshop                  Workshop
6    10         Coding                    Coding
7    10  Communication             Communication

Answer:

Your requirements are pretty straightforward. Try:

import numpy as np

groups = df_task.groupby('Days')

# position of each row within its group (0, 1, 2, ...)
enums = groups.cumcount()
# size of each group, broadcast back to every row
sizes = groups['Task'].transform('size')

# only update the second row (position 1) of groups with more than two rows;
# shift() brings the previous row's Task, which for position 1 is the first row
df_task['New_Task'] = np.where(enums.eq(1) & sizes.gt(2),
                               df_task['Task'] + ',' + groups['Task'].shift(fill_value=''),
                               df_task['Task'])
print(df_task)

Output:

   Days           Task                  New_Task
0     5     Programing                Programing
1     5   Presentation   Presentation,Programing
2     5       Training                  Training
3    20    Development               Development
4    20   Presentation  Presentation,Development
5    20       Workshop                  Workshop
6    10         Coding                    Coding
7    10  Communication             Communication
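As an alternative sketch (not part of the original answer), the same result can be built with `.loc` and `transform('first')` instead of `np.where` and `shift`. This version only builds the concatenated string for the rows that actually change, and it reads each group's first Task value directly instead of relying on the previous row being the first one. The names `position`, `size`, `first_task` and `mask` are illustrative choices, not anything required by pandas.

```python
import pandas as pd

df_task = pd.DataFrame({
    'Days': [5, 5, 5, 20, 20, 20, 10, 10],
    'Task': ['Programing', 'Presentation', 'Training',
             'Development', 'Presentation', 'Workshop',
             'Coding', 'Communication'],
})

groups = df_task.groupby('Days')

# position of each row inside its group (0, 1, 2, ...)
position = groups.cumcount()
# number of rows in each row's group, broadcast back to every row
size = groups['Task'].transform('size')
# first Task value of each row's group, broadcast back to every row
first_task = groups['Task'].transform('first')

# select only the second row of groups that have at least three rows
mask = position.eq(1) & size.ge(3)

# start from an unchanged copy of Task, then overwrite just the masked rows
df_task['New_Task'] = df_task['Task'].copy()
df_task.loc[mask, 'New_Task'] = df_task.loc[mask, 'Task'] + ',' + first_task[mask]

print(df_task)
```

Using `size.ge(3)` is equivalent to the answer's `sizes.gt(2)`; change it to `size.eq(3)` if groups with more than three rows should be left alone.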