Home > OS >  Group by list multiple columns with conditions
Group by list multiple columns with conditions

Time:03-21

Having

car_id speed section
0 1 10 a
1 1 10 b
2 1 30 c
3 1 10 d
4 2 10 e
5 2 0 f
6 2 10 g
7 3 0 h

I want a concatenated string in two columns, one containing the sections if speed <=10 (legal_speed) and the others (illegal_speed). The values with speed 0 should be an empty string:

car_id legal_speed illegal_speed
0 1 a, b, d c
1 2 e, g
2 3

I have been trying

df.groupby('car_id').agg(list)

but not quiet there yet

CodePudding user response:

Categorize the column speed into legal and illegal labels, then group the dataframe by car_id and labels and aggregate section using join:

s = pd.cut(df['speed'], [0, 10, np.inf], labels=['legal', 'illegal'])
df.groupby(['car_id', s])['section'].agg(', '.join).unstack()

Result

speed     legal illegal
car_id                 
1       a, b, d       c
2          e, g     NaN
3           NaN     NaN

CodePudding user response:

Here's a solution that utilizes np.select and pivot:

df['x'] = np.select([df['speed'] > 10, df['speed'] == 0], ['illegal_speed', 'na'], default='legal_speed')
new_df = df.pivot(values='section', columns='x').groupby(df['car_id']).apply(lambda g: g.apply(lambda col: col.dropna().tolist()).str.join(', ')).reset_index().rename_axis(None, axis=1)

Output:

>>> new_df
   car_id illegal_speed legal_speed na
0       1             c     a, b, d   
1       2                      e, g  f
2       3                            h

Here's another variant, which is more elegant although a bit longer:

df['x'] = np.select([df['speed'] > 10, df['speed'] == 0], ['illegal_speed', 'na'], default='legal_speed')
new_df = df.groupby('car_id').apply(lambda g: g.groupby('x')['section'].agg(list).str.join(', ')).unstack().fillna('').reset_index().rename_axis(None, axis=1)
  • Related