Group by list multiple columns with conditions-CodePudding

Having

	car_id	speed	section
0	1	10	a
1	1	10	b
2	1	30	c
3	1	10	d
4	2	10	e
5	2	0	f
6	2	10	g
7	3	0	h

I want a concatenated string in two columns, one containing the sections if speed <=10 (legal_speed) and the others (illegal_speed). The values with speed 0 should be an empty string:

	car_id	legal_speed	illegal_speed
0	1	a, b, d	c
1	2	e, g
2	3

I have been trying

df.groupby('car_id').agg(list)

but not quiet there yet

CodePudding user response：

Categorize the column speed into legal and illegal labels, then group the dataframe by car_id and labels and aggregate section using join:

s = pd.cut(df['speed'], [0, 10, np.inf], labels=['legal', 'illegal'])
df.groupby(['car_id', s])['section'].agg(', '.join).unstack()

Result

speed     legal illegal
car_id                 
1       a, b, d       c
2          e, g     NaN
3           NaN     NaN

CodePudding user response：

Here's a solution that utilizes np.select and pivot:

df['x'] = np.select([df['speed'] > 10, df['speed'] == 0], ['illegal_speed', 'na'], default='legal_speed')
new_df = df.pivot(values='section', columns='x').groupby(df['car_id']).apply(lambda g: g.apply(lambda col: col.dropna().tolist()).str.join(', ')).reset_index().rename_axis(None, axis=1)

Output:

>>> new_df
   car_id illegal_speed legal_speed na
0       1             c     a, b, d   
1       2                      e, g  f
2       3                            h

Here's another variant, which is more elegant although a bit longer:

df['x'] = np.select([df['speed'] > 10, df['speed'] == 0], ['illegal_speed', 'na'], default='legal_speed')
new_df = df.groupby('car_id').apply(lambda g: g.groupby('x')['section'].agg(list).str.join(', ')).unstack().fillna('').reset_index().rename_axis(None, axis=1)