Having
car_id | speed | section | |
---|---|---|---|
0 | 1 | 10 | a |
1 | 1 | 10 | b |
2 | 1 | 30 | c |
3 | 1 | 10 | d |
4 | 2 | 10 | e |
5 | 2 | 0 | f |
6 | 2 | 10 | g |
7 | 3 | 0 | h |
I want a concatenated string in two columns, one containing the sections if speed <=10 (legal_speed) and the others (illegal_speed). The values with speed 0 should be an empty string:
car_id | legal_speed | illegal_speed | |
---|---|---|---|
0 | 1 | a, b, d | c |
1 | 2 | e, g | |
2 | 3 |
I have been trying
df.groupby('car_id').agg(list)
but not quiet there yet
CodePudding user response:
Categorize the column speed
into legal
and illegal
labels, then group the dataframe by car_id
and labels and aggregate section
using join
:
s = pd.cut(df['speed'], [0, 10, np.inf], labels=['legal', 'illegal'])
df.groupby(['car_id', s])['section'].agg(', '.join).unstack()
Result
speed legal illegal
car_id
1 a, b, d c
2 e, g NaN
3 NaN NaN
CodePudding user response:
Here's a solution that utilizes np.select
and pivot
:
df['x'] = np.select([df['speed'] > 10, df['speed'] == 0], ['illegal_speed', 'na'], default='legal_speed')
new_df = df.pivot(values='section', columns='x').groupby(df['car_id']).apply(lambda g: g.apply(lambda col: col.dropna().tolist()).str.join(', ')).reset_index().rename_axis(None, axis=1)
Output:
>>> new_df
car_id illegal_speed legal_speed na
0 1 c a, b, d
1 2 e, g f
2 3 h
Here's another variant, which is more elegant although a bit longer:
df['x'] = np.select([df['speed'] > 10, df['speed'] == 0], ['illegal_speed', 'na'], default='legal_speed')
new_df = df.groupby('car_id').apply(lambda g: g.groupby('x')['section'].agg(list).str.join(', ')).unstack().fillna('').reset_index().rename_axis(None, axis=1)