I have a large dataframe such as below:
vehicle id delta
0 0 0
1 0 20
2 0 40
3 0 400
4 0 10
5 1 0
6 1 10
7 1 500
8 1 10
9 1 10
10 1 100
11 1 10
I want to add a new column, 'Trip', that numbers the trips of each vehicle: every vehicle starts at trip_1, and whenever delta is more than 50 the trip number increases by one. The result would be as follows:
vehicle id delta Trip
0 0 0 trip_1
1 0 20 trip_1
2 0 40 trip_1
3 0 400 trip_2
4 0 10 trip_2
5 1 0 trip_1
6 1 10 trip_1
7 1 500 trip_2
8 1 10 trip_2
9 1 10 trip_2
10 1 100 trip_3
11 1 10 trip_3
I'm thinking about using iterrows(), but I'd like to avoid it since the dataframe is huge. Any suggestions?
Answer:
Try something like this:
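# Flag rows where delta rose by more than 50 versus the previous row of the same id;
# cumsum() runs over the whole frame, so factorize() renumbers it 1, 2, ... within each id
df['trip'] = 'Trip_' + df.assign(tripid=(df.groupby('id')['delta'].diff() > 50).cumsum() + 1)\
    .groupby('id')['tripid'].transform(lambda x: x.factorize()[0] + 1).astype(str)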
Output:
vehicle id delta trip
0 0 0 0 Trip_1
1 1 0 20 Trip_1
2 2 0 40 Trip_1
3 3 0 400 Trip_2
4 4 0 10 Trip_2
5 5 1 0 Trip_1
6 6 1 10 Trip_1
7 7 1 500 Trip_2
8 8 1 10 Trip_2
9 9 1 10 Trip_2
10 10 1 100 Trip_3
11 11 1 10 Trip_3
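Note that this answer tests diff() > 50, which compares each delta with the previous delta of the same id rather than comparing delta itself with 50. The two rules give the same trips on the sample data but can diverge, for example when two large deltas follow each other. A minimal sketch on a hypothetical three-row frame, computing both variants with a per-id cumulative sum:

```python
import pandas as pd

# Hypothetical data: two consecutive large deltas for the same vehicle
df = pd.DataFrame({"id": [0, 0, 0], "delta": [0, 400, 420]})

# Rule used by the diff-based answers: delta jumped by more than 50 vs the previous row
by_diff = (df.groupby("id")["delta"].diff() > 50).astype(int).groupby(df["id"]).cumsum() + 1

# Rule as stated in the question: delta itself is more than 50
by_value = (df["delta"] > 50).astype(int).groupby(df["id"]).cumsum() + 1

print(by_diff.tolist())   # [1, 2, 2]
print(by_value.tolist())  # [1, 2, 3]
```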
Answer:
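You can use np.select, which is vectorized and therefore much faster than looping over the rows. Note that it labels each row only by the range its own delta falls in, so it does not carry the trip number forward within a vehicle (for example, row 4 with delta 10 gets trip_1 rather than trip_2).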
Your example:
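import numpy as np

delta = df["delta"]
# Conditions are checked in order; each row gets the label of the first true one
condlist = [delta <= 50, (delta > 50) & (delta < 100), delta >= 100]
choicelist = ["trip_1", "trip_2", "trip_3"]
# default="" keeps every value a string (np.select's own default is 0)
df["Trip"] = np.select(condlist, choicelist, default="")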
Output
print(df)
vehicle id delta Trip
0 0 0 trip_1
1 0 20 trip_1
2 0 40 trip_1
3 0 400 trip_3
4 0 10 trip_1
5 1 0 trip_1
6 1 10 trip_1
7 1 500 trip_3
8 1 10 trip_1
9 1 10 trip_1
10 1 100 trip_3
11 1 10 trip_1
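np.select checks the conditions element-wise and, for each element, returns the choice paired with the first condition that is true, or default when none is. A small NumPy-only sketch on the sample deltas (default="unmatched" is an illustrative value, not part of the answer) shows that each label depends only on that row's own delta:

```python
import numpy as np

# Deltas from the question's sample data
delta = np.array([0, 20, 40, 400, 10, 0, 10, 500, 10, 10, 100, 10])

condlist = [delta <= 50, (delta > 50) & (delta < 100), delta >= 100]
choicelist = ["trip_1", "trip_2", "trip_3"]

# First true condition wins for each element; "unmatched" is used if none is true
labels = np.select(condlist, choicelist, default="unmatched")
print(labels.tolist())
```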
Answer:
Try this:
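import numpy as np
# Within each id, count the rows so far with delta > 50; numbering starts at 1
df['Trip'] = 'trip_' + df.groupby('id')['delta'].transform(
    lambda grp: np.where(grp > 50, 1, 0).cumsum() + 1).apply(str)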
print(df)
vehicle id delta Trip
0 0 0 0 trip_1
1 1 0 20 trip_1
2 2 0 40 trip_1
3 3 0 400 trip_2
4 4 0 10 trip_2
5 5 1 0 trip_1
6 6 1 10 trip_1
7 7 1 500 trip_2
8 8 1 10 trip_2
9 9 1 10 trip_2
10 10 1 100 trip_3
11 11 1 10 trip_3
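The delta > 50 rule can also be written without a Python lambda by grouping the boolean flag and using the built-in grouped cumulative sum. A minimal sketch, assuming the vehicle column is named vehicle_id; the duplicated() guard is an addition not found in the answers, so that each vehicle's first row stays trip_1 even if its own delta exceeds 50:

```python
import pandas as pd

# Sample data from the question (column name vehicle_id is an assumption)
df = pd.DataFrame({
    "vehicle_id": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "delta": [0, 20, 40, 400, 10, 0, 10, 500, 10, 10, 100, 10],
})

# A new trip starts where delta > 50, except on each vehicle's first row
new_trip = (df["delta"] > 50) & df["vehicle_id"].duplicated()

# Per-vehicle running count of trip starts, shifted so numbering begins at 1
trip_no = new_trip.astype(int).groupby(df["vehicle_id"]).cumsum() + 1
df["Trip"] = "trip_" + trip_no.astype(str)
print(df["Trip"].tolist())
```

Without the guard, a vehicle whose first delta exceeds 50 would start at trip_2.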
Answer:
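Here is one way to do it:
import numpy as np

# 1 where delta rose by more than 50 versus the vehicle's previous row, else 0;
# the per-vehicle running total + 1 is the trip number
df['trip'] = df.assign(trip=np.where(df.groupby(['vehicle_id'])['delta'].diff() > 50, 1, 0))\
    .groupby(['vehicle_id'])['trip'].cumsum() + 1
df['trip'] = 'Trip_' + df['trip'].astype('str')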
df
vehicle_id delta trip
0 0 0 Trip_1
1 0 20 Trip_1
2 0 40 Trip_1
3 0 400 Trip_2
4 0 10 Trip_2
5 1 0 Trip_1
6 1 10 Trip_1
7 1 500 Trip_2
8 1 10 Trip_2
9 1 10 Trip_2
10 1 100 Trip_3
11 1 10 Trip_3
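To gain confidence in a vectorized version before running it on the full frame, its output can be compared with a plain row loop on test data. A sketch, assuming the diff-based rule used in this answer; trips_by_loop is a hypothetical helper and the random data is generated only for the check:

```python
import numpy as np
import pandas as pd

def trips_by_loop(df):
    """Reference loop for the diff-based rule (hypothetical helper)."""
    labels, last_delta, trip = [], {}, {}
    for vid, d in zip(df["vehicle_id"], df["delta"]):
        if vid not in trip:
            trip[vid] = 1                  # first row of a vehicle: trip 1
        elif d - last_delta[vid] > 50:
            trip[vid] += 1                 # delta jumped by more than 50
        last_delta[vid] = d
        labels.append(f"Trip_{trip[vid]}")
    return labels

# Hypothetical random test data: 3 vehicles with interleaved rows
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "vehicle_id": rng.integers(0, 3, size=1000),
    "delta": rng.integers(0, 200, size=1000),
})

# The vectorized code from this answer
df["trip"] = df.assign(trip=np.where(df.groupby(["vehicle_id"])["delta"].diff() > 50, 1, 0)) \
    .groupby(["vehicle_id"])["trip"].cumsum() + 1
df["trip"] = "Trip_" + df["trip"].astype("str")

print(df["trip"].tolist() == trips_by_loop(df))  # True
```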