I have a list
sample_dates = ["10/07/2021","11/07/2021","12/07/2021","13/07/2021",
"14/07/2021","15/07/2021","16/07/2021","17/07/2021",
"18/07/2021","19/07/2021","20/07/2021","21/07/2021",
"22/07/2021"]
and i have a dataframe like below
Truckid Tripid
1 1
1 1
1 1
1 2
1 2
1 3
1 3
1 3
1 4
1 4
1 4
1 5
1 5
1 5
2 1
2 1
2 2
2 2
2 2
2 3
2 3
I want to add the Date column in a way like whenever trip_id changes the number, the date should move to next element
I want result to be like below
Truckid Tripid Date
1 1 10/07/2021
1 1 10/07/2021
1 1 10/07/2021
1 2 11/07/2021
1 2 11/07/2021
1 3 12/07/2021
1 3 12/07/2021
1 3 12/07/2021
1 4 13/07/2021
1 4 13/07/2021
1 4 13/07/2021
1 5 14/07/2021
1 5 14/07/2021
1 5 14/07/2021
2 1 15/07/2021
2 1 15/07/2021
2 2 16/07/2021
2 2 16/07/2021
2 2 16/07/2021
2 3 17/07/2021
2 3 17/07/2021
CodePudding user response:
You can compute the group number (from 0 to n) using GroupBy.ngroup
, and map
the value to your list indices (using a temporary dictionary):
df['Date'] = (df
.groupby(['Truckid', 'Tripid']).ngroup() # get group ID
.map(dict(enumerate(sample_dates))) # match to items in order
)
output:
Truckid Tripid Date
0 1 1 10/07/2021
1 1 1 10/07/2021
2 1 1 10/07/2021
3 1 2 11/07/2021
4 1 2 11/07/2021
5 1 3 12/07/2021
6 1 3 12/07/2021
7 1 3 12/07/2021
8 1 4 13/07/2021
9 1 4 13/07/2021
10 1 4 13/07/2021
11 1 5 14/07/2021
12 1 5 14/07/2021
13 1 5 14/07/2021
14 2 1 15/07/2021
15 2 1 15/07/2021
16 2 2 16/07/2021
17 2 2 16/07/2021
18 2 2 16/07/2021
19 2 3 17/07/2021
20 2 3 17/07/2021
CodePudding user response:
For each consecutive Tripid
create groups by compared with shifted values and then use Series.map
:
df['new'] = (df['Tripid'].ne(df['Tripid'].shift()).cumsum()
.map(dict(enumerate(sample_dates, 1))))
print (df)
ruckid Tripid new
0 1 1 10/07/2021
1 1 1 10/07/2021
2 1 1 10/07/2021
3 1 2 11/07/2021
4 1 2 11/07/2021
5 1 3 12/07/2021
6 1 3 12/07/2021
7 1 3 12/07/2021
8 1 4 13/07/2021
9 1 4 13/07/2021
10 1 4 13/07/2021
11 1 5 14/07/2021
12 1 5 14/07/2021
13 1 5 14/07/2021
14 2 1 15/07/2021
15 2 1 15/07/2021
16 2 2 16/07/2021
17 2 2 16/07/2021
18 2 2 16/07/2021
19 2 3 17/07/2021
20 2 3 17/07/2021
Or if possible specify only first date use to_datetime
:
df['new'] = pd.to_datetime(df['Tripid'].ne(df['Tripid'].shift()).cumsum().sub(1),
unit='d',
origin='2021-07-10')
print (df)
ruckid Tripid new
0 1 1 2021-07-10
1 1 1 2021-07-10
2 1 1 2021-07-10
3 1 2 2021-07-11
4 1 2 2021-07-11
5 1 3 2021-07-12
6 1 3 2021-07-12
7 1 3 2021-07-12
8 1 4 2021-07-13
9 1 4 2021-07-13
10 1 4 2021-07-13
11 1 5 2021-07-14
12 1 5 2021-07-14
13 1 5 2021-07-14
14 2 1 2021-07-15
15 2 1 2021-07-15
16 2 2 2021-07-16
17 2 2 2021-07-16
18 2 2 2021-07-16
19 2 3 2021-07-17
20 2 3 2021-07-17
Details:
print (df['Tripid'].ne(df['Tripid'].shift()).cumsum())
0 1
1 1
2 1
3 2
4 2
5 3
6 3
7 3
8 4
9 4
10 4
11 5
12 5
13 5
14 6
15 6
16 7
17 7
18 7
19 8
20 8
Name: Tripid, dtype: int32