Home > Software engineering >  add column in pandas dataframe by condition
add column in pandas dataframe by condition

Time:05-03

I have a list

sample_dates = ["10/07/2021","11/07/2021","12/07/2021","13/07/2021",
                "14/07/2021","15/07/2021","16/07/2021","17/07/2021",
                "18/07/2021","19/07/2021","20/07/2021","21/07/2021",
                "22/07/2021"]

and i have a dataframe like below

Truckid   Tripid
  1          1
  1          1
  1          1
  1          2
  1          2
  1          3
  1          3
  1          3
  1          4
  1          4
  1          4
  1          5
  1          5
  1          5
  2          1
  2          1
  2          2
  2          2
  2          2
  2          3
  2          3

I want to add the Date column in a way like whenever trip_id changes the number, the date should move to next element

I want result to be like below

Truckid   Tripid   Date  
  1          1     10/07/2021
  1          1     10/07/2021 
  1          1     10/07/2021
  1          2     11/07/2021
  1          2     11/07/2021
  1          3     12/07/2021
  1          3     12/07/2021
  1          3     12/07/2021
  1          4     13/07/2021
  1          4     13/07/2021
  1          4     13/07/2021
  1          5     14/07/2021
  1          5     14/07/2021 
  1          5     14/07/2021 
  2          1     15/07/2021 
  2          1     15/07/2021
  2          2     16/07/2021
  2          2     16/07/2021
  2          2     16/07/2021
  2          3     17/07/2021
  2          3     17/07/2021

CodePudding user response:

You can compute the group number (from 0 to n) using GroupBy.ngroup, and map the value to your list indices (using a temporary dictionary):

df['Date'] = (df
              .groupby(['Truckid', 'Tripid']).ngroup() # get group ID
              .map(dict(enumerate(sample_dates)))      # match to items in order
             )

output:

    Truckid  Tripid        Date
0         1       1  10/07/2021
1         1       1  10/07/2021
2         1       1  10/07/2021
3         1       2  11/07/2021
4         1       2  11/07/2021
5         1       3  12/07/2021
6         1       3  12/07/2021
7         1       3  12/07/2021
8         1       4  13/07/2021
9         1       4  13/07/2021
10        1       4  13/07/2021
11        1       5  14/07/2021
12        1       5  14/07/2021
13        1       5  14/07/2021
14        2       1  15/07/2021
15        2       1  15/07/2021
16        2       2  16/07/2021
17        2       2  16/07/2021
18        2       2  16/07/2021
19        2       3  17/07/2021
20        2       3  17/07/2021

CodePudding user response:

For each consecutive Tripid create groups by compared with shifted values and then use Series.map:

df['new'] = (df['Tripid'].ne(df['Tripid'].shift()).cumsum()
                           .map(dict(enumerate(sample_dates, 1))))
print (df)
    ruckid  Tripid         new
0        1       1  10/07/2021
1        1       1  10/07/2021
2        1       1  10/07/2021
3        1       2  11/07/2021
4        1       2  11/07/2021
5        1       3  12/07/2021
6        1       3  12/07/2021
7        1       3  12/07/2021
8        1       4  13/07/2021
9        1       4  13/07/2021
10       1       4  13/07/2021
11       1       5  14/07/2021
12       1       5  14/07/2021
13       1       5  14/07/2021
14       2       1  15/07/2021
15       2       1  15/07/2021
16       2       2  16/07/2021
17       2       2  16/07/2021
18       2       2  16/07/2021
19       2       3  17/07/2021
20       2       3  17/07/2021

Or if possible specify only first date use to_datetime:

df['new'] = pd.to_datetime(df['Tripid'].ne(df['Tripid'].shift()).cumsum().sub(1),
                           unit='d', 
                           origin='2021-07-10')
print (df)
    ruckid  Tripid        new
0        1       1 2021-07-10
1        1       1 2021-07-10
2        1       1 2021-07-10
3        1       2 2021-07-11
4        1       2 2021-07-11
5        1       3 2021-07-12
6        1       3 2021-07-12
7        1       3 2021-07-12
8        1       4 2021-07-13
9        1       4 2021-07-13
10       1       4 2021-07-13
11       1       5 2021-07-14
12       1       5 2021-07-14
13       1       5 2021-07-14
14       2       1 2021-07-15
15       2       1 2021-07-15
16       2       2 2021-07-16
17       2       2 2021-07-16
18       2       2 2021-07-16
19       2       3 2021-07-17
20       2       3 2021-07-17

Details:

print (df['Tripid'].ne(df['Tripid'].shift()).cumsum())
0     1
1     1
2     1
3     2
4     2
5     3
6     3
7     3
8     4
9     4
10    4
11    5
12    5
13    5
14    6
15    6
16    7
17    7
18    7
19    8
20    8
Name: Tripid, dtype: int32
  • Related