Time difference with the rows based on certain condition in a column in dataframe

Time:11-17

The "Age" column in the following dataframe is corrupted: for a fixed User_ID, the age is the same on every "Date". For each row, I would like to subtract from the original age the difference in years between that user's most recent date and the row's date.

import pandas as pd

df = pd.DataFrame({
    "User_ID": [ "N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2" , "N1", "N1", "N1", "N2"],
    "Date": [ "31/10/2021", "31/10/2020" , "31/10/2019", "24/10/2019", "22/10/2018", "15/10/2017", "14/10/2017", "13/10/2016", "12/10/2016", "11/10/2015", "2/10/2015", "1/10/2015" ],
    "Age": [6,5,8,6,6,5,8,5,6,6,6,5]
})

Hence for the dataframe

          User_ID  Date        Age
    0     N1       2021-10-31  6
    1     N2       2020-10-31  5
    2     N3       2019-10-31  8
    3     N1       2019-10-24  6
    4     N1       2018-10-22  6
    5     N2       2017-10-15  5
    6     N3       2017-10-14  8
    7     N2       2016-10-13  5
    8     N1       2016-10-12  6
    9     N1       2015-10-11  6
    10    N1       2015-10-02  6
    11    N2       2015-10-01  5

the outcome should look like

          User_ID  Date        Age
    0     N1       2021-10-31  6
    1     N2       2020-10-31  5
    2     N3       2019-10-31  8
    3     N1       2019-10-24  4
    4     N1       2018-10-22  3
    5     N2       2017-10-15  2
    6     N3       2017-10-14  6
    7     N2       2016-10-13  1
    8     N1       2016-10-12  1
    9     N1       2015-10-11  0
    10    N1       2015-10-02  0
    11    N2       2015-10-01  0

Is there any fast way to do that?

CodePudding user response:

You can create a Series of years, use GroupBy.transform with 'first' (the GroupBy.first aggregation) to get the first year of each User_ID group, subtract each row's own year y from it, and then subtract that difference from the Age column:

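# Parse the day-first strings so that the .dt accessor is available
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
y = df['Date'].dt.year
# Each user's first row holds the latest date, so 'first' gives the reference year
df['Age'] = df['Age'].sub(y.groupby(df['User_ID']).transform('first').sub(y))
print(df)
   User_ID       Date  Age
0       N1 2021-10-31    6
1       N2 2020-10-31    5
2       N3 2019-10-31    8
3       N1 2019-10-24    4
4       N1 2018-10-22    3
5       N2 2017-10-15    2
6       N3 2017-10-14    6
7       N2 2016-10-13    1
8       N1 2016-10-12    1
9       N1 2015-10-11    0
10      N1 2015-10-02    0
11      N2 2015-10-01    0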
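The 'first' transform above gives the right reference year only because each user's most recent date happens to come first in the frame. If the row order is not guaranteed, one possible variant is to take the latest year per user with 'max' instead. The following is a minimal sketch under that assumption; the function name correct_ages is hypothetical and not part of the original answer, and it compares calendar years only, exactly as the answer does.

```python
import pandas as pd


def correct_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Age reduced by the calendar-year gap
    between each row's Date and that user's most recent Date."""
    out = df.copy()
    # The sample dates are day-first strings such as "12/10/2016".
    out["Date"] = pd.to_datetime(out["Date"], format="%d/%m/%Y")
    year = out["Date"].dt.year
    # Latest year per user, broadcast back onto every row of that user.
    latest_year = year.groupby(out["User_ID"]).transform("max")
    out["Age"] = out["Age"] - (latest_year - year)
    return out


df = pd.DataFrame({
    "User_ID": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2", "N1", "N1", "N1", "N2"],
    "Date": ["31/10/2021", "31/10/2020", "31/10/2019", "24/10/2019", "22/10/2018",
             "15/10/2017", "14/10/2017", "13/10/2016", "12/10/2016", "11/10/2015",
             "2/10/2015", "1/10/2015"],
    "Age": [6, 5, 8, 6, 6, 5, 8, 5, 6, 6, 6, 5],
})

fixed = correct_ages(df)
# Shuffling the rows first does not change the per-row result.
fixed_shuffled = correct_ages(df.sample(frac=1, random_state=0)).sort_index()
print(fixed)
```

Because 'max' does not depend on position, the shuffled frame yields the same ages once it is sorted back by index.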