I have two columns: the first holds a single date and the second holds a list of dates, which may also be empty. I want to calculate the age difference (in years) between the date in the first column and each date in the second column.
column1 column2 result
11-01-2014 [1975-12-16, 1980-07-24] [39,34]
20-11-2014 [1985-08-05, 1983-03-16] [29,31]
26-12-2016 [1966-05-22, 1958-04-13] [50,58]
20-05-2016 [1981-04-21, 1983-12-25] [35,33]
01-01-2016 [1993-10-29, 1966-06-27] [23,50]
I have column1 and column2 as input, and I expect the output to look like the result column.
CodePudding user response:
Use DataFrame.explode to turn the lists in column2 into one row per date, subtract the years with Series.dt.year, and finally aggregate the rows back into lists:
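import pandas as pd

# Parse the day-first date strings in column1
df['column1'] = pd.to_datetime(df['column1'], dayfirst=True)
# One row per date in column2
df1 = df.explode('column2')
df1['column2'] = pd.to_datetime(df1['column2'])
# Difference in calendar years
df1['result'] = df1['column1'].dt.year.sub(df1['column2'].dt.year)
# Collect the rows back into lists per original index
df = df1.groupby([df1.index, 'column1']).agg(list).reset_index(level=1)
print(df)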
column1 column2 result
0 2014-01-11 [1975-12-16 00:00:00, 1980-07-24 00:00:00] [39, 34]
1 2014-11-20 [1985-08-05 00:00:00, 1983-03-16 00:00:00] [29, 31]
2 2016-12-26 [1966-05-22 00:00:00, 1958-04-13 00:00:00] [50, 58]
3 2016-05-20 [1981-04-21 00:00:00, 1983-12-25 00:00:00] [35, 33]
4 2016-01-01 [1993-10-29 00:00:00, 1966-06-27 00:00:00] [23, 50]
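If column2 should keep its original lists (rather than lists of Timestamps) and may contain empty lists, the same explode idea can be sketched as below. This is only a sketch under assumptions: the sample frame reuses a few rows from the question plus one made-up row with an empty list, and the names df1, diff and result are illustrative.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical row with an empty list
df = pd.DataFrame({
    'column1': ['11-01-2014', '20-11-2014', '01-01-2016'],
    'column2': [['1975-12-16', '1980-07-24'],
                ['1985-08-05', '1983-03-16'],
                []],
})
df['column1'] = pd.to_datetime(df['column1'], format='%d-%m-%Y')

# An empty list explodes to a single NaN row, so drop those rows first
df1 = df.explode('column2').dropna(subset=['column2'])

# Calendar-year difference, one value per exploded row
diff = df1['column1'].dt.year - pd.to_datetime(df1['column2']).dt.year

# Collect the values back into one list per original row (index level 0)
result = diff.groupby(level=0).agg(list)

# Assignment aligns on the index; rows that had an empty list become NaN
df['result'] = result
df['result'] = df['result'].apply(lambda v: v if isinstance(v, list) else [])
print(df)
```

Only the result column is aggregated here, so column2 is left exactly as it was in the input.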
Alternatively, use a lambda function that converts each list to datetimes:
df['column1'] = pd.to_datetime(df['column1'], dayfirst=True)
# For each row, subtract the year of every date in column2 from the year of column1
f = lambda x: [x['column1'].year - y.year for y in pd.to_datetime(x['column2'])]
df['result'] = df.apply(f, axis=1)
print(df)
column1 column2 result
0 2014-01-11 [1975-12-16, 1980-07-24] [39, 34]
1 2014-11-20 [1985-08-05, 1983-03-16] [29, 31]
2 2016-12-26 [1966-05-22, 1958-04-13] [50, 58]
3 2016-05-20 [1981-04-21, 1983-12-25] [35, 33]
4 2016-01-01 [1993-10-29, 1966-06-27] [23, 50]
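The row-wise idea also copes with empty lists, because an empty list simply yields an empty result. Here is a minimal self-contained sketch, assuming the dates are stored as strings as in the question. The third row with an empty list is a made-up example, and a plain list comprehension over the two columns stands in for apply:

```python
import pandas as pd

# Two rows from the question, plus a hypothetical row with an empty list
df = pd.DataFrame({
    'column1': ['11-01-2014', '20-11-2014', '01-01-2016'],
    'column2': [['1975-12-16', '1980-07-24'],
                ['1985-08-05', '1983-03-16'],
                []],
})
df['column1'] = pd.to_datetime(df['column1'], format='%d-%m-%Y')

# For each row, subtract the year of every date in column2 from the year
# of column1; iterating over an empty list produces an empty result list
df['result'] = pd.Series(
    [[d.year - y.year for y in pd.to_datetime(ys)]
     for d, ys in zip(df['column1'], df['column2'])],
    index=df.index,
)
print(df)
```

Note that both approaches subtract calendar years, which matches the expected output in the question; they do not check whether the birthday has already passed in the year of column1.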