How do I get a time delta that is closest to 0 days?

Time:10-20

I have the following dataframe:

import pandas as pd

gp_columns = {
    'name': ['companyA', 'companyB'],
    'firm_ID' : [1, 2],
    'timestamp_one' : ['2016-04-01', '2017-09-01']
    }

fund_columns = {
    'firm_ID': [1, 1, 2, 2, 2],
    'department_ID' : [10, 11, 20, 21, 22],
    'timestamp_mult' : ['2015-01-01', '2016-03-01', '2016-10-01', '2017-02-01', '2018-11-01'],
    'number' : [400, 500, 1000, 3000, 4000]
    }

gp_df = pd.DataFrame(gp_columns)
fund_df = pd.DataFrame(fund_columns)
gp_df['timestamp_one'] = pd.to_datetime(gp_df['timestamp_one'])
fund_df['timestamp_mult'] = pd.to_datetime(fund_df['timestamp_mult'])

merged_df = gp_df.merge(fund_df)
merged_df

merged_df_v1 = merged_df.copy()
merged_df_v1['incidence_num'] = merged_df.groupby('firm_ID')['department_ID']\
                              .transform('cumcount')
merged_df_v1['incidence_num'] = merged_df_v1['incidence_num'] + 1
merged_df_v1['time_delta'] = merged_df_v1['timestamp_mult'] - merged_df_v1['timestamp_one']

merged_wide = pd.pivot(merged_df_v1, index = ['name','firm_ID', 'timestamp_one'], \
                       columns = 'incidence_num', \
                       values = ['department_ID', 'time_delta', 'timestamp_mult', 'number'])
merged_wide.reset_index()

This produces a wide dataframe (shown as an image in the original post).

My question is how I can get a column holding, for each row, the time delta that is closest to 0. Note that the time delta can be negative or positive and the sign has to be kept, so simply taking .abs() of the values does not work for me here.

I want a dataframe with a particular output (shown as an image in the original post).

CodePudding user response:

You can stack (which removes NaTs) and groupby.first after sorting the rows by absolute value (with the key parameter of sort_values):

df = merged_wide.reset_index()

df['time_delta_min'] = (df['time_delta'].stack()
                        .sort_values(key=abs)
                        .groupby(level=0).first()
                       )

output:

                   name firm_ID timestamp_one department_ID           \
incidence_num                                             1   2    3   
0              companyA       1    2016-04-01            10  11  NaN   
1              companyB       2    2017-09-01            20  21   22   

              time_delta                    timestamp_mult             \
incidence_num          1         2        3              1          2   
0              -456 days  -31 days      NaT     2015-01-01 2016-03-01   
1              -335 days -212 days 426 days     2016-10-01 2017-02-01   

                         number             time_delta_min  
incidence_num          3      1     2     3                 
0                    NaT    400   500   NaN       -31 days  
1             2018-11-01   1000  3000  4000      -212 days  
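
If only the closest delta per firm is needed, the same sort-by-absolute-value idea can probably be applied before pivoting. The following is a minimal, self-contained sketch and not part of the original answer: long_df is a hypothetical stand-in for merged_df_v1, with the time deltas from the output above typed in by hand.

```python
import pandas as pd

# Hypothetical long-format data; the deltas match the output shown above
long_df = pd.DataFrame({
    "firm_ID": [1, 1, 2, 2, 2],
    "time_delta": pd.to_timedelta([-456, -31, -335, -212, 426], unit="D"),
})

# Sort rows by absolute delta, then keep the first (signed) value per firm
closest = (
    long_df.sort_values("time_delta", key=abs)
    .groupby("firm_ID")["time_delta"]
    .first()
)
print(closest)
```

Because the sort only uses the absolute value as a key, the values that come out still carry their original sign.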

CodePudding user response:

Use lookup with indices of absolute values by DataFrame.idxmin:

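import numpy as np

# Column label of the smallest absolute delta per row, factorized into
# integer positions (idx) and the unique labels (cols)
idx, cols = pd.factorize(df['time_delta'].abs().idxmin(axis=1))
# Pick the signed value at that position for each row
df['time_delta_min'] = (df['time_delta'].reindex(cols, axis=1).to_numpy()
                                                [np.arange(len(df)), idx])
print(df)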
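
As a variation on the idxmin idea, the lookup can also be done on the long data, which returns the whole closest row (department included) rather than just the delta. This is only a sketch under assumptions: long_df is a hypothetical stand-in for merged_df_v1, and the absolute deltas are converted to seconds purely so that idxmin compares plain floats.

```python
import pandas as pd

# Hypothetical long-format data mirroring the question's example
long_df = pd.DataFrame({
    "firm_ID": [1, 1, 2, 2, 2],
    "department_ID": [10, 11, 20, 21, 22],
    "time_delta": pd.to_timedelta([-456, -31, -335, -212, 426], unit="D"),
})

# Distance from zero in seconds, used only to locate the closest row
distance = long_df["time_delta"].abs().dt.total_seconds()

# Index label of the closest row within each firm
nearest_idx = distance.groupby(long_df["firm_ID"]).idxmin()

# Select those rows; time_delta keeps its original sign
nearest_rows = long_df.loc[nearest_idx]
print(nearest_rows)
```

The selected rows can then be merged back onto gp_df by firm_ID if a one-row-per-firm table is wanted.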