my question is how to disaggregate hierarchical columns.
After merging two dataframes and computing some aggregations, I have a pivot table as a result.
However, I want to further extract the minimum time_delta in absolute terms per row as an additional column.
I am unable to do this, as python only extracts the minimum from the whole dataframe.
I want following columns: index, name, firm_ID, timestamp_one, department_ID1, department_ID2, department_ID3, … and so on. I am aware that the number of columns in my dataframe depends on how many department_IDs a firm has. Basically, I am trying to merge top level column names with lower-level column names.
Thanks in advance
See my code below:
gp_columns = {
'name': ['companyA', 'companyB'],
'firm_ID' : [1, 2],
'timestamp_one' : ['2016-04-01', '2017-09-01']
}
fund_columns = {
'firm_ID': [1, 1, 2, 2, 2],
'department_ID' : [10, 11, 20, 21, 22],
'timestamp_mult' : ['2015-01-01', '2016-03-01', '2016-10-01', '2017-02-01', '2018-11-01'],
'number' : [400, 500, 1000, 3000, 4000]
}
gp_df = pd.DataFrame(gp_columns)
fund_df = pd.DataFrame(fund_columns)
gp_df['timestamp_one'] = pd.to_datetime(gp_df['timestamp_one'])
fund_df['timestamp_mult'] = pd.to_datetime(fund_df['timestamp_mult'])
merged_df = gp_df.merge(fund_df)
merged_df
merged_df_v1 = merged_df.copy()
merged_df_v1['incidence_num'] = merged_df.groupby('firm_ID')['department_ID']\
.transform('cumcount')
merged_df_v1['incidence_num'] = merged_df_v1['incidence_num'] 1
merged_df_v1['time_delta'] = merged_df_v1['timestamp_mult'] - merged_df_v1['timestamp_one']
merged_wide = pd.pivot(merged_df_v1, index = ['name','firm_ID', 'timestamp_one'], \
columns = 'incidence_num', \
values = ['department_ID', 'time_delta', 'timestamp_mult', 'number'])
merged_wide.reset_index()
CodePudding user response:
df.min()
takes an argument for axis
that you can set to 1 for columns.
So try:
merged_wide = merged_wide.reset_index() # might want this line at the end of what you have
merged_wide['timestamp_min'] = merged_wide['timestamp_mult'].min(axis=1)