Home > database >  Is there a way to disaggregate columns after pivoting a dataframe?
Is there a way to disaggregate columns after pivoting a dataframe?

Time:10-20

my question is how to disaggregate hierarchical columns. After merging two dataframes and computing some aggregations, I have a pivot table as a result.
However, I want to further extract the minimum time_delta in absolute terms per row as an additional column.
I am unable to do this, as python only extracts the minimum from the whole dataframe.
I want following columns: index, name, firm_ID, timestamp_one, department_ID1, department_ID2, department_ID3, … and so on. I am aware that the number of columns in my dataframe depends on how many department_IDs a firm has. Basically, I am trying to merge top level column names with lower-level column names.

Thanks in advance

See my code below:

gp_columns = {
'name': ['companyA', 'companyB'],
'firm_ID' : [1, 2],
'timestamp_one' : ['2016-04-01', '2017-09-01']
}

fund_columns = {
    'firm_ID': [1, 1, 2, 2, 2],
    'department_ID' : [10, 11, 20, 21, 22],
    'timestamp_mult' : ['2015-01-01', '2016-03-01', '2016-10-01', '2017-02-01', '2018-11-01'],
    'number' : [400, 500, 1000, 3000, 4000]
    }

gp_df = pd.DataFrame(gp_columns)
fund_df = pd.DataFrame(fund_columns)
gp_df['timestamp_one'] = pd.to_datetime(gp_df['timestamp_one'])
fund_df['timestamp_mult'] = pd.to_datetime(fund_df['timestamp_mult'])

merged_df = gp_df.merge(fund_df)
merged_df

merged_df_v1 = merged_df.copy()
merged_df_v1['incidence_num'] = merged_df.groupby('firm_ID')['department_ID']\
                              .transform('cumcount')
merged_df_v1['incidence_num'] = merged_df_v1['incidence_num']   1
merged_df_v1['time_delta'] = merged_df_v1['timestamp_mult'] - merged_df_v1['timestamp_one']

merged_wide = pd.pivot(merged_df_v1, index = ['name','firm_ID', 'timestamp_one'], \
                       columns = 'incidence_num', \
                       values = ['department_ID', 'time_delta', 'timestamp_mult', 'number'])
merged_wide.reset_index()

This is my result

CodePudding user response:

df.min() takes an argument for axis that you can set to 1 for columns.

So try:

merged_wide = merged_wide.reset_index() # might want this line at the end of what you have
merged_wide['timestamp_min'] = merged_wide['timestamp_mult'].min(axis=1)
  • Related