I have a dataframe input called df_final
that looks like below:
location 2022-01 2022-02 2022-03 2022-01 2022-02 2022-03
AGH 2062.25 6561.43 3154.43 0 0 0
AIM 3122.74 3277.56 3631.45 0 0 0
AJL 1641.82 3229.49 4937.23 0 0 0
.................
The first batch of months(first three column months) are actually the total cost while the last three columns are actually the total CN. My code to generate this output is like below:
# Get monthly delivery CN
total_delivery_CN = monthly_delivery_CN_by_branch(start_date, end_date)
# Get monthly pickup CN
total_pickup_CN = monthly_pickup_CN_by_branch(start_date, end_date)
# Get monthly maintainence cost:
total_maintainence_cost = monthly_maintainence_cost_by_branch(start_date, end_date)
# Get monthly refuel cost:
total_refuel_cost = monthly_refuel_cost_by_branch(start_date, end_date)
# Union
df = pd.concat([total_delivery_CN, total_pickup_CN, total_maintainence_cost, total_refuel_cost])
print('Union')
print(df)
df.to_csv('df.csv')
# Pivot
df_final = df.pivot_table(index=['location'], columns = ['report_month'], aggfunc = {'total_CN' : np.sum, 'Cost' : np.sum} , fill_value=0)
df_final.columns = df_final.columns.to_flat_index().to_series().apply(lambda x: x[1])
df_final = df_final.reset_index()
df_final.to_csv('df_final.csv')
print(df_final)
However, I would one to merge the two total's under the same month so the final output would look like:
2022-01 2022-02 2022-03
location Cost total_CN Cost total_CN Cost total_CN
AGH 2062.25 0 6561.43 0 3154.43 0
AIM 3122.74 0 3277.56 0 3631.45 0
AJL 1641.82 0 3229.49 0 4937.23 0
.................
How do I append these two total's under the same month?
CodePudding user response:
I started with the data frame:
df = pd.DataFrame(
[
[2,3,4,0.2,0.3,0.4],
[2,3,4,0.2,0.3,0.4],
[2,3,4,0.2,0.3,0.4]
],
index=('AGH', 'AIM', 'AJL'),
columns=('2022-01', '2022-02', '2022-03', '2022-01', '2022-02', '2022-03')
).rename_axis('location')
Which looks like
2022-01 2022-02 2022-03 2022-01 2022-02 2022-03
location
AGH 2 3 4 0.2 0.3 0.4
AIM 2 3 4 0.2 0.3 0.4
AJL 2 3 4 0.2 0.3 0.4
This is what you're looking for:
def flatten(l):
return [item for sublist in l for item in sublist]
def create_multiindex(cols):
return pd.MultiIndex.from_tuples(
flatten([[(char, 'Cost'), (char, 'total_CN')] for char in sorted(set(cols))])
)
df = df[sorted(set(df.columns))]
df.columns = create_multiindex(df.columns)
It yields:
2022-01 2022-02 2022-03
Cost total_CN Cost total_CN Cost total_CN
location
AGH 2 0.2 3 0.3 4 0.4
AIM 2 0.2 3 0.3 4 0.4
AJL 2 0.2 3 0.3 4 0.4
CodePudding user response:
You can slice df_final into two dataframes, and then you can concatenate both frames two one as we did here:
df_cost = df_final.iloc[:, 0:3]
df_total_cn = df_final.iloc[:, 3:6]
df = pd.concat([df_cost, df_total_cn], axis=1, keys=['cost', 'total cn']).swaplevel(axis=1).sort_index('columns').fillna(0)
print(df)
2022-01 2022-02 2022-03
cost total cn cost total cn cost total cn
location
AGH 2062.25 0 6561.43 0 3154.43 0
AIM 3122.74 0 3277.56 0 3631.45 0
AJL 1641.82 0 3229.49 0 4937.23 0