Home > Software design >  Convert column data to row wise data
Convert column data to row wise data

Time:09-27

I have a dataframe as shown below,

dataframe

data_dict = {'CustCode': {0: 64, 1: 64, 2: 97, 3: 97, 4: 97, 5: 97, 6: 97, 7: 97, 8: 97, 9: 110}, 'InvoiceMonth': {0: 'Aug', 1: 'Sep', 2: 'Apr', 3: 'Aug', 4: 'Feb', 5: 'Jan', 6: 'Jul', 7: 'Mar', 8: 'May', 9: 'Feb'}, 'TotalAmount': {0: 357300, 1: 419800, 2: 515000, 3: 740700, 4: 301200, 5: 197200, 6: 112400, 7: 534900, 8: 220500, 9: 422500}}

How can I convert that into something like this(created manually few rows to show desired DataFrame), desired result

Any help will be appreciated.

CodePudding user response:

You need to pivot your table, and since there are some missing months on your sample you also need to add them seperately:

import numpy as np
new_df = df.pivot(index='CustCode', columns='InvoiceMonth', values='TotalAmount')

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

new_df[[col for col in months if col not in new_df.columns]] = np.nan
new_df['Total'] = new_df.sum(axis=1)
new_df.reset_index(inplace=True)

CodePudding user response:

Try:

df = df.pivot(index="CustCode", columns="InvoiceMonth", values="TotalAmount")

months = [
    "Jan",
    "Feb",
    "Mar",
    "Apr",
    "May",
    "Jun",
    "Jul",
    "Aug",
    "Sep",
    "Oct",
    "Nov",
    "Dec",
]

df = df.reindex(months, axis=1)
df["Total"] = df.sum(axis=1)
df = df.fillna("").reset_index()
df.index.name, df.columns.name = None, None
print(df)

Prints:

   CustCode       Jan       Feb       Mar       Apr       May Jun       Jul       Aug       Sep Oct Nov Dec      Total
0        64                                                                  357300.0  419800.0               777100.0
1        97  197200.0  301200.0  534900.0  515000.0  220500.0      112400.0  740700.0                        2621900.0
2       110            422500.0                                                                               422500.0

CodePudding user response:

Here is another proposition by using pandas.to_datetime to sort the columns without a dict :

out = (df.assign(month=pd.to_datetime(df['InvoiceMonth'], format='%b').dt.month)
         .set_index('month').sort_index()
         .reset_index(drop=True)
         .pivot_table(index='CustCode', columns=('InvoiceMonth'), values='TotalAmount', sort=False)
         .reset_index()
        .rename_axis(None,axis=1)
      )

out['Total'] = out.sum(axis=1)

To separate thousands with comma, use :

pd.options.display.float_format = '{:,}'.format

# Output :

print(out)

   CustCode       Jan       Feb       Mar       Apr       May       Jul       Aug       Sep       Total
0        97 197,200.0 301,200.0 534,900.0 515,000.0 220,500.0 112,400.0 740,700.0       NaN 2,621,997.0
1       110       NaN 422,500.0       NaN       NaN       NaN       NaN       NaN       NaN   422,610.0
2        64       NaN       NaN       NaN       NaN       NaN       NaN 357,300.0 419,800.0   777,164.0
  • Related