Home > Software engineering >  Remove Date Grouping from Data
Remove Date Grouping from Data

Time:07-14

Looking to clean multiple data sets in a more automated way. The current format is year as column, month as row, the number values.

Below is an example of the current format, the original data has multiple years/months.

Current Format:

Year Jan Feb
2022 300 200

Below is an example of how I would like the new format to look like. It combines month and year into one column and transposes the number into another column.

How would I go about doing this in excel or python? Have files with many years and multiple months.

New Format:

Date Number
2022-01 300
2022-02 200

CodePudding user response:

Check below solution. You need to extend month_df for the months, current just cater to the example.

import pandas as pd

df = pd.DataFrame({'Year':[2022],'Jan':[300],'Feb':[200]})

month_df = pd.DataFrame({'Char_Month':['Jan','Feb'], 'Int_Month':['01','02']})

melted_df = pd.melt(df, id_vars=['Year'], value_vars=['Jan', 'Feb'], var_name='Char_Month',value_name='Number')

pd.merge(melted_df, month_df,left_on='Char_Month', right_on='Char_Month').\
assign(Year=melted_df['Year'].astype('str') '-' month_df['Int_Month'])\
[['Year','Number']]

Output:

enter image description here

  • Related