Home > Software design >  How to convert matrix data with rows and column to a single column data frame in python
How to convert matrix data with rows and column to a single column data frame in python

Time:02-24

I have a data set that has year vs month values like this

    JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
2004    1.9 1.7 1.7 2.3 3.1 3.3 3   2.7 2.5 3.2 3.5 3.3
2005    3   3   3.1 3.5 2.8 2.5 3.2 3.6 4.7 4.3 3.5 3.4
2006    4   3.6 3.4 3.5 4.2 4.3 4.1 3.8 2.1 1.3 2   2.5
2007    2.1 2.4 2.8 2.6 2.7 2.7 2.4 2   2.8 3.5 4.3 4.1
2008    4.3 4   4   3.9 4.2 5   5.6 5.4 4.9 3.7 1.1 0.1

I want to convert it to like this using Python / Pandas:

Date    Value
Jan-04  1.9
Feb-04  1.7
Mar-04  1.7
Apr-04  2.3
May-04  3.1
Jun-04  3.3
Jul-04  3
Aug-04  2.7
Sep-04  2.5
Oct-04  3.2
Nov-04  3.5
Dec-04  3.3
Jan-05  3
Feb-05  3
Mar-05  3.1
Apr-05  3.5
May-05  2.8
Jun-05  2.5
Jul-05  3.2
Aug-05  3.6
Sep-05  4.7
Oct-05  4.3
Nov-05  3.5
Dec-05  3.4

How can this be done?

CodePudding user response:

Use DataFrame.stack for reshape, then if possible join last 2 values per years with month names:

df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = df.pop('level_1')   '-'  df['date'].astype(str).str[2:]
print (df.head())
     date  Value
0  JAN-04    1.9
1  FEB-04    1.7
2  MAR-04    1.7
3  APR-04    2.3
4  MAY-04    3.1

Or with convert to datetimes:

df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = pd.to_datetime(df.pop('level_1')   df['date'].astype(str) , format='%b%Y')
print (df.head())
        date  Value
0 2004-01-01    1.9
1 2004-02-01    1.7
2 2004-03-01    1.7
3 2004-04-01    2.3
4 2004-05-01    3.1

df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = pd.to_datetime(df.pop('level_1')   df['date'].astype(str) , format='%b%Y').dt.strftime('%b-%y').str.upper()
print (df.head())
     date  Value
0  JAN-04    1.9
1  FEB-04    1.7
2  MAR-04    1.7
3  APR-04    2.3
4  MAY-04    3.1

CodePudding user response:

Or alternatively using melt:

cols = ['JAN', 'FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC']
rows=list(range(2004,2009))
#I've used random numbers instead of your values.
df = pd.DataFrame(index=rows,columns=cols,data=np.random.rand(5,12)).reset_index()
tdf = df.melt(id_vars=['index'])
tdf['d'] = pd.to_datetime(tdf['variable'] tdf['index'].astype(str), format='%b%Y')
print(tdf)

Output:

    index variable     value     comb          d
0    2004      JAN  0.963338  JAN2004 2004-01-01
1    2005      JAN  0.265815  JAN2005 2005-01-01
2    2006      JAN  0.254360  JAN2006 2006-01-01
3    2007      JAN  0.275372  JAN2007 2007-01-01
4    2008      JAN  0.042116  JAN2008 2008-01-01

The cleaning of the columns, I leave to the OP.

  • Related