I have a data set that has year vs month values like this
JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
2004 1.9 1.7 1.7 2.3 3.1 3.3 3 2.7 2.5 3.2 3.5 3.3
2005 3 3 3.1 3.5 2.8 2.5 3.2 3.6 4.7 4.3 3.5 3.4
2006 4 3.6 3.4 3.5 4.2 4.3 4.1 3.8 2.1 1.3 2 2.5
2007 2.1 2.4 2.8 2.6 2.7 2.7 2.4 2 2.8 3.5 4.3 4.1
2008 4.3 4 4 3.9 4.2 5 5.6 5.4 4.9 3.7 1.1 0.1
I want to convert it to like this using Python / Pandas:
Date Value
Jan-04 1.9
Feb-04 1.7
Mar-04 1.7
Apr-04 2.3
May-04 3.1
Jun-04 3.3
Jul-04 3
Aug-04 2.7
Sep-04 2.5
Oct-04 3.2
Nov-04 3.5
Dec-04 3.3
Jan-05 3
Feb-05 3
Mar-05 3.1
Apr-05 3.5
May-05 2.8
Jun-05 2.5
Jul-05 3.2
Aug-05 3.6
Sep-05 4.7
Oct-05 4.3
Nov-05 3.5
Dec-05 3.4
How can this be done?
CodePudding user response:
Use DataFrame.stack
for reshape, then if possible join last 2 values per year
s with month names:
df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = df.pop('level_1') '-' df['date'].astype(str).str[2:]
print (df.head())
date Value
0 JAN-04 1.9
1 FEB-04 1.7
2 MAR-04 1.7
3 APR-04 2.3
4 MAY-04 3.1
Or with convert to datetimes:
df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = pd.to_datetime(df.pop('level_1') df['date'].astype(str) , format='%b%Y')
print (df.head())
date Value
0 2004-01-01 1.9
1 2004-02-01 1.7
2 2004-03-01 1.7
3 2004-04-01 2.3
4 2004-05-01 3.1
df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = pd.to_datetime(df.pop('level_1') df['date'].astype(str) , format='%b%Y').dt.strftime('%b-%y').str.upper()
print (df.head())
date Value
0 JAN-04 1.9
1 FEB-04 1.7
2 MAR-04 1.7
3 APR-04 2.3
4 MAY-04 3.1
CodePudding user response:
Or alternatively using melt:
cols = ['JAN', 'FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC']
rows=list(range(2004,2009))
#I've used random numbers instead of your values.
df = pd.DataFrame(index=rows,columns=cols,data=np.random.rand(5,12)).reset_index()
tdf = df.melt(id_vars=['index'])
tdf['d'] = pd.to_datetime(tdf['variable'] tdf['index'].astype(str), format='%b%Y')
print(tdf)
Output:
index variable value comb d
0 2004 JAN 0.963338 JAN2004 2004-01-01
1 2005 JAN 0.265815 JAN2005 2005-01-01
2 2006 JAN 0.254360 JAN2006 2006-01-01
3 2007 JAN 0.275372 JAN2007 2007-01-01
4 2008 JAN 0.042116 JAN2008 2008-01-01
The cleaning of the columns, I leave to the OP.