This is the dataframe I am working with,
data = pd.DataFrame({'id':[1,2,3],
'start date': ['2001-11-13', '2002-6-26', '2004-2-22'],
'end date': ['2002-6-25', '2004-2-21', '2006-8-21'],
'values':[1000,2000,3000]})
I am trying to stack only the date columns, start date and end date, while leaving the id and values columns unstacked. The final dataframe would look like this,
data = pd.DataFrame({'id':[1,1,2,2,3,3],
'date': ['2001-11-13', '2002-6-26','2004-2-22','2002-6-25','2004-2-21', '2006-8-21'],
'values':[1000,1000,2000,2000,3000,3000]})
CodePudding user response:
Just do melt
out = data.melt(['id','values']).sort_values('id')
Out[436]:
id values variable value
0 1 1000 start date 2001-11-13
3 1 1000 end date 2002-6-25
1 2 2000 start date 2002-6-26
4 2 2000 end date 2004-2-21
2 3 3000 start date 2004-2-22
5 3 3000 end date 2006-8-21
Or set_index
stack
out = data.set_index(['id','values']).stack().reset_index()
Out[437]:
id values level_2 0
0 1 1000 start date 2001-11-13
1 1 1000 end date 2002-6-25
2 2 2000 start date 2002-6-26
3 2 2000 end date 2004-2-21
4 3 3000 start date 2004-2-22
5 3 3000 end date 2006-8-21
CodePudding user response:
Here's a way to do what you've asked that gives the output with the columns labeled as in your question:
cols = ['start date', 'end date']
data = pd.concat([data.drop(columns=set(cols)-{col}).rename(columns={col:'date'}) for col in cols]).sort_values('id').reset_index(drop=True)
Output:
id date values
0 1 2001-11-13 1000
1 1 2002-6-25 1000
2 2 2002-6-26 2000
3 2 2004-2-21 2000
4 3 2004-2-22 3000
5 3 2006-8-21 3000