Home > OS >  Stacking two columns in a dataframe
Stacking two columns in a dataframe

Time:07-24

This is the dataframe I am working with,

data = pd.DataFrame({'id':[1,2,3], 
                     'start date': ['2001-11-13', '2002-6-26', '2004-2-22'],
                     'end date': ['2002-6-25', '2004-2-21', '2006-8-21'],
                     'values':[1000,2000,3000]})

I am trying to stack only the date columns, start date and end date, while leaving the id and values columns unstacked. The final dataframe would look like this,

data = pd.DataFrame({'id':[1,1,2,2,3,3], 
                     'date': ['2001-11-13', '2002-6-26','2004-2-22','2002-6-25','2004-2-21', '2006-8-21'],
                     'values':[1000,1000,2000,2000,3000,3000]})

CodePudding user response:

Just do melt

out = data.melt(['id','values']).sort_values('id')
Out[436]: 
   id  values    variable       value
0   1    1000  start date  2001-11-13
3   1    1000    end date   2002-6-25
1   2    2000  start date   2002-6-26
4   2    2000    end date   2004-2-21
2   3    3000  start date   2004-2-22
5   3    3000    end date   2006-8-21

Or set_index stack

out = data.set_index(['id','values']).stack().reset_index()
Out[437]: 
   id  values     level_2           0
0   1    1000  start date  2001-11-13
1   1    1000    end date   2002-6-25
2   2    2000  start date   2002-6-26
3   2    2000    end date   2004-2-21
4   3    3000  start date   2004-2-22
5   3    3000    end date   2006-8-21

CodePudding user response:

Here's a way to do what you've asked that gives the output with the columns labeled as in your question:

cols = ['start date', 'end date']
data = pd.concat([data.drop(columns=set(cols)-{col}).rename(columns={col:'date'}) for col in cols]).sort_values('id').reset_index(drop=True)

Output:

   id        date  values
0   1  2001-11-13    1000
1   1   2002-6-25    1000
2   2   2002-6-26    2000
3   2   2004-2-21    2000
4   3   2004-2-22    3000
5   3   2006-8-21    3000
  • Related