I'm pretty new to python and pandas. I have a dataframe with columns that hold the number of seedlings for a particular size class:
Plot Location Year Species 6-12_in 12-24_in 24-54_in GreaterThan_54_in
1 VMU 2015 BC 3 8
What I want to do is convert that dataframe to a format like this, where each size class (6-12_in, 12-24_in, 24-54_in, and GreaterThan_54_in) are numbered 1-4 and put into Size_Class/Count columns like this:
Plot Location Year Species Size_Class Count
1 VMU 2015 BC 1 3
1 VMU 2015 BC 3 8
I arbitrarily named the columns from dataframe 1, so
6-12_in =1
12-24_in =2
24-54_in=3
GreaterThan_54_in=4
I could easily write this looping through each row and building the new dataframe with if statements, but I feel like there must be a map/apply solution that is more efficient? I found this thread, which is kind of similar, but I'm not sure how to easily map the column names and make multiple rows? Merge multiple column values into one column in python pandas
Any help to get started is appreciated- thank you!
CodePudding user response:
# define dictionary to map the sizes to size class
d={'6-12_in' :1,
'12-24_in' :2,
'24-54_in':3,
'GreaterThan_54_in':4}
# melt the dataframe
df2=df.melt(id_vars=['Plot','Location','Year','Species'],
var_name='size_class',
value_name='count')
df2
# apply map
df2['size_class']=df2['size_class'].map(d)
# drop where count is null
df2[df2['count'].notna()]
Plot Location Year Species size_class count
0 1 VMU 2015 BC 1 3.0
1 1 VMU 2015 BC 2 8.0
CodePudding user response:
You can use melt to create a new row for each of your size columns. Then group by Plot and assign each row an incremental id using cumcount
. Drop the null values after and you should get your desired result.
import pandas as pn
import numpy as np
df = pd.DataFrame({'Plot': [1],
'Location': ['VMU'],
'Year': [2015],
'Species': ['BC'],
'6-12_in': [3],
'12-24_in': [np.nan],
'24-54_in': [8],
'GreaterThan_54_in': [np.nan]})
df = df.melt(id_vars=['Plot','Location','Year','Species'],
var_name='Size_Class',
value_name='Count')
df['Size_Class'] = df.groupby('Plot')['Size_Class'].cumcount() 1
df.dropna()
Output
Plot Location Year Species Size_Class Count
0 1 VMU 2015 BC 1 3.0
2 1 VMU 2015 BC 3 8.0