Home > OS >  Pandas Dataframe - convert columns to size classes and counts
Pandas Dataframe - convert columns to size classes and counts

Time:09-30

I'm pretty new to python and pandas. I have a dataframe with columns that hold the number of seedlings for a particular size class:

Plot    Location    Year    Species 6-12_in 12-24_in    24-54_in    GreaterThan_54_in
1       VMU         2015    BC      3                   8   

What I want to do is convert that dataframe to a format like this, where each size class (6-12_in, 12-24_in, 24-54_in, and GreaterThan_54_in) are numbered 1-4 and put into Size_Class/Count columns like this:

Plot    Location    Year    Species Size_Class  Count
1       VMU         2015    BC      1           3
1       VMU         2015    BC      3           8

I arbitrarily named the columns from dataframe 1, so

6-12_in =1
12-24_in =2
24-54_in=3
GreaterThan_54_in=4

I could easily write this looping through each row and building the new dataframe with if statements, but I feel like there must be a map/apply solution that is more efficient? I found this thread, which is kind of similar, but I'm not sure how to easily map the column names and make multiple rows? Merge multiple column values into one column in python pandas

Any help to get started is appreciated- thank you!

CodePudding user response:

# define dictionary to map the sizes to size class
d={'6-12_in' :1,
'12-24_in' :2,
'24-54_in':3,
'GreaterThan_54_in':4}


# melt the dataframe
df2=df.melt(id_vars=['Plot','Location','Year','Species'], 
            var_name='size_class', 
            value_name='count')
df2

# apply map 
df2['size_class']=df2['size_class'].map(d)

# drop where count is null
df2[df2['count'].notna()]
    Plot    Location    Year    Species     size_class  count
0      1    VMU         2015         BC              1    3.0
1      1    VMU         2015         BC              2    8.0

CodePudding user response:

You can use melt to create a new row for each of your size columns. Then group by Plot and assign each row an incremental id using cumcount. Drop the null values after and you should get your desired result.

import pandas as pn
import numpy as np
df = pd.DataFrame({'Plot': [1],
 'Location': ['VMU'],
 'Year': [2015],
 'Species': ['BC'],
 '6-12_in': [3],
 '12-24_in': [np.nan],
 '24-54_in': [8],
 'GreaterThan_54_in': [np.nan]})

df = df.melt(id_vars=['Plot','Location','Year','Species'],
             var_name='Size_Class',
             value_name='Count')
df['Size_Class'] = df.groupby('Plot')['Size_Class'].cumcount() 1
df.dropna()

Output

   Plot Location  Year Species  Size_Class  Count
0     1      VMU  2015      BC           1    3.0
2     1      VMU  2015      BC           3    8.0
  • Related