Home > Enterprise >  How to condense a single dataframe based on the index
How to condense a single dataframe based on the index

Time:07-22

I have the following dataframe:

                    label scoreCTBIN004v1  scoreCTBIN003v1  scoreCTBIN001v1  
    Overall Stress Score              54              NaN              NaN     
            Stress Score              50              NaN              NaN     
  Short-Term Stress Risk              50              NaN              NaN
      Future Stress Risk              59              NaN              NaN
     Chronic Stress Risk              55              NaN              NaN
    Overall Stress Score             NaN             57.0              NaN
            Stress Score             NaN             52.0              NaN
  Short-Term Stress Risk             NaN             52.0              NaN
      Future Stress Risk             NaN             65.0              NaN
     Chronic Stress Risk             NaN             59.0              NaN
    Overall Stress Score             NaN              NaN             57.0
            Stress Score             NaN              NaN             49.0
  Short-Term Stress Risk             NaN              NaN             54.0 
      Future Stress Risk             NaN              NaN             66.0 
     Chronic Stress Risk             NaN              NaN             60.0      

I simply want to condense the label column so that they are all on the same row, but it is not combining the labels even though they are the same name?

Something simple like this:

                    label scoreCTBIN004v1  scoreCTBIN003v1  scoreCTBIN001v1  
    Overall Stress Score              54              57              57     
            Stress Score              50              52              49     
  Short-Term Stress Risk              50              52              54
      Future Stress Risk              59              65              66
     Chronic Stress Risk              55              59              60

CodePudding user response:

You can use pd.pivot_table in this case.

data = {'label':['Overall Stress Score','Stress Score','Short-Term Stress Risk','Future Stress Risk','Chronic Stress Risk']*3,'scoreCTBIN004v1':[1,2,3,4,5] [np.nan]*10,'scoreCTBIN003v1':[np.nan]*5 [6,7,8,9,10] [np.nan]*5,'scoreCTBIN001v1':[np.nan]*10 [1,2,3,4,5]}
df=pd.DataFrame(data)

pd.pivot_table(df,index='label')
Out[21]: 
                        scoreCTBIN001v1  scoreCTBIN003v1  scoreCTBIN004v1
label                                                                    
Chronic Stress Risk                 5.0             10.0              5.0
Future Stress Risk                  4.0              9.0              4.0
Overall Stress Score                1.0              6.0              1.0
Short-Term Stress Risk              3.0              8.0              3.0
Stress Score                        2.0              7.0              2.0

CodePudding user response:

If that is an accurate representation of your whole dataset, you could groupby your 'label' column, and keep 'first' values:

df.groupby('label',as_index=False).first()

prints:

                   label  scoreCTBIN004v1  scoreCTBIN003v1  scoreCTBIN001v1
0     Chronic Stress Risk             55.0             59.0             60.0
1      Future Stress Risk             59.0             65.0             66.0
2    Overall Stress Score             54.0             57.0             57.0
3  Short-Term Stress Risk             50.0             52.0             54.0
4            Stress Score             50.0             52.0             49.0

CodePudding user response:

clean and simple if you really have this format:

if label is the index:
df.stack().unstack()

output:

                        scoreCTBIN004v1  scoreCTBIN003v1  scoreCTBIN001v1
label                                                                    
Chronic Stress Risk                55.0             59.0             60.0
Future Stress Risk                 59.0             65.0             66.0
Overall Stress Score               54.0             57.0             57.0
Short-Term Stress Risk             50.0             52.0             54.0
Stress Score                       50.0             52.0             49.0
if it is a column:
df.set_index('label').stack().unstack().reset_index()

output:

                    label  scoreCTBIN004v1  scoreCTBIN003v1  scoreCTBIN001v1
0     Chronic Stress Risk             55.0             59.0             60.0
1      Future Stress Risk             59.0             65.0             66.0
2    Overall Stress Score             54.0             57.0             57.0
3  Short-Term Stress Risk             50.0             52.0             54.0
4            Stress Score             50.0             52.0             49.0
  • Related