I have the following dataframe:
label scoreCTBIN004v1 scoreCTBIN003v1 scoreCTBIN001v1
Overall Stress Score 54 NaN NaN
Stress Score 50 NaN NaN
Short-Term Stress Risk 50 NaN NaN
Future Stress Risk 59 NaN NaN
Chronic Stress Risk 55 NaN NaN
Overall Stress Score NaN 57.0 NaN
Stress Score NaN 52.0 NaN
Short-Term Stress Risk NaN 52.0 NaN
Future Stress Risk NaN 65.0 NaN
Chronic Stress Risk NaN 59.0 NaN
Overall Stress Score NaN NaN 57.0
Stress Score NaN NaN 49.0
Short-Term Stress Risk NaN NaN 54.0
Future Stress Risk NaN NaN 66.0
Chronic Stress Risk NaN NaN 60.0
I simply want to condense the label column so that they are all on the same row, but it is not combining the labels even though they are the same name?
Something simple like this:
label scoreCTBIN004v1 scoreCTBIN003v1 scoreCTBIN001v1
Overall Stress Score 54 57 57
Stress Score 50 52 49
Short-Term Stress Risk 50 52 54
Future Stress Risk 59 65 66
Chronic Stress Risk 55 59 60
CodePudding user response:
You can use pd.pivot_table
in this case.
data = {'label':['Overall Stress Score','Stress Score','Short-Term Stress Risk','Future Stress Risk','Chronic Stress Risk']*3,'scoreCTBIN004v1':[1,2,3,4,5] [np.nan]*10,'scoreCTBIN003v1':[np.nan]*5 [6,7,8,9,10] [np.nan]*5,'scoreCTBIN001v1':[np.nan]*10 [1,2,3,4,5]}
df=pd.DataFrame(data)
pd.pivot_table(df,index='label')
Out[21]:
scoreCTBIN001v1 scoreCTBIN003v1 scoreCTBIN004v1
label
Chronic Stress Risk 5.0 10.0 5.0
Future Stress Risk 4.0 9.0 4.0
Overall Stress Score 1.0 6.0 1.0
Short-Term Stress Risk 3.0 8.0 3.0
Stress Score 2.0 7.0 2.0
CodePudding user response:
If that is an accurate representation of your whole dataset, you could groupby
your 'label' column, and keep 'first'
values:
df.groupby('label',as_index=False).first()
prints:
label scoreCTBIN004v1 scoreCTBIN003v1 scoreCTBIN001v1
0 Chronic Stress Risk 55.0 59.0 60.0
1 Future Stress Risk 59.0 65.0 66.0
2 Overall Stress Score 54.0 57.0 57.0
3 Short-Term Stress Risk 50.0 52.0 54.0
4 Stress Score 50.0 52.0 49.0
CodePudding user response:
clean and simple if you really have this format:
if label is the index:
df.stack().unstack()
output:
scoreCTBIN004v1 scoreCTBIN003v1 scoreCTBIN001v1
label
Chronic Stress Risk 55.0 59.0 60.0
Future Stress Risk 59.0 65.0 66.0
Overall Stress Score 54.0 57.0 57.0
Short-Term Stress Risk 50.0 52.0 54.0
Stress Score 50.0 52.0 49.0
if it is a column:
df.set_index('label').stack().unstack().reset_index()
output:
label scoreCTBIN004v1 scoreCTBIN003v1 scoreCTBIN001v1
0 Chronic Stress Risk 55.0 59.0 60.0
1 Future Stress Risk 59.0 65.0 66.0
2 Overall Stress Score 54.0 57.0 57.0
3 Short-Term Stress Risk 50.0 52.0 54.0
4 Stress Score 50.0 52.0 49.0