import pandas as pd
data = [{'sequence': 'he left me',
         'labels': ['relationship', 'sad', 'happy', 'depression', 'suicidal'],
         'scores': [0.9898561835289001,
                    0.9809304475784302,
                    0.3625302314758301,
                    0.31606775522232056,
                    0.04021124914288521]},
        {'sequence': 'I lost my job',
         'labels': ['sad', 'relationship', 'depression', 'happy', 'suicidal'],
         'scores': [0.123456,
                    0.56789,
                    0.78901,
                    0.12345,
                    0.67890]}]
df = pd.DataFrame(data)
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)
print(df)
That's my code, but it isn't giving me the right output.
Here's the output:
        sequence  relationship      sad    happy  depression  suicidal
0     he left me      0.989856  0.98093  0.36253    0.316068  0.040211
1  I lost my job      0.123456  0.56789  0.78901    0.123450  0.678900
You can see that the scores in the second row are not correct: 'sad' should be 0.123456, but instead it's 0.56789. I need help here; I'm fairly new to this, so I'm having a hard time.
I think the problem is in this line:
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)
I started with this:
df = df.rename(columns={'scores': df['labels'].iloc[0]})
then tried this:
df = df.rename(columns={'scores': df['labels'].iloc[0][0]})
after that I tried this:
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'])], axis=1)
and finally this:
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'].iloc[0])], axis=1)
I want each of those labels to have their correct scores for every row, not just the first row.
CodePudding user response:
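Your last attempt takes the column names from the first row only (df['labels'].iloc[0]), so every row's scores are placed under the first row's label order, even though the second row lists its labels in a different order. I'd suggest you preprocess your data so that each label is paired directly with its score, not through the order in which they appear in their respective lists: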
data_processed = [
    {
        "sequence": record["sequence"],
        **{
            label: value
            for label, value in zip(record["labels"], record["scores"])
        },
    }
    for record in data
]
Now you can convert this directly to a DataFrame:
df = pd.DataFrame(data_processed)
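For reference, here is a minimal end-to-end sketch that puts the two steps together using the data from the question. It uses dict(zip(...)) in place of the dict comprehension above, which should be equivalent here; treat it as an illustration rather than the only way to do this.

```python
import pandas as pd

data = [
    {
        "sequence": "he left me",
        "labels": ["relationship", "sad", "happy", "depression", "suicidal"],
        "scores": [0.9898561835289001, 0.9809304475784302, 0.3625302314758301,
                   0.31606775522232056, 0.04021124914288521],
    },
    {
        "sequence": "I lost my job",
        "labels": ["sad", "relationship", "depression", "happy", "suicidal"],
        "scores": [0.123456, 0.56789, 0.78901, 0.12345, 0.67890],
    },
]

# Pair every label with its own score inside each record, so the result
# no longer depends on the order of the labels in any particular row.
data_processed = [
    {"sequence": record["sequence"],
     **dict(zip(record["labels"], record["scores"]))}
    for record in data
]

# pandas aligns the dict keys to columns by name, not by position.
df = pd.DataFrame(data_processed)
print(df)
```

With this, the second row should show 0.123456 under 'sad' and 0.56789 under 'relationship'. If a record happens to be missing one of the labels, pandas fills that cell with NaN.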