Building a pandas DataFrame from a list of dictionaries that contain lists

Time:12-30

import pandas as pd

data = [{'sequence': 'he left me',
  'labels': ['relationship', 'sad', 'happy', 'depression', 'suicidal'],
  'scores': [0.9898561835289001,
   0.9809304475784302,
   0.3625302314758301,
   0.31606775522232056,
   0.04021124914288521]},
         {'sequence': 'I lost my job',
  'labels': ['sad', 'relationship', 'depression', 'happy', 'suicidal'],
  'scores': [0.123456,
   0.56789,
   0.78901,
   0.12345,
   0.67890]}]

df = pd.DataFrame(data)
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)

print(df)

That's my code, but it isn't giving me the right output.

Here's the output:

        sequence  relationship      sad    happy  depression  suicidal
0     he left me      0.989856  0.98093  0.36253    0.316068  0.040211
1  I lost my job      0.123456  0.56789  0.78901    0.123450  0.678900

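You can see that the scores in the second row are not correct: 'sad' should be 0.123456, but instead it shows 0.56789. The column names come from the first row's labels only, while the second row lists its labels in a different order. I'm fairly new to pandas and am having a hard time with this.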

I think the problem is in this line:

df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)

I started with this:

df = df.rename(columns={'scores': df['labels'].iloc[0]})

then tried this:

df = df.rename(columns={'scores': df['labels'].iloc[0][0]})

after that, this:

df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'])], axis=1)

and finally this:

df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'].iloc[0])], axis=1)

I want each of those labels to have their correct scores for every row, not just the first row.

CodePudding user response:

I'd suggest you preprocess your data so that each label is paired directly with its score, instead of relying on the order in which they appear in their respective lists:

data_processed = [
    {
      "sequence": record["sequence"], 
      **{
        label: value 
        for label, value in zip(record["labels"], record["scores"])
      },
    }
    for record in data
]

Now you can convert this directly to a DataFrame:

df = pd.DataFrame(data_processed)
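
For reference, here is the same approach as one self-contained sketch, using the sample data from the question. It writes the pairing step as `dict(zip(...))`, which should be equivalent to the comprehension above, and it assumes every record contains the same set of labels (a label missing from a record would show up as NaN in that row).

```python
import pandas as pd

# Sample data copied from the question: each record has its own label order.
data = [
    {
        "sequence": "he left me",
        "labels": ["relationship", "sad", "happy", "depression", "suicidal"],
        "scores": [0.9898561835289001, 0.9809304475784302, 0.3625302314758301,
                   0.31606775522232056, 0.04021124914288521],
    },
    {
        "sequence": "I lost my job",
        "labels": ["sad", "relationship", "depression", "happy", "suicidal"],
        "scores": [0.123456, 0.56789, 0.78901, 0.12345, 0.67890],
    },
]

# Pair each label with its score per record, so the position in the list
# no longer matters once the records are turned into columns.
data_processed = [
    {"sequence": record["sequence"], **dict(zip(record["labels"], record["scores"]))}
    for record in data
]

# pandas aligns the dictionary keys to columns by name, row by row.
df = pd.DataFrame(data_processed)
print(df)
```

With this, the 'sad' column of the second row holds 0.123456 and 'relationship' holds 0.56789, which is the pairing given in the original record.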