I'm getting confused by the data types in my pandas DataFrame and don't know how to split my entries into several columns.
Data looks like:
   Name1          Name2
0  [0.1,0.2,0.3]  [{'label': 'Neutral', 'score': 0.60}]
1  [0.4,0.5,0.6]  [{'label': 'Negative', 'score': 0.60}]
2  [0.7,0.8,0.9]  [{'label': 'Positive', 'score': 0.60}]
The result should look like:
   Name1          N1   N2   N3   Name2                                  Label     Score
0  [0.1,0.2,0.3]  0.1  0.2  0.3  [{'label': 'Neutral','score': 0.60}]   Neutral   0.60
1  [0.4,0.5,0.6]  0.4  0.5  0.6  [{'label': 'Negative','score': 0.60}]  Negative  0.60
2  [0.7,0.8,0.9]  0.7  0.8  0.9  [{'label': 'Positive','score': 0.60}]  Positive  0.60
I'm not very confident with Python, but I need to work with a large dataset of a few 100k entries.
Help much appreciated!
Best
CodePudding user response:
You can use pandas.DataFrame.join and pandas.Series.tolist.
df = df.join(
    pd.DataFrame(df['Name1'].tolist(), columns=['N1', 'N2', 'N3'])
).join(
    pd.DataFrame(df['Name2'].apply(lambda x: x[0]).tolist())
)
print(df)
Output:
             Name1                                  Name2   N1   N2   N3     label  score
0  [0.1, 0.2, 0.3]   [{'label': 'Neutral', 'score': 0.6}]  0.1  0.2  0.3   Neutral    0.6
1  [0.4, 0.5, 0.6]  [{'label': 'Negative', 'score': 0.6}]  0.4  0.5  0.6  Negative    0.6
2  [0.7, 0.8, 0.9]  [{'label': 'Positive', 'score': 0.6}]  0.7  0.8  0.9  Positive    0.6
Input DataFrame:
import pandas as pd

df = pd.DataFrame({
    'Name1': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    'Name2': [
        [{'label': 'Neutral', 'score': 0.60}],
        [{'label': 'Negative', 'score': 0.60}],
        [{'label': 'Positive', 'score': 0.60}]
    ]
})
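One thing to watch with the join approach: join lines rows up by index, and a DataFrame built from a plain list gets a fresh 0..n-1 index. If your real data has a different index (for example after filtering rows), the new columns may not match up. A minimal sketch of one way around that is below; the helper name expand_columns and the index values 10, 20, 30 are made up for illustration, and it assumes every Name1 list has exactly three items and every Name2 list holds exactly one dict.

```python
import pandas as pd


def expand_columns(df):
    """Return df with N1..N3 and label/score columns added (hypothetical helper)."""
    # Reuse df's own index so join() lines rows up even when the index
    # is not 0..n-1 (e.g. after filtering or sorting).
    numbers = pd.DataFrame(
        df["Name1"].tolist(), columns=["N1", "N2", "N3"], index=df.index
    )
    # Assumption: each Name2 cell is a list holding exactly one dict.
    labels = pd.DataFrame([cell[0] for cell in df["Name2"]], index=df.index)
    return df.join(numbers).join(labels)


# Same sample data as above, but with a non-default index.
df = pd.DataFrame(
    {
        "Name1": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
        "Name2": [
            [{"label": "Neutral", "score": 0.60}],
            [{"label": "Negative", "score": 0.60}],
            [{"label": "Positive", "score": 0.60}],
        ],
    },
    index=[10, 20, 30],
)
result = expand_columns(df)
print(result)
```

With the default index this should give the same result as the shorter version, so the extra index argument only matters when the index has been changed.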
CodePudding user response:
You can call to_list() on a column of lists and pass the result to pd.DataFrame to turn each list element into its own column.
You can find more on that at this link:
https://datascienceparichay.com/article/split-pandas-column-of-lists-into-multiple-columns/
To do the same with a column of dicts, refer to this page:
https://stackoverflow.com/questions/38231591/split-explode-a-column-of-dictionaries-into-separate-columns-with-pandas
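As a rough sketch of how the two ideas could fit together on the data from the question (not taken from either linked page): to_list() handles the column of lists, and pd.json_normalize handles the dicts once they are pulled out of their one-element lists. The renaming to Label and Score is only there to match the column names asked for, and the code assumes each Name2 cell holds exactly one dict.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name1": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
        "Name2": [
            [{"label": "Neutral", "score": 0.60}],
            [{"label": "Negative", "score": 0.60}],
            [{"label": "Positive", "score": 0.60}],
        ],
    }
)

# Column of lists -> one new column per list element.
df[["N1", "N2", "N3"]] = pd.DataFrame(df["Name1"].to_list(), index=df.index)

# Column of one-element lists of dicts -> one new column per dict key.
# Assumption: every cell is a list with exactly one dict inside.
first_dicts = [cell[0] for cell in df["Name2"]]
parsed = pd.json_normalize(first_dicts)
parsed.index = df.index  # keep rows aligned with df
df = df.join(parsed.rename(columns={"label": "Label", "score": "Score"}))

print(df)
```

For a few 100k rows, building the new columns from plain Python lists like this is usually much quicker than calling .apply(pd.Series) on every row, though it is worth timing on your own data.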