python Original dataframe ( 2 column ):
matchNum accumulatedscore
78 [{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '1', 'away': '0'}]
56 [{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '2', 'away': '1'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '4', 'away': '3'}]
How can I change them into original dataframe I hope...
matchNum home1 away1 home2 away2
78 0 0 1 0
56 2 1 4 3
It is so difficult.....
Original dataframe:
I hope this :
CodePudding user response:
Assuming your pandas DataFrame be like:
d = {'matchNum': [78, 56],
'accumulatedscore':["[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '1', 'away': '0'}]",
"[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '2', 'away': '1'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '4', 'away': '3'}]"
]}
import pandas as pd
df = pd.DataFrame(d)
You can simply convert the string which has the form of a python dictionary (refer here) :
import ast
df['home1']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[0]['home'] , axis=1)
df['away1']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[0]['away'], axis=1)
df['home2']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[1]['home'], axis=1)
df['away2']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[1]['away'], axis=1)
df = df.drop(columns = 'accumulatedscore')
Your df would be like:
matchNum home1 away1 home2 away2
0 78 0 0 1 0
1 56 2 1 4 3
CodePudding user response:
You can extract only the relevant key-value pairs from each dictionary in df['accumulatedscore']
, explode
the Series, convert it to a DataFrame, and combine duplicate indices:
df1 = (df.merge(df['accumulatedscore']
.apply(lambda lst:tuple({'home' str(i): d['home'], 'away' str(i): d['away']}
for i, d in enumerate(lst, 1)))
.explode()
.apply(pd.Series)
.groupby(level=0).first(),
left_index=True, right_index=True)
.drop('accumulatedscore', axis=1))
Output:
matchNum home1 away1 home2 away2
0 78 0 0 1 0
1 56 2 1 4 3