Home > Software engineering >  python , how split one dataframe in cells into original dataframe?
python , how split one dataframe in cells into original dataframe?

Time:12-05

python Original dataframe ( 2 column ):

matchNum accumulatedscore
78       [{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '1', 'away': '0'}]
56       [{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '2', 'away': '1'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '4', 'away': '3'}]

How can I change them into original dataframe I hope...

matchNum home1 away1 home2 away2

78                 0     0     1     0
56                 2     1     4     3

It is so difficult.....

Original dataframe: enter image description here

I hope this : enter image description here

CodePudding user response:

Assuming your pandas DataFrame be like:

d = {'matchNum': [78, 56], 
     'accumulatedscore':["[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '0', 'away': '0'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '1', 'away': '0'}]",
                        "[{'periodvalue': 'FirstHalf', 'periodstatus': 'ResultFinal', 'home': '2', 'away': '1'}, {'periodvalue': 'SecondHalf', 'periodstatus': 'ResultFinal', 'home': '4', 'away': '3'}]"
                        ]}

import pandas as pd
df = pd.DataFrame(d)

You can simply convert the string which has the form of a python dictionary (refer here) :

import ast
df['home1']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[0]['home'] , axis=1)
df['away1']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[0]['away'], axis=1)
df['home2']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[1]['home'], axis=1)
df['away2']= df.apply(lambda x: ast.literal_eval(x['accumulatedscore'])[1]['away'], axis=1)
df = df.drop(columns = 'accumulatedscore')

Your df would be like:

    matchNum    home1   away1   home2   away2
0   78  0   0   1   0
1   56  2   1   4   3

CodePudding user response:

You can extract only the relevant key-value pairs from each dictionary in df['accumulatedscore'], explode the Series, convert it to a DataFrame, and combine duplicate indices:

df1 = (df.merge(df['accumulatedscore']
                .apply(lambda lst:tuple({'home' str(i): d['home'], 'away' str(i): d['away']} 
                                        for i, d in enumerate(lst, 1)))
                .explode()
                .apply(pd.Series)
                .groupby(level=0).first(), 
                left_index=True, right_index=True)
       .drop('accumulatedscore', axis=1))

Output:

   matchNum home1 away1 home2 away2
0        78     0     0     1     0
1        56     2     1     4     3
  • Related