Convert Dict type pandas rows into columns in pandas-CodePudding

I have an intresting situtation my dataframe looks something like this.

        marks_in_test_1                marks_in_test_2
rank1   {'english': 25, 'maths': 30}    {'english': 15, 'maths': 30, 'science': 45}
rank4   {'english': 34, 'maths': 39}    {'english': 35, 'maths': 31}

I want to convert it into something as follows which looks like a pivot table in which old columns values also act as indexes.

                        english maths  Science
rank1  marks_in_test_1  25       30     NaN
rank1  marks_in_test_2  15       30     45
rank2  marks_in_test_1  34       35     NaN
rank2  marks_in_test_2  39       31     NaN

I have tried to look into pandas pivot docs but nothing helpful was there.

CodePudding user response：

Use DataFrame.stack with DataFrame constructor:

s = df.stack()
df = pd.DataFrame(s.tolist(), index=s.index)
print (df)
                       english  maths
rank1 marks_in_test_1       25     30
      marks_in_test_2       15     30
rank4 marks_in_test_1       34     39
      marks_in_test_2       35     31

s = df.stack()
df = pd.DataFrame(s.tolist(), index=s.index).rename_axis(['a','b']).reset_index()
print (df)
       a                b  english  maths
0  rank1  marks_in_test_1       25     30
1  rank1  marks_in_test_2       15     30
2  rank4  marks_in_test_1       34     39
3  rank4  marks_in_test_2       35     31

CodePudding user response：

You can convert your dicts by applying pd.Series on each of them (even if it's not really optimized)

>>> df.stack().apply(pd.Series).rename_axis(index=['rank', 'marks']).reset_index()

    rank            marks  english  maths  science
0  rank1  marks_in_test_1     25.0   30.0      NaN
1  rank1  marks_in_test_2     15.0   30.0     45.0
2  rank4  marks_in_test_1     34.0   39.0      NaN
3  rank4  marks_in_test_2     35.0   31.0      NaN

CodePudding user response：

    #transposing column    
    df = df.melt(id_vars=["rank"])
    df.head()

        

        rank    variable        value
    0   rank1   marks_in_test_1 {'english': 25, 'maths': 30}
    1   rank4   marks_in_test_1 {'english': 34, 'maths': 39}
    2   rank1   marks_in_test_2 {'english': 15, 'maths': 30, 'science': 45}
    3   rank4   marks_in_test_2 {'english': 35, 'maths': 31}

    df = pd.concat([df[['rank','variable']], df['value'].astype(str).str.replace("{","").replace("}","").str.split(', ', expand=True)], axis=1)
    df.head()

        rank    variable        0               1            2
    0   rank1   marks_in_test_1 'english': 25   'maths': 30}    None
    1   rank4   marks_in_test_1 'english': 34   'maths': 39}    None
    2   rank1   marks_in_test_2 'english': 15   'maths': 30 'science': 45}
    3   rank4   marks_in_test_2 'english': 35   'maths': 31}    None

    #renaming columns
    df.rename(columns = {0: "english", 1: "maths", 2:"science"}, 
              inplace = True)
    
    #removing string from mark columns
    df['english'] = df['english'].str.replace(r'[^0-9] ', '')
    df['science'] = df['science'].str.replace(r'[^0-9] ', '')
    df['maths'] = df['maths'].str.replace(r'[^0-9] ', '')
    df

        rank    variable english    maths   science
    0   rank1   marks_in_test_1 25  30  None
    1   rank4   marks_in_test_1 34  39  None
    2   rank1   marks_in_test_2 15  30  45
    3   rank4   marks_in_test_2 35  31  None