I have an intresting situtation my dataframe looks something like this.
marks_in_test_1 marks_in_test_2
rank1 {'english': 25, 'maths': 30} {'english': 15, 'maths': 30, 'science': 45}
rank4 {'english': 34, 'maths': 39} {'english': 35, 'maths': 31}
I want to convert it into something as follows which looks like a pivot table in which old columns values also act as indexes.
english maths Science
rank1 marks_in_test_1 25 30 NaN
rank1 marks_in_test_2 15 30 45
rank2 marks_in_test_1 34 35 NaN
rank2 marks_in_test_2 39 31 NaN
I have tried to look into pandas pivot docs but nothing helpful was there.
CodePudding user response:
Use DataFrame.stack
with DataFrame
constructor:
s = df.stack()
df = pd.DataFrame(s.tolist(), index=s.index)
print (df)
english maths
rank1 marks_in_test_1 25 30
marks_in_test_2 15 30
rank4 marks_in_test_1 34 39
marks_in_test_2 35 31
s = df.stack()
df = pd.DataFrame(s.tolist(), index=s.index).rename_axis(['a','b']).reset_index()
print (df)
a b english maths
0 rank1 marks_in_test_1 25 30
1 rank1 marks_in_test_2 15 30
2 rank4 marks_in_test_1 34 39
3 rank4 marks_in_test_2 35 31
CodePudding user response:
You can convert your dicts by applying pd.Series
on each of them (even if it's not really optimized)
>>> df.stack().apply(pd.Series).rename_axis(index=['rank', 'marks']).reset_index()
rank marks english maths science
0 rank1 marks_in_test_1 25.0 30.0 NaN
1 rank1 marks_in_test_2 15.0 30.0 45.0
2 rank4 marks_in_test_1 34.0 39.0 NaN
3 rank4 marks_in_test_2 35.0 31.0 NaN
CodePudding user response:
#transposing column
df = df.melt(id_vars=["rank"])
df.head()
rank variable value
0 rank1 marks_in_test_1 {'english': 25, 'maths': 30}
1 rank4 marks_in_test_1 {'english': 34, 'maths': 39}
2 rank1 marks_in_test_2 {'english': 15, 'maths': 30, 'science': 45}
3 rank4 marks_in_test_2 {'english': 35, 'maths': 31}
df = pd.concat([df[['rank','variable']], df['value'].astype(str).str.replace("{","").replace("}","").str.split(', ', expand=True)], axis=1)
df.head()
rank variable 0 1 2
0 rank1 marks_in_test_1 'english': 25 'maths': 30} None
1 rank4 marks_in_test_1 'english': 34 'maths': 39} None
2 rank1 marks_in_test_2 'english': 15 'maths': 30 'science': 45}
3 rank4 marks_in_test_2 'english': 35 'maths': 31} None
#renaming columns
df.rename(columns = {0: "english", 1: "maths", 2:"science"},
inplace = True)
#removing string from mark columns
df['english'] = df['english'].str.replace(r'[^0-9] ', '')
df['science'] = df['science'].str.replace(r'[^0-9] ', '')
df['maths'] = df['maths'].str.replace(r'[^0-9] ', '')
df
rank variable english maths science
0 rank1 marks_in_test_1 25 30 None
1 rank4 marks_in_test_1 34 39 None
2 rank1 marks_in_test_2 15 30 45
3 rank4 marks_in_test_2 35 31 None