I've a data frame which contains one column. Below is the example
Questionsbysortorder
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3
Q1-1,Q2-2,Q3-1,Q4-1
Q1-5,Q2-3,Q3-3
I'm trying to explode the columns with the help of already given row values. Like below is the example
Questionsbysortorder Q1 Q2 Q3 Q4 Q5
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3 4 3 2 3 3
Q1-1,Q2-2,Q3-1,Q4-1 1 2 1 1 NA
Q1-5,Q2-3,Q5-3 5 3 NA NA 3
Below is the code which i tried, but it's returning an error
import pandas as pd
import numpy as np
df = pd.DataFrame({'Questionsbysortorder': ['Q1-4,Q2-3,Q3-2,Q4-3,Q5-3', 'Q1-1,Q2-2,Q3-1,Q4-1','Q1-5,Q2-3,Q5-3']})
df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split(',')
df = df.explode('Questionsbysortorder')
df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split('-')
df = df.explode('Questionsbysortorder')
df = df.set_index('Questionsbysortorder').unstack().reset_index()
df.columns = ['Questionsbysortorder', 'value']
df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')
df.columns.name = None
print(df)
Error is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-30dd8b8d4f59> in <module>()
14 df = df.set_index('Questionsbysortorder').unstack().reset_index()
15
---> 16 df.columns = ['Questionsbysortorder', 'value']
17
18 df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
56 elif new_len != old_len:
57 raise ValueError(
---> 58 f"Length mismatch: Expected axis has {old_len} elements, new "
59 f"values have {new_len} elements"
60 )
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements
Can anyone please help me with this?
CodePudding user response:
You are very close. You want to
- split by
','
, - explode the list,
- split again by
'-'
to get the different fields - finally pivot the data
In code:
df.join(df.Questionsbysortorder.str.split(',')
.explode()
.str.split('-', expand=True)
.set_index(0, append=True)[1]
.unstack()
)
Output:
Questionsbysortorder Q1 Q2 Q3 Q4 Q5
0 Q1-4,Q2-3,Q3-2,Q4-3,Q5-3 4 3 2 3 3
1 Q1-1,Q2-2,Q3-1,Q4-1 1 2 1 1 NaN
2 Q1-5,Q2-3,Q3-3 5 3 3 NaN NaN