I'm collecting some data from Google Sheets, but I need to convert one column with more than one string into a row. Each cell can have more than one string, and each string I want to put in one row
I have this dataframe
import pandas as pd
df = pd.DataFrame({'Name': ['Bob', 'John', 'Ric'],
'Submitted At': ['2022/08/12 23:56:42', '2022/08/12 23:56:42', '2022/08/12 23:56:42'],
'Class': ['Math,English,History', 'English,History', 'Math, Chemistry']})
print(df)
Name Submitted At Class
0 Bob 2022/08/12 23:56:42 Math,English,History
1 John 2022/08/12 23:56:42 English,History
2 Ric 2022/08/12 23:56:42 Math, Chemistry
I want to transform it into this
I'm trying to do this, but it did not work
df2 = df.join(df['Class'].str.split(',', expand=True).add_prefix('Class'))
print(df2.melt(id_vars=['Submitted At', 'Name'], var_name=df2.iloc[:, 2:6]))
ValueError: Unable to coerce to Series, length must be 4: given 2
Could you help me please?
CodePudding user response:
You can try explode
the split list Class
column
out = (df.assign(Class=df['Class'].str.split(','))
.explode('Class', ignore_index=True))
print(out)
Name Submitted At Class
0 Bob 2022/08/12 23:56:42 Math
1 Bob 2022/08/12 23:56:42 English
2 Bob 2022/08/12 23:56:42 History
3 John 2022/08/12 23:56:42 English
4 John 2022/08/12 23:56:42 History
5 Ric 2022/08/12 23:56:42 Math
6 Ric 2022/08/12 23:56:42 Chemistry