Reshape 1d CSV to 2D CSV using pandas-CodePudding

I have a csv formated like:

movie_id	ratings
1	[{'customer_id': 1, "star": "5"}, {'customer_id': 3, "star": "4"}, {'customer_id': 2, "star": "3"}]
2	[{'customer_id': 1, "star": "4"}, {'customer_id': 2, "star": "3"}, {'customer_id': 3, "star": "5"}]

which i want to transform to:

	customer_id_1	customer_id_2	customer_id_3
movie_id_1	5	4	3
movie_id_2	4	3	5

How it can be done by using python pandas ?

CodePudding user response：

Explode the dataframe on ratings and apply series on the dictionary column. An elongated row format is better for manipulation than the one you have specified.

df = df.explode('ratings')
df[['customer_id', 'star']] = df.ratings.apply(pd.Series)

Output

   movie_id                          ratings  customer_id star
0         1  {'customer_id': 1, 'star': '5'}            1    5
0         1  {'customer_id': 3, 'star': '4'}            3    4
0         1  {'customer_id': 2, 'star': '3'}            2    3
1         2  {'customer_id': 1, 'star': '4'}            1    4
1         2  {'customer_id': 2, 'star': '3'}            2    3
1         2  {'customer_id': 3, 'star': '5'}            3    5

If you still want your format, follow the code below

df.drop(columns=['ratings']).set_index(['movie_id', 'customer_id']).unstack(['customer_id'])

Output (Multi-index Dataframe)

            star      
customer_id    1  2  3
movie_id              
1              5  3  4
2              4  3  5

If there's missing star value for a customer and/or a movie. Using fillna will help.

df.drop(columns=['ratings']).set_index(['movie_id', 'customer_id']).unstack(['customer_id']).fillna(0)