Home > Blockchain >  Reshape 1d CSV to 2D CSV using pandas
Reshape 1d CSV to 2D CSV using pandas

Time:12-13

I have a csv formated like:

movie_id ratings
1 [{'customer_id': 1, "star": "5"}, {'customer_id': 3, "star": "4"}, {'customer_id': 2, "star": "3"}]
2 [{'customer_id': 1, "star": "4"}, {'customer_id': 2, "star": "3"}, {'customer_id': 3, "star": "5"}]

which i want to transform to:

customer_id_1 customer_id_2 customer_id_3
movie_id_1 5 4 3
movie_id_2 4 3 5

How it can be done by using python pandas ?

CodePudding user response:

Explode the dataframe on ratings and apply series on the dictionary column. An elongated row format is better for manipulation than the one you have specified.

df = df.explode('ratings')
df[['customer_id', 'star']] = df.ratings.apply(pd.Series)

Output

   movie_id                          ratings  customer_id star
0         1  {'customer_id': 1, 'star': '5'}            1    5
0         1  {'customer_id': 3, 'star': '4'}            3    4
0         1  {'customer_id': 2, 'star': '3'}            2    3
1         2  {'customer_id': 1, 'star': '4'}            1    4
1         2  {'customer_id': 2, 'star': '3'}            2    3
1         2  {'customer_id': 3, 'star': '5'}            3    5

If you still want your format, follow the code below

df.drop(columns=['ratings']).set_index(['movie_id', 'customer_id']).unstack(['customer_id'])

Output (Multi-index Dataframe)

            star      
customer_id    1  2  3
movie_id              
1              5  3  4
2              4  3  5

If there's missing star value for a customer and/or a movie. Using fillna will help.

df.drop(columns=['ratings']).set_index(['movie_id', 'customer_id']).unstack(['customer_id']).fillna(0)
  • Related