I have been trying to convert a pandas DataFrame column to a list, because the data in the column is read as a str by default. Sample data from the 'genres' column of the DataFrame 'movie':
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
The code I wrote:
import ast
import pandas as pd
movie = pd.read_csv("tmdb_5000_movies.csv")
movie['genres'] = movie['genres'].apply(lambda x : ast.literal_eval(str(x)))
print(type(movie['genres']))
The output I am getting is:
<class 'pandas.core.series.Series'>
I really can't wrap my head around where I am going wrong.
Answer:
A pandas.DataFrame is made up of Series objects, where each Series is a single column. A Series is a container similar to a Python list, and it can be converted into an actual list with its Series.tolist method.
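A minimal sketch of the Series-to-list conversion, using a small hand-built Series as a stand-in for movie['genres'] (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Illustrative stand-in for a parsed 'genres' column: each cell holds a list of dicts
s = pd.Series([
    [{"id": 28, "name": "Action"}],
    [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}],
])

print(type(s))        # the container itself is still a Series

as_list = s.tolist()  # Series -> plain Python list, one entry per row
print(type(as_list))  # <class 'list'>
print(as_list[1])     # the second row's list of dicts
```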
ast.literal_eval is applied to each element of your Series, converting each string into a list of dictionaries; those lists are then stored back into a Series.
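To see what happens to a single cell, here is a sketch that runs ast.literal_eval on the sample string from the question. Because this sample is also valid JSON, json.loads would parse it the same way; literal_eval is kept here to match the question's code.

```python
import ast

# One raw cell from the 'genres' column, as read_csv returns it: a str
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

parsed = ast.literal_eval(raw)  # str -> list of dicts

print(type(raw))        # <class 'str'>
print(type(parsed))     # <class 'list'>
print(type(parsed[0]))  # <class 'dict'>

names = [g["name"] for g in parsed]
print(names)  # ['Action', 'Adventure', 'Fantasy', 'Science Fiction']
```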
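So your code is actually working. type(movie['genres']) reports Series because it checks the type of the column itself, not of the values inside it; type(movie['genres'].iloc[0]) would show list. If you want a plain Python list (of lists of dictionaries) instead of a Series, do the following: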
import ast
import pandas as pd
movie = pd.read_csv("tmdb_5000_movies.csv")
movie['genres'] = movie['genres'].apply(lambda x: ast.literal_eval(str(x)))  # parse each string cell into a list of dicts
genres = movie['genres'].tolist()  # Series -> plain Python list
print(genres)
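The sketch below puts the pieces together without needing the CSV file. It assumes a tiny in-memory DataFrame as a hypothetical stand-in for pd.read_csv("tmdb_5000_movies.csv"), with 'genres' cells stored as strings. It passes ast.literal_eval to apply directly, which is equivalent to the lambda above when every cell is already a str. It then shows the column type versus the type of one cell, and pulls out just the genre names per movie.

```python
import ast
import pandas as pd

# Hypothetical stand-in for read_csv: 'genres' cells are strings
movie = pd.DataFrame({
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]',
    ]
})

# Parse every cell: str -> list of dicts
movie["genres"] = movie["genres"].apply(ast.literal_eval)

print(type(movie["genres"]))          # the column: a Series
print(type(movie["genres"].iloc[0]))  # one cell: <class 'list'>

genres = movie["genres"].tolist()     # list of lists of dicts

# Keep only the genre names for each movie
genre_names = movie["genres"].apply(lambda gs: [g["name"] for g in gs]).tolist()
print(genre_names)  # [['Action', 'Adventure'], ['Fantasy', 'Science Fiction']]
```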