I have the following dataframe:
import pandas as pd
movies = {'name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'], 'genre': ['Action', 'Crime', 'Drama', 'Comedy', 'Animation'], 'runtime': [0, 100, 0, 120, 0]}
df = pd.DataFrame(movies)
I also have the following dictionary, which contains the median runtime of a movie for a given genre:
genre_dict = {'Action': 107.0, 'Adventure': 108.0, 'Animation': 86.0, 'Comedy': 99.0, 'Crime': 111.0, 'Drama': 111.0, 'Family': 92.0, 'Fantasy': 103.0, 'History': 124.0, 'Horror': 94.5, 'Music': 105.0, 'Mystery': 100.5, 'Romance': 104.0, 'Science Fiction': 106.0, 'TV Movie': 92.0, 'Thriller': 102.0, 'War': 118.0, 'Western': 119.0}
I would like to replace the runtime values that equal 0 with the median for the movie's genre, which would result in the following dataframe:
movies = {'name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'], 'genre': ['Action', 'Crime', 'Drama', 'Comedy', 'Animation'], 'runtime': [107, 100, 111, 120, 86]}
df = pd.DataFrame(movies)
I tried using map with the following code:
df['runtime'] = df['genre'].map(genre_dict)
However, this replaced the runtime of every movie, because nothing in the call restricts the replacement to rows where the current runtime is 0. Any help on how to include that condition would be appreciated, thank you.
CodePudding user response:
Replace the 0 values in runtime with NaN, then fill them with fillna:
import numpy as np
import pandas as pd
movies = {'name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'], 'genre': ['Action', 'Crime', 'Drama', 'Comedy', 'Animation'], 'runtime': [0, 100, 0, 120, 0]}
genre_dict = {'Action': 107.0, 'Adventure': 108.0, 'Animation': 86.0, 'Comedy': 99.0, 'Crime': 111.0, 'Drama': 111.0, 'Family': 92.0, 'Fantasy': 103.0, 'History': 124.0, 'Horror': 94.5, 'Music': 105.0, 'Mystery': 100.5, 'Romance': 104.0, 'Science Fiction': 106.0, 'TV Movie': 92.0, 'Thriller': 102.0, 'War': 118.0, 'Western': 119.0}
df = pd.DataFrame(movies)
# Turn zeros into NaN in the runtime column only, so zeros elsewhere are left alone
df["runtime"] = df["runtime"].replace(0, np.nan)
df["runtime"] = df["runtime"].fillna(df["genre"].map(genre_dict))
print(df)
      name      genre  runtime
0  Movie A     Action    107.0
1  Movie B      Crime    100.0
2  Movie C      Drama    111.0
3  Movie D     Comedy    120.0
4  Movie E  Animation     86.0
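The filled column comes back as floats (107.0 rather than 107) because NaN forces a float dtype. If integer runtimes are wanted, as in the expected dataframe in the question, one option is to cast back after filling. The snippet below is only a sketch: it uses a five-entry subset of the question's genre_dict and assumes every genre in the dataframe has an entry in the dictionary.

```python
import numpy as np
import pandas as pd

movies = {
    "name": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "genre": ["Action", "Crime", "Drama", "Comedy", "Animation"],
    "runtime": [0, 100, 0, 120, 0],
}
# Subset of the question's genre_dict, enough for these five rows
genre_dict = {"Action": 107.0, "Animation": 86.0, "Comedy": 99.0,
              "Crime": 111.0, "Drama": 111.0}

df = pd.DataFrame(movies)

# Zeros become NaN, which upcasts the column to float
runtime = df["runtime"].replace(0, np.nan)
# Fill the gaps from the per-genre medians
runtime = runtime.fillna(df["genre"].map(genre_dict))
# Cast back to int; this raises if any NaN is left (genre missing from
# the dict) and truncates a fractional median such as 94.5
df["runtime"] = runtime.astype(int)

print(df["runtime"].tolist())  # [107, 100, 111, 120, 86]
```

If some genres might be missing from the dictionary, or the medians are not whole numbers, it is probably safer to leave the column as float.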
CodePudding user response:
Use Series.where with a Series.ne check. All non-zero values will be untouched, and all zeros will be filled based on the mapping:
df['runtime'] = df['runtime'].where(df['runtime'].ne(0),
                                    df['genre'].map(genre_dict))
print(df)
      name      genre  runtime
0  Movie A     Action      107
1  Movie B      Crime      100
2  Movie C      Drama      111
3  Movie D     Comedy      120
4  Movie E  Animation       86
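For anyone who wants to run the where approach end to end, here is a self-contained sketch. The helper name fill_zero_runtimes is made up for illustration, and the dictionary is a five-entry subset of the question's genre_dict. Whether the resulting column prints as integers or floats can depend on the pandas version, so the sketch makes no claim about the dtype.

```python
import pandas as pd


def fill_zero_runtimes(df, medians):
    """Return a copy of df with zero runtimes replaced by the genre median.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    # where keeps entries where the condition is True (runtime != 0) and
    # takes the mapped median where it is False (runtime == 0)
    out["runtime"] = out["runtime"].where(
        out["runtime"].ne(0), out["genre"].map(medians)
    )
    return out


movies = {
    "name": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "genre": ["Action", "Crime", "Drama", "Comedy", "Animation"],
    "runtime": [0, 100, 0, 120, 0],
}
# Subset of the question's genre_dict, enough for these five rows
genre_dict = {"Action": 107.0, "Animation": 86.0, "Comedy": 99.0,
              "Crime": 111.0, "Drama": 111.0}

df = pd.DataFrame(movies)
result = fill_zero_runtimes(df, genre_dict)
print(result)
```

Note that a zero-runtime row whose genre is not a key in the dictionary ends up as NaN, because map returns NaN for unknown keys.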