Replacing Pandas Dataframe Column with Dictionary Values Based on a Condition

Time:10-31

I have the following dataframe:

import pandas as pd

movies = {'name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'], 'genre' : ['Action', 'Crime', 'Drama', 'Comedy', 'Animation'], 'runtime' : [0, 100, 0, 120,0]}

df = pd.DataFrame(movies)

I also have the following dictionary, which contains the median runtime of a movie for each genre:

genre_dict = {'Action': 107.0, 'Adventure': 108.0, 'Animation': 86.0, 'Comedy': 99.0, 'Crime': 111.0, 'Drama': 111.0, 'Family': 92.0, 'Fantasy': 103.0, 'History': 124.0, 'Horror': 94.5, 'Music': 105.0, 'Mystery': 100.5, 'Romance': 104.0, 'Science Fiction': 106.0, 'TV Movie': 92.0, 'Thriller': 102.0, 'War': 118.0, 'Western': 119.0}

I would like to replace the runtime values that equal 0 with the median for that movie's genre, which would result in the following dataframe:

movies = {'name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'], 'genre' : ['Action', 'Crime', 'Drama', 'Comedy', 'Animation'], 'runtime' : [107, 100, 111, 120,86]}

df = pd.DataFrame(movies)

I tried using map with the following code:

df['runtime'] = df['genre'].map(genre_dict)

However, this replaced the runtime of every movie, because nothing restricts the replacement to rows where the current runtime is 0. Any help on how to include that condition would be appreciated, thank you.

CodePudding user response:

Replace 0 with NaN in the runtime column, then use fillna with the mapped medians:

import pandas as pd

movies = {'name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'], 'genre' : ['Action', 'Crime', 'Drama', 'Comedy', 'Animation'], 'runtime' : [0, 100, 0, 120,0]}
genre_dict = {'Action': 107.0, 'Adventure': 108.0, 'Animation': 86.0, 'Comedy': 99.0, 'Crime': 111.0, 'Drama': 111.0, 'Family': 92.0, 'Fantasy': 103.0, 'History': 124.0, 'Horror': 94.5, 'Music': 105.0, 'Mystery': 100.5, 'Romance': 104.0, 'Science Fiction': 106.0, 'TV Movie': 92.0, 'Thriller': 102.0, 'War': 118.0, 'Western': 119.0}

df = pd.DataFrame(movies)
# Turn zeros into NaN in the runtime column only, so other columns are never touched
df["runtime"] = df["runtime"].replace(0, float("nan"))
# Fill each NaN with the median looked up from that row's genre
df["runtime"] = df["runtime"].fillna(df["genre"].map(genre_dict))

print(df)

      name      genre  runtime
0  Movie A     Action    107.0
1  Movie B      Crime    100.0
2  Movie C      Drama    111.0
3  Movie D     Comedy    120.0
4  Movie E  Animation     86.0
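One edge case worth checking with this approach is a genre that has no entry in the dictionary. The sketch below assumes a hypothetical genre 'Noir' and a shortened genre_dict, neither of which is in the question. Because map returns NaN for unknown keys, fillna has nothing to fill with and the runtime stays NaN, which can then be detected with isna:

```python
import pandas as pd

# Shortened dictionary for illustration; 'Noir' is a hypothetical genre with no entry
genre_dict = {'Action': 107.0, 'Crime': 111.0}
df = pd.DataFrame({'name': ['Movie A', 'Movie F'],
                   'genre': ['Action', 'Noir'],
                   'runtime': [0, 0]})

# Same technique as above: zeros become NaN, then NaN is filled from the mapping
runtime = df['runtime'].replace(0, float('nan'))
df['runtime'] = runtime.fillna(df['genre'].map(genre_dict))

# Rows whose genre was not in the dictionary are still NaN
still_missing = df['runtime'].isna()
print(df[still_missing])
```

Whether those rows should keep NaN or fall back to some overall default is a choice the question does not cover.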

CodePudding user response:

Use Series.where with a Series.ne check. Where the condition (runtime not equal to 0) is true the value is left untouched, and every zero is filled from the genre mapping:

df['runtime'] = df['runtime'].where(df['runtime'].ne(0),
                                    df['genre'].map(genre_dict))

print(df)

      name      genre  runtime
0  Movie A     Action      107
1  Movie B      Crime      100
2  Movie C      Drama      111
3  Movie D     Comedy      120
4  Movie E  Animation       86
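A variant of the same idea, offered as a minimal sketch rather than as part of either answer, is to build the boolean mask explicitly and assign through .loc, so that only the zero rows are looked up. The name zero_mask is an assumption for illustration, and the dictionary is shortened to the genres used in the example. The column is cast to float first because some medians in the full dictionary (for example 94.5) are not whole numbers:

```python
import pandas as pd

movies = {'name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'],
          'genre': ['Action', 'Crime', 'Drama', 'Comedy', 'Animation'],
          'runtime': [0, 100, 0, 120, 0]}
# Subset of the question's dictionary, enough for these five rows
genre_dict = {'Action': 107.0, 'Animation': 86.0, 'Comedy': 99.0,
              'Crime': 111.0, 'Drama': 111.0}

df = pd.DataFrame(movies)

# Cast up front so fractional medians can be stored without a dtype change later
df['runtime'] = df['runtime'].astype(float)

# Boolean mask that is True only for rows whose runtime is 0
zero_mask = df['runtime'].eq(0)

# Map the genre to its median for the masked rows only, and assign it back
df.loc[zero_mask, 'runtime'] = df.loc[zero_mask, 'genre'].map(genre_dict)

print(df)
```

Compared with Series.where, this maps only the rows that need a value, at the cost of one extra line for the mask.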