I have a pandas DataFrame with two columns: toy and color. The color column includes missing values.
How do I fill the missing color values with the most frequent color for that particular toy?
Here's the code to create a sample dataset:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
'color':['red', 'blue', 'blue', nan, 'green', nan,
'red', 'red', np, 'blue', 'red', nan, 'green']
})
CodePudding user response:
Instead of nan and np, you have to use np.nan:
>>> df = pd.DataFrame({
'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
'color':['red', 'blue', 'blue', np.nan, 'green', np.nan,
'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})
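fillna has no 'mode' option, so compute the most frequent color per toy and fill with that:
>>> df['color'] = df.groupby('toy')['color'].transform(lambda s: s.fillna(s.mode().iloc[0]))
>>> df
      toy  color
0     car    red
1     car   blue
2     car   blue
3     car   blue
4   train  green
5   train    red
6   train    red
7   train    red
8   train    red
9    ball   blue
10   ball    red
11   ball   blue
12  truck  green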
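If you need this in more than one place, the per-toy fill can be wrapped in a small function. The following is only a sketch: fill_with_group_mode is a hypothetical helper name (not part of pandas), and it assumes that a toy with no known colors at all should simply keep its missing values.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(df, group_col, value_col):
    """Return a copy of df with NaNs in value_col replaced by the per-group mode.

    Hypothetical helper for illustration; not a pandas function.
    """
    def _fill(s):
        # Series.mode() ignores NaN and returns tied values in sorted order.
        modes = s.mode()
        # A group with no observed values has no mode, so leave it untouched.
        return s if modes.empty else s.fillna(modes.iloc[0])

    out = df.copy()
    out[value_col] = out.groupby(group_col)[value_col].transform(_fill)
    return out


df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})

filled = fill_with_group_mode(df, 'toy', 'color')
print(filled)
```

Note that ball has a tie between blue and red. Because mode() returns ties in sorted order, taking the first entry picks blue; use a different rule if you need another tie-break.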
CodePudding user response:
To create a DataFrame, first import pandas. A DataFrame is created with the pd.DataFrame() constructor. All of its parameters are optional; the first one, data, is the data to be filled into the table.
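As a minimal sketch of that constructor call: the dict below is made-up sample data, and index is just one of the optional arguments, picked here for illustration.

```python
import pandas as pd

# The first argument is the data: here a dict mapping column names to values.
data = {'toy': ['car', 'train', 'ball'], 'color': ['red', 'green', 'blue']}
toys = pd.DataFrame(data)

# Optional second argument: explicit row labels instead of the default 0, 1, 2.
labeled = pd.DataFrame(data, index=['a', 'b', 'c'])

print(toys)
print(labeled)
```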