DataFrame in Python and how to write

I have a pandas DataFrame with two columns: toy and color. The color column includes missing values.

How do I fill the missing color values with the most frequent color for that particular toy?

Here's the code to create a sample dataset:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color':['red', 'blue', 'blue', nan, 'green', nan,
             'red', 'red', np, 'blue', 'red', nan, 'green']
    })

CodePudding user response：

Instead of nan and np, you have to use np.nan.

>>> df = pd.DataFrame({
...     'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
...     'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
...               'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
... })
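>>> # fill each toy's missing colors with that toy's most frequent color;
>>> # mode() returns ties in sorted order, so 'ball' (one blue, one red) gets 'blue'
>>> df['color'] = df.groupby('toy')['color'].transform(lambda s: s.fillna(s.mode().iloc[0]))
>>> df
      toy  color
0     car    red
1     car   blue
2     car   blue
3     car   blue
4   train  green
5   train    red
6   train    red
7   train    red
8   train    red
9    ball   blue
10   ball    red
11   ball   blue
12  truck  green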
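
If some toy might have no known color at all, a per-toy fill needs one extra guard, because such a group has no mode to take. Here is a minimal sketch of that, assuming ties should go to the alphabetically first color and that a toy with no observed color should simply stay missing. The helper name fill_with_group_mode is made up for this example and is not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(df, group_col, value_col):
    """Return a copy of df with NaNs in value_col filled by the per-group mode."""

    def fill(s):
        modes = s.mode()  # NaN is ignored by default; ties come back sorted
        # A group with no observed value has no mode, so leave it untouched.
        return s if modes.empty else s.fillna(modes.iloc[0])

    out = df.copy()
    out[value_col] = out.groupby(group_col)[value_col].transform(fill)
    return out


df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})
filled = fill_with_group_mode(df, 'toy', 'color')
print(filled)
```

Working on a copy keeps the original df intact, which makes it easy to compare the two and see which rows were filled.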

CodePudding user response：

To create a DataFrame, we first need to import pandas. A DataFrame is created with the pd.DataFrame() constructor, which is typically called with one or two arguments. The first one is the data to be filled into the DataFrame table.
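
As a small illustration of the constructor, the sketch below passes a dict of columns as the data argument. The second argument shown, index, is only my assumption about what a second parameter could be, since the answer does not say which one it has in mind. The sample values are made up.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column.
data = {'toy': ['car', 'train', 'ball'], 'color': ['red', 'green', 'blue']}

# One argument: rows get the default integer index 0, 1, 2.
df_default = pd.DataFrame(data)

# Two arguments: the same data with explicit row labels (assumed second parameter).
df_labeled = pd.DataFrame(data, index=['a', 'b', 'c'])

print(df_default)
print(df_labeled)
```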