Pandas ffill() to fill missing data

Time:11-03

I am currently trying to fill blanks in a data frame that looks like the following:

       AL|ATFC|Year Latitude Longitude
0      AL011851      NaN       NaN
1           NaN    28.0N     94.8W
2           NaN    28.0N     95.4W
3           NaN    28.0N     96.0W
4           NaN    28.1N     96.5W
5           NaN    28.2N     96.8W
6           NaN    28.2N     97.0W
7           NaN    28.3N     97.6W
8           NaN    28.4N     98.3W
9           NaN    28.6N     98.9W
10          NaN    29.0N     99.4W
11          NaN    29.5N     99.8W
12          NaN    30.0N    100.0W
13          NaN    30.5N    100.1W
14          NaN    31.0N    100.2W
15     AL021851      NaN       NaN
16          NaN    22.2N     97.6W
17     AL031851      NaN       NaN
18          NaN    12.0N     60.0W

I have been trying the following line of code with the goal to fill the AL|ATFC|Year column where I have NaN values with the pandas ffill() function.

df.where(df['AL|ATFC|Year'] == float('NaN'), df['AL|ATFC|Year'].ffill(), axis=1, inplace=True)

To get the following dataframe:

    AL|ATFC|Year Latitude Longitude
0      AL011851      NaN       NaN
1      AL011851    28.0N     94.8W
2      AL011851    28.0N     95.4W
3      AL011851    28.0N     96.0W
4      AL011851    28.1N     96.5W
5      AL011851    28.2N     96.8W
6      AL011851    28.2N     97.0W
7      AL011851    28.3N     97.6W
8      AL011851    28.4N     98.3W
9      AL011851    28.6N     98.9W
10     AL011851    29.0N     99.4W
11     AL011851    29.5N     99.8W
12     AL011851    30.0N    100.0W
13     AL011851    30.5N    100.1W
14     AL011851    31.0N    100.2W
15     AL021851      NaN       NaN
16     AL021851    22.2N     97.6W
17     AL031851      NaN       NaN
18     AL031851    12.0N     60.0W

After that, I plan to drop the rows with missing Lon/Lat values. However, the code I have been trying does not fill in the missing values in the AL|ATFC|Year column, and I don't understand why. Any help would be much appreciated!

Thanks

CodePudding user response:

If the missing entries in 'AL|ATFC|Year' are stored as the string 'NaN', you could first replace them with np.nan and then forward-fill the column. I reproduced only the first 3 rows:


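import numpy as np
import pandas as pd

# Missing entries are stored here as the string 'NaN', not as real NaN values
data = {'AL|ATFC|Year' : ['AL011851', 'NaN', 'NaN'],
        'Latitude': ['NaN', '28.0N', '28.0N'],
        'Longitude': ['NaN', '94.8W', '95.4W']}

df = pd.DataFrame(data)

# Turn the 'NaN' strings into real missing values, then forward-fill them.
# Assigning the result back avoids relying on inplace=True on a selected column.
df['AL|ATFC|Year'] = df['AL|ATFC|Year'].replace('NaN', np.nan).ffill()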

outputs:

  AL|ATFC|Year Latitude Longitude
0     AL011851      NaN       NaN
1     AL011851    28.0N     94.8W
2     AL011851    28.0N     95.4W
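
Whether the replace step is needed depends on how the missing entries are actually stored. A minimal sketch for checking this, using a small made-up column (the values are illustrative, not the asker's full data):

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a real missing value with the literal string 'NaN'
s = pd.Series(['AL011851', np.nan, 'NaN'], dtype=object)

# isna() flags only real missing values; the string 'NaN' is ordinary text
real_missing = s.isna().tolist()
text_nan = (s == 'NaN').tolist()

print(real_missing)
print(text_nan)
```

If isna() already flags the blanks, the column can be forward-filled directly without the replace step.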

CodePudding user response:

ffill() forward-fills values only where they are NA/NaN, so you do not need a separate NaN condition around it.

df['AL|ATFC|Year'] = df['AL|ATFC|Year'].ffill()
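
One reason the original df.where(...) attempt cannot work is that NaN never compares equal to anything, itself included, so df['AL|ATFC|Year'] == float('NaN') is False on every row. The sketch below, on a shortened made-up copy of the data, shows this, then forward-fills the ID column and drops the rows without coordinates, as the question intends. The names eq_hits, na_hits and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Shortened, made-up copy of the question's data with real NaN values
df = pd.DataFrame({
    'AL|ATFC|Year': ['AL011851', np.nan, np.nan, 'AL021851', np.nan],
    'Latitude': [np.nan, '28.0N', '28.0N', np.nan, '22.2N'],
    'Longitude': [np.nan, '94.8W', '95.4W', np.nan, '97.6W'],
})

# NaN != NaN, so an equality test finds no missing values; isna() does
eq_hits = int((df['AL|ATFC|Year'] == float('NaN')).sum())
na_hits = int(df['AL|ATFC|Year'].isna().sum())

# Forward-fill the ID column, then drop the rows that have no coordinates
df['AL|ATFC|Year'] = df['AL|ATFC|Year'].ffill()
result = df.dropna(subset=['Latitude', 'Longitude']).reset_index(drop=True)

print(eq_hits, na_hits)
print(result)
```

The order matters here: dropping the rows first would remove the only rows that carry the ID, leaving ffill() nothing to propagate.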