Fill empty cell with what's on the next row

I've scraped a PDF table and it came with an annoying formatting feature.

The table has two columns. In some cases, a record was split across two rows: one row kept the value that belongs in column A, and the next row kept the value that belongs in column B. Like this:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['names'] = ['John','Mary',np.nan,'George']
df['numbers'] = ['1',np.nan,'2','3']

I want to reformat that DataFrame so that wherever there is an empty cell in df['numbers'], it is filled with the value from the next row. Then I can apply .dropna() to eliminate the rows that are still wrong.

I tried this:

for i in range(len(df)):
  if df['numbers'][i] == np.nan:
    df['numbers'][i] = df['numbers'][i + 1]

No change to the dataframe, though, and no error message either.

What am I missing?

CodePudding user response：

While I don't think this solves all your problems, the reason the dataframe is not updated is the line if df['numbers'][i] == np.nan: . NaN never compares equal to anything, including itself, so this condition always evaluates to False.

To implement a valid test for NaN in this case, use if pd.isnull(df['numbers'][i]): instead. It evaluates to True or False depending on the cell contents.
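For illustration, here is a minimal sketch of the original loop with that test swapped in. Beyond what the answer says, it makes two assumptions of its own: the loop stops one row early so that i + 1 stays in range, and it writes through df.loc rather than chained indexing, since chained assignment may end up modifying a temporary copy.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself.
print(np.nan == np.nan)  # False
print(pd.isna(np.nan))   # True

df = pd.DataFrame({
    'names': ['John', 'Mary', np.nan, 'George'],
    'numbers': ['1', np.nan, '2', '3'],
})

# Stop one row early so that i + 1 never runs past the last row.
for i in range(len(df) - 1):
    if pd.isna(df.loc[i, 'numbers']):
        # .loc writes to the frame itself; df['numbers'][i] = ... may not.
        df.loc[i, 'numbers'] = df.loc[i + 1, 'numbers']

print(df['numbers'].tolist())  # ['1', '2', '2', '3']
```

Note that this only looks one row ahead, so two missing values in a row would leave the first one still empty.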

CodePudding user response：

This is the solution I found:

df[['numbers']] = df[['numbers']].fillna(method='bfill')
df = df[~df['names'].isna()]

It's probably not the most elegant, but it worked.
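As a self-contained sketch of the same idea on the sample data from the question: recent pandas versions (2.1 and later) deprecate the method= argument of fillna, so this version calls .bfill() directly. The dropna(subset=...) call and the reset_index step are my own choices for the example, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'names': ['John', 'Mary', np.nan, 'George'],
    'numbers': ['1', np.nan, '2', '3'],
})

# Back-fill: each missing number takes the next non-missing value below it.
df['numbers'] = df['numbers'].bfill()

# Drop the leftover rows that only carried the displaced number.
# reset_index is optional; it just renumbers the remaining rows from 0.
df = df.dropna(subset=['names']).reset_index(drop=True)

print(df)
```

After this, the frame holds three rows pairing John, Mary and George with '1', '2' and '3'. Unlike a one-row lookahead, bfill also handles several missing values in a row.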