Only format df rows that are not NaN

Time:10-20

I have the following df:

df = pd.DataFrame({'A': [0.0137, 0.1987, Decimal('NaN'), 0.7653]})

Output:

        A
0  0.0137
1  0.1987
2     NaN
3  0.7653

I am trying to format each row from column A, using .iloc (because I have many columns in my actual code) into, e.g. 1.37%.

However, if I run

df.iloc[:, 0] = (df.iloc[:, 0] * 100).astype(float).map('{:,.2f}%'.format)

all the NaN rows receive a trailing '%', yielding 'NaN%'.

So if I try:

df.iloc[:, 0] = df.iloc[:, 0].apply(
        lambda x: (x * 100).astype(float).map('{:,.2f}%'.format) if not isinstance(x, Decimal)
        else None)

I get IndexError: single positional indexer is out-of-bounds.

How can I properly format every row of my df that is not a Decimal(NaN)?

Note: I'm specifically using df.iloc on the left-hand side of the assignment because I only want to change those columns in place.

CodePudding user response:

Use df.loc to select the rows that are not NA and apply the formatting logic you have already built.

import numpy as np

# Your df definition has 'NaN' as a string, so convert it to np.nan first
df.replace('NaN', np.nan, inplace=True)

# Select the rows where A is not NA and apply the formatting;
# the assignment aligns on the index, so the NaN row is left untouched
df.loc[df['A'].notna(), 'A'] = (df.iloc[:, 0] * 100).astype(float).map('{:,.2f}%'.format)
df
    A
0   1.37%
1   19.87%
2   NaN
3   76.53%
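
If the missing values really are Decimal('NaN') objects rather than the string 'NaN', the replace('NaN', np.nan) step would most likely not match them. A minimal sketch of one way to handle that case, assuming the column holds a mix of floats and Decimal values; the helper names is_missing and to_percent are made up for illustration and are not part of pandas:

```python
from decimal import Decimal
import math

import pandas as pd


def is_missing(value):
    """Return True for Decimal('NaN') and for float NaN (hypothetical helper)."""
    if isinstance(value, Decimal):
        return value.is_nan()
    return isinstance(value, float) and math.isnan(value)


def to_percent(value):
    """Format a fraction as a percentage string; pass missing values through."""
    if is_missing(value):
        return value
    return f"{float(value) * 100:,.2f}%"


df = pd.DataFrame({"A": [0.0137, 0.1987, Decimal("NaN"), 0.7653]})

# Pick the column by position, then replace it as a whole by name
first_col = df.columns[0]
df[first_col] = df[first_col].map(to_percent)
print(df)
```

Because the check is done per element inside map, the missing entry is returned unchanged and never reaches the format string.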

CodePudding user response:

Try this:

df.loc[~df['A'].isna(), 'A'] = (df.loc[~df['A'].isna(), 'A'] * 100).apply('{:,.2f}%'.format)

But be careful: you are using the NaN value as a string. I recommend using the NumPy value np.nan instead. It should be:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [0.0137, 0.1987, np.nan, 0.7653]})
df.loc[~df['A'].isna(), 'A'] = (df.loc[~df['A'].isna(), 'A'] * 100).apply('{:,.2f}%'.format)
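
Since the question mentions many columns selected by position, the same idea can be looped over several of them. A rough sketch, assuming the columns to format sit at positions 0 and 1 (the positions list and the extra columns B and C are invented for the example). Each column is replaced as a whole instead of writing strings into a float column in place, which recent pandas versions may warn about:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [0.0137, 0.1987, np.nan, 0.7653],
    "B": [0.5, np.nan, 0.25, 1.0],
    "C": [1, 2, 3, 4],  # left untouched
})

# Hypothetical positions of the columns that should become percentages
positions = [0, 1]

for name in df.columns[positions]:
    column = df[name]
    # Format every value, then blank out the rows that were NaN originally
    formatted = (column * 100).map("{:,.2f}%".format)
    df[name] = formatted.where(column.notna())

print(df)
```

The where call keeps the formatted string only where the original value was present, so no 'nan%' strings end up in the result.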