Padding zeroes to columns with NaN values

Time:09-24

I have a pandas dataframe where three columns are floats (float64):

         Num1       Num2         Num3
0       2345656      3.0      12345.0
1          3456      3.0       3100.0
2        541304      4.0       5432.0      
3           NaN      NaN          NaN        
4   12313201308      1.0      99999.0 

I want to add leading zeroes to Num2 and Num3 columns to make them look like this:

         Num1       Num2         Num3
0       2345656      003        12345
1          3456      003        03100
2        541304      004        05432      
3           NaN      NaN          NaN        
4   12313201308      001        99999 

I want Num2 to have 3 digits in total including the leading zeroes and Num3 to have 5 digits in total including the leading zeroes, leaving NaNs as they are (or not impacting NaNs). The end goal is to concatenate Num1, Num2, and Num3 to create a new column.

I tried both df['Num2'].apply(lambda x: '{:05}'.format(x) if pd.notnull(x) else x) and df['Num2'].apply(lambda x: x.zfill(5) if pd.notnull(x) else x), but neither added zeroes as I expected. I would appreciate it if someone more knowledgeable than I am could help me out!

CodePudding user response:

Your numbers are stored as floats, so convert them to int first:

df['Num2'] = df['Num2'].apply(lambda x: '{:03}'.format(int(x)) if pd.notnull(x) else x)
df['Num3'] = df['Num3'].apply(lambda x: '{:05}'.format(int(x)) if pd.notnull(x) else x)

df

           Num1 Num2   Num3
0  2.345656e+06  003  12345
1  3.456000e+03  003  03100
2  5.413040e+05  004  05432
3           NaN  NaN    NaN
4  1.231320e+10  001  99999
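
As a small illustration of why the two attempts in the question did not pad as expected (a sketch using a single value, not part of the original answer): formatting a float keeps its decimal part, and zfill only exists on strings.

```python
x = 3.0  # a float64-style value such as those in Num2

# Formatting a float with a zero-padded width keeps the decimal part,
# so the padding is applied to the text "3.0" rather than "3".
padded_float = '{:05}'.format(x)

# Converting to int first drops the ".0" before padding.
padded_int = '{:03}'.format(int(x))

# zfill is a str method; calling it on a float raises AttributeError.
try:
    x.zfill(5)
    zfill_error = None
except AttributeError as exc:
    zfill_error = type(exc).__name__

print(padded_float)  # 003.0
print(padded_int)    # 003
print(zfill_error)   # AttributeError
```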

CodePudding user response:

If NaN is the only kind of missing value in your column, you could try:

import numpy as np
df['Num3'] = df['Num3'].apply(lambda x: '{0:0>5}'.format(int(x)) if not np.isnan(x) else np.nan)                                                              

CodePudding user response:

We can take advantage of index alignment: select only the rows of Num3 that are not null, convert them to int using astype, then map them through a format string. When the result is assigned back, the rows that were left out become NaN:

df['Num3'] = df.loc[df['Num3'].notna(), 'Num3'].astype(int).map('{:05d}'.format)

Or pass a lambda directly to Series.map and call Python's int inside it:

df['Num3'] = df.loc[df['Num3'].notna(), 'Num3'].map(lambda x: f'{int(x):05d}')

df:

            Num1  Num2   Num3
0      2345656.0   3.0  12345
1         3456.0   3.0  03100
2       541304.0   4.0  05432
3            NaN   NaN    NaN
4  12313201308.0   1.0  99999
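
A possible alternative, sketched here under the assumption of a reasonably recent pandas with the nullable 'Int64' and 'string' dtypes, is to stay vectorized and use Series.str.zfill. The widths dictionary and the padded frame are names chosen for this example only. Note that missing values then show up as <NA> rather than NaN.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Num2': [3.0, 3.0, 4.0, nan, 1.0],
    'Num3': [12345.0, 3100.0, 5432.0, nan, 99999.0]
})

# Assumed mapping of column name to total number of digits wanted.
widths = {'Num2': 3, 'Num3': 5}

padded = df.copy()
for col, width in widths.items():
    # float64 -> nullable Int64 drops the ".0"; 'string' keeps missing values as <NA>;
    # str.zfill pads on the left with zeroes and leaves <NA> untouched.
    padded[col] = df[col].astype('Int64').astype('string').str.zfill(width)

print(padded)
```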

Setup:

import pandas as pd
from numpy import nan

# The full option name avoids an ambiguous-key error in newer pandas versions
pd.set_option("display.precision", 10)

df = pd.DataFrame({
    'Num1': [2345656.0, 3456.0, 541304.0, nan, 12313201308.0],
    'Num2': [3.0, 3.0, 4.0, nan, 1.0],
    'Num3': [12345.0, 3100.0, 5432.0, nan, 99999.0]
})
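
For the end goal stated in the question (concatenating Num1, Num2, and Num3), one way it might be done is sketched below. The helper to_padded and the column name combined are hypothetical names introduced for this example. It reuses the notna/map idea from the answer and relies on Series.str.cat leaving a row missing when any of its parts is missing.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Num1': [2345656.0, 3456.0, 541304.0, nan, 12313201308.0],
    'Num2': [3.0, 3.0, 4.0, nan, 1.0],
    'Num3': [12345.0, 3100.0, 5432.0, nan, 99999.0]
})

def to_padded(series, width=0):
    """Hypothetical helper: integer text zero-padded to `width`; NaN stays NaN."""
    present = series.notna()
    text = series[present].map(lambda x: str(int(x)).zfill(width))
    # Reindexing restores the skipped rows as missing values.
    return text.reindex(series.index)

num1 = to_padded(df['Num1'])      # no padding, just drop the ".0"
num2 = to_padded(df['Num2'], 3)   # 3 digits in total
num3 = to_padded(df['Num3'], 5)   # 5 digits in total

# With the default na_rep=None, a row with any missing part stays missing.
df['combined'] = num1.str.cat([num2, num3])

print(df['combined'])
```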