Padding zeroes to columns with NaN values

I have a pandas DataFrame where three columns are floats (float64):

         Num1       Num2         Num3
0       2345656      3.0      12345.0
1          3456      3.0       3100.0
2        541304      4.0       5432.0      
3           NaN      NaN          NaN        
4   12313201308      1.0      99999.0

I want to add leading zeroes to Num2 and Num3 columns to make them look like this:

         Num1       Num2         Num3
0       2345656      003        12345
1          3456      003        03100
2        541304      004        05432      
3           NaN      NaN          NaN        
4   12313201308      001        99999

I want Num2 to have 3 digits in total including the leading zeroes and Num3 to have 5 digits in total including the leading zeroes, leaving NaNs as they are (or not impacting NaNs). The end goal is to concatenate Num1, Num2, and Num3 to create a new column.

I tried both df['Num2'].apply(lambda x: '{:05}'.format(x) if pd.notnull(x) else x) and df['Num2'].apply(lambda x: x.zfill(5) if pd.notnull(x) else x), but neither added zeroes as I expected. I would appreciate it if someone more knowledgeable than I am could help me out!

CodePudding user response：

Your numbers are stored as floats, so convert them to int before formatting:

df['Num2'] = df['Num2'].apply(lambda x: '{:03}'.format(int(x)) if pd.notnull(x) else x)
df['Num3'] = df['Num3'].apply(lambda x: '{:05}'.format(int(x)) if pd.notnull(x) else x)

df

           Num1 Num2   Num3
0  2.345656e+06  003  12345
1  3.456000e+03  003  03100
2  5.413040e+05  004  05432
3           NaN  NaN    NaN
4  1.231320e+10  001  99999
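
As a minimal sketch of why the attempts in the question did not pad as expected (plain Python, using a single value rather than the asker's DataFrame): with a float, the width in '{:05}' also counts the decimal point and the fractional digit, and zfill is a str method that floats do not have.

```python
x = 3.0  # a float, as stored in a float64 column

# The width of 5 applies to the whole float representation, including ".0"
as_float = '{:05}'.format(x)

# Converting to int first pads only the integer digits
as_int = '{:03}'.format(int(x))

# zfill belongs to str, so calling it on a float raises AttributeError
try:
    x.zfill(5)
    zfill_error = None
except AttributeError as exc:
    zfill_error = type(exc).__name__

print(as_float, as_int, zfill_error)
```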

CodePudding user response：

If NaN is the only kind of missing value in your column, you could try:

import numpy as np
df['Num3'] = df['Num3'].apply(lambda x: '{0:0>5}'.format(int(x)) if not np.isnan(x) else np.nan)
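
The NaN-only caveat matters because np.isnan raises a TypeError for None. A hedged sketch of a more tolerant variant, assuming the column might also hold None, is to test with pd.isna instead (pad5 is a hypothetical helper name, not from the answer above):

```python
import numpy as np
import pandas as pd

def pad5(x):
    # Hypothetical helper: pd.isna accepts NaN, None and pd.NA alike
    return x if pd.isna(x) else '{:05d}'.format(int(x))

# An object column mixing a float, None and NaN (illustrative data)
values = pd.Series([3100.0, None, np.nan], dtype=object)
padded = values.apply(pad5)

# np.isnan cannot handle None, which is why the answer above is limited to NaN
try:
    np.isnan(None)
    isnan_accepts_none = True
except TypeError:
    isnan_accepts_none = False

print(padded.tolist(), isnan_accepts_none)
```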

CodePudding user response：

We can take advantage of index alignment: select only the rows of Num3 that are not null, convert them to int with astype, then map them through a format string. When the result is assigned back, the rows that were left out stay NaN:

df['Num3'] = df.loc[df['Num3'].notna(), 'Num3'].astype(int).map('{:05d}'.format)

Or pass a lambda directly to Series.map and call Python's int:

df['Num3'] = df.loc[df['Num3'].notna(), 'Num3'].map(lambda x: f'{int(x):05d}')

df:

            Num1  Num2   Num3
0      2345656.0   3.0  12345
1         3456.0   3.0  03100
2       541304.0   4.0  05432
3            NaN   NaN    NaN
4  12313201308.0   1.0  99999
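
To make the index alignment step concrete, here is a small sketch on a standalone Series (illustrative values, not the full DataFrame from the question). The filtered result has no entry for the null row, so assigning it back leaves that row as NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([12345.0, 3100.0, np.nan, 99999.0])

# Only the non-null rows are formatted; the result keeps index labels 0, 1 and 3
formatted = s[s.notna()].astype(int).map('{:05d}'.format)

# Column assignment aligns on the index: label 2 is absent, so it becomes NaN
df = pd.DataFrame({'Num3': s})
df['Num3'] = formatted

print(df)
```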

Setup:

import pandas as pd
from numpy import nan

pd.set_option("display.precision", 10)

df = pd.DataFrame({
    'Num1': [2345656.0, 3456.0, 541304.0, nan, 12313201308.0],
    'Num2': [3.0, 3.0, 4.0, nan, 1.0],
    'Num3': [12345.0, 3100.0, 5432.0, nan, 99999.0]
})
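
For the end goal stated in the question (concatenating Num1, Num2 and Num3), one possible vectorized sketch is to go through pandas' nullable Int64 and string dtypes and pad with str.zfill. This is an alternative to the approaches above, not something the answers proposed; the column name Combined is an assumption, and the sketch assumes every non-null value is a whole number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Num1': [2345656.0, 3456.0, 541304.0, np.nan, 12313201308.0],
    'Num2': [3.0, 3.0, 4.0, np.nan, 1.0],
    'Num3': [12345.0, 3100.0, 5432.0, np.nan, 99999.0],
})

# float64 -> nullable Int64 (NaN becomes <NA>) -> nullable string
num1 = df['Num1'].astype('Int64').astype('string')
# str.zfill pads with leading zeroes and leaves <NA> untouched
num2 = df['Num2'].astype('Int64').astype('string').str.zfill(3)
num3 = df['Num3'].astype('Int64').astype('string').str.zfill(5)

# Adding string Series propagates <NA>, so the all-null row stays missing
df['Combined'] = num1 + num2 + num3

print(df['Combined'])
```

Rows with a missing value in any of the three columns end up missing in Combined; if a different behavior is wanted, fill the missing pieces (for example with an empty string) before concatenating.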