I have a pandas DataFrame where three columns are floats (float64):
Num1 Num2 Num3
0 2345656 3.0 12345.0
1 3456 3.0 3100.0
2 541304 4.0 5432.0
3 NaN NaN NaN
4 12313201308 1.0 99999.0
I want to add leading zeroes to Num2 and Num3 columns to make them look like this:
Num1 Num2 Num3
0 2345656 003 12345
1 3456 003 03100
2 541304 004 05432
3 NaN NaN NaN
4 12313201308 001 99999
I want Num2 to have 3 digits in total including the leading zeroes and Num3 to have 5 digits in total including the leading zeroes, leaving NaNs as they are (or not impacting NaNs). The end goal is to concatenate Num1, Num2, and Num3 to create a new column.
I tried both
df['Num2'].apply(lambda x: '{:05}'.format(x) if pd.notnull(x) else x)
and
df['Num2'].apply(lambda x: x.zfill(5) if pd.notnull(x) else x)
Neither added zeroes as I expected. I would appreciate it if someone more knowledgeable than I am could help me out!
CodePudding user response:
Your numbers are stored as floats, so convert them to int before formatting:
df['Num2'] = df['Num2'].apply(lambda x: '{:03}'.format(int(x)) if pd.notnull(x) else x)
df['Num3'] = df['Num3'].apply(lambda x: '{:05}'.format(int(x)) if pd.notnull(x) else x)
df
Num1 Num2 Num3
0 2.345656e+06 003 12345
1 3.456000e+03 003 03100
2 5.413040e+05 004 05432
3 NaN NaN NaN
4 1.231320e+10 001 99999
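To see why the attempts in the question did not work, here is a minimal sketch. Formatting a float with '{:05}' keeps the ".0", so the width is used up by the decimal part, and a float has no zfill method at all (that is a str method). The helper name pad_int is hypothetical and only used for illustration.

```python
import pandas as pd

def pad_int(value, width):
    """Zero-pad a float holding a whole number; pass missing values through.

    pad_int is a hypothetical helper name, not part of pandas.
    """
    if pd.isna(value):
        return value
    return f"{int(value):0{width}d}"

# Formatting the float directly keeps the decimal part.
print("{:05}".format(3.0))

# zfill is a str method, so floats do not have it.
print(hasattr(3.0, "zfill"))

# Converting to int first gives the padding asked for in the question.
print(pad_int(3.0, 3))
print(pad_int(3100.0, 5))
```

The same helper can be passed to apply, for example df['Num2'].apply(pad_int, width=3).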
CodePudding user response:
If NaN is the only kind of missing value in your column (np.isnan raises a TypeError on None or strings), you could try:
import numpy as np
df['Num3'] = df['Num3'].apply(lambda x: '{0:0>5}'.format(int(x)) if not np.isnan(x) else np.nan)
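If the column might also hold None, a variant with pd.isna is a little more forgiving. This is only a sketch: the function name pad5 and the sample Series are made up for illustration.

```python
import numpy as np
import pandas as pd

def pad5(x):
    # pd.isna treats None, np.nan and pd.NA as missing,
    # whereas np.isnan(None) raises a TypeError.
    return np.nan if pd.isna(x) else '{0:0>5}'.format(int(x))

# Hypothetical sample with both kinds of missing value.
s = pd.Series([3100.0, None, np.nan], dtype=object)
result = s.apply(pad5).tolist()
print(result)
```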
CodePudding user response:
We can take advantage of index alignment: select only the rows of Num3 which are notnull, convert them to int using astype, then map them to a format string:
df['Num3'] = df.loc[df['Num3'].notna(), 'Num3'].astype(int).map('{:05d}'.format)
Or pass a function directly to Series.map and call Python's int inside it:
df['Num3'] = df.loc[df['Num3'].notna(), 'Num3'].map(lambda x: f'{int(x):05d}')
df:
Num1 Num2 Num3
0 2345656.0 3.0 12345
1 3456.0 3.0 03100
2 541304.0 4.0 05432
3 NaN NaN NaN
4 12313201308.0 1.0 99999
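To make the index-alignment step more concrete, here is a small sketch on a shortened, made-up column. The filtered Series keeps the original row labels, so assigning it back leaves the skipped row as NaN.

```python
import numpy as np
import pandas as pd

# Shortened sample column, assumed for illustration.
df = pd.DataFrame({'Num3': [12345.0, 3100.0, np.nan, 99999.0]})

# Only the non-null rows are converted; their labels (0, 1, 3) are preserved.
subset = df.loc[df['Num3'].notna(), 'Num3'].astype(int).map('{:05d}'.format)
print(list(subset.index))

# Assignment aligns on the index, so row 2 has no match and stays missing.
df['Num3'] = subset
print(df)
```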
Setup:
import pandas as pd
from numpy import nan
pd.set_option("display.precision", 10)
df = pd.DataFrame({
'Num1': [2345656.0, 3456.0, 541304.0, nan, 12313201308.0],
'Num2': [3.0, 3.0, 4.0, nan, 1.0],
'Num3': [12345.0, 3100.0, 5432.0, nan, 99999.0]
})
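For the end goal in the question (concatenating the three columns), one possible sketch uses the nullable Int64 and string dtypes, so missing rows stay missing through every step. It assumes every non-missing value is a whole number, and the column name Combined is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Num1': [2345656.0, 3456.0, 541304.0, np.nan, 12313201308.0],
    'Num2': [3.0, 3.0, 4.0, np.nan, 1.0],
    'Num3': [12345.0, 3100.0, 5432.0, np.nan, 99999.0],
})

# float64 -> nullable Int64 -> string; NaN becomes <NA> and is left alone.
num1 = df['Num1'].astype('Int64').astype('string')
num2 = df['Num2'].astype('Int64').astype('string').str.zfill(3)
num3 = df['Num3'].astype('Int64').astype('string').str.zfill(5)

# String concatenation propagates <NA>, so row 3 stays missing.
df['Combined'] = num1 + num2 + num3
print(df['Combined'])
```

If a row should still be concatenated when only some parts are missing, the pieces would need a fillna('') first; that choice depends on the data.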