I have a dataset that contains a column with categorical values. I need to standardize the column because some values are coded incorrectly. For example, '1.0' and '3.0' should be '01' and '03', respectively. When the values are correct, however, I just need to return the value of the column I'm cleaning. I'd like to include the cleaned data in a new column.
I am relatively new to Python and pandas; I usually work in R. I've tried various techniques I found on Stack Overflow, but I keep running into problems when trying to return the original column's values in the cases where they are already correct.
Any assistance would be much appreciated! Here's some sample data:
import pandas as pd
d = {'col1':['01','03','1.0','10.0','7.0','3.0']}
df = pd.DataFrame(data=d)
This returns:
   col1
0    01
1    03
2   1.0
3  10.0
4   7.0
5   3.0
And I'm hoping to get:
   col1 col2
0    01   01
1    03   03
2   1.0   01
3  10.0   10
4   7.0   07
5   3.0   03
CodePudding user response:
You can convert the column to float, then to int, and finally add leading zeros. Either of the two variants below works:
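# Variant 1: pad with a format string
df['col2'] = (df['col1']
              .astype(float).astype(int)   # '1.0' -> 1.0 -> 1
              .apply('{:0>2}'.format))     # 1 -> '01'
# Variant 2: convert back to str and pad with zfill
df['col3'] = (df['col1']
              .astype(float).astype(int).astype(str)
              .str.zfill(2))
print(df)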
   col1 col2 col3
0    01   01   01
1    03   03   03
2   1.0   01   01
3  10.0   10   10
4   7.0   07   07
5   3.0   03   03
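If the real column also contains codes that cannot be parsed as numbers, `astype(float)` will raise an error. A minimal sketch of one way to fall back to the original value in that case is shown here. The `'NA'` row is a made-up example that is not in the question's data, and the sketch assumes every numeric code is a non-negative whole number:

```python
import pandas as pd

# Sample data from the question plus one hypothetical unparseable code ('NA').
df = pd.DataFrame({'col1': ['01', '03', '1.0', '10.0', '7.0', '3.0', 'NA']})

# Parse what can be parsed; non-numeric values become NaN instead of raising.
nums = pd.to_numeric(df['col1'], errors='coerce')

# Zero-pad the parsed values (NaN is temporarily filled with 0 so that
# the int conversion works; those rows are replaced in the next step).
padded = nums.fillna(0).astype(int).astype(str).str.zfill(2)

# Keep the padded value where parsing succeeded, else the original value.
df['col2'] = padded.where(nums.notna(), df['col1'])
print(df)
```

Note that the int conversion truncates, so a value such as '2.5' would silently become '02'; check for that first if it can occur in your data.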
CodePudding user response:
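This is the Styler approach, where you set a display format for each column individually. Note that it only changes how the values are rendered (for example in a notebook). Both columns are converted to float, so it does not produce zero-padded strings such as '01'.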
Code:
df['col2'] = df['col1']
df = df.astype(float)
# Keep the Styler in its own variable so df remains a DataFrame
styled = df.style.format({'col1': '{:.1f}', 'col2': '{:,.0f}'})
styled
Output:
   col1 col2
0   1.0    1
1   3.0    3
2   1.0    1
3  10.0   10
4   7.0    7
5   3.0    3
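If the zero-padded codes are needed as actual data rather than as display formatting, a format string can be applied to the values themselves. This is a rough sketch rather than part of the answer above; the `'{:02.0f}'` spec (no decimals, minimum width 2, zero-padded) is swapped in for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'col1': ['01', '03', '1.0', '10.0', '7.0', '3.0']})

# Convert to float, then format each value as a zero-padded string.
# Unlike Styler.format, this stores the strings in the new column.
df['col2'] = df['col1'].astype(float).map('{:02.0f}'.format)
print(df)
```

Here col1 is left untouched and col2 holds the strings '01', '03', '01', '10', '07', '03'.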