Pandas: Create new column based on existing, return existing if conditionals don't match

Time:06-02

I have a dataset that contains a column with categorical values. I need to standardize the column because some values are coded incorrectly. For example, '1.0' and '3.0' should be '01' and '03', respectively. When the values are correct, however, I just need to return the value of the column I'm cleaning. I'd like to include the cleaned data in a new column.

I am relatively new to Python and Pandas; I usually work in R. I've tried various techniques I found on Stack Overflow, but I keep running into trouble when I try to return the values from the original column when they are already correct.

Any assistance would be much appreciated! Here's some sample data:

import pandas as pd
d = {'col1':['01','03','1.0','10.0','7.0','3.0']}
df = pd.DataFrame(data=d)

This returns:

    col1
0   01
1   03
2   1.0
3   10.0
4   7.0
5   3.0

And I'm hoping to get:

    col1    col2  
0   01      01
1   03      03
2   1.0     01
3   10.0    10
4   7.0     07
5   3.0     03

CodePudding user response:

You can convert the column to float, then to int, and finally pad with leading zeros. The two variants below (col2 and col3) give the same result.

df['col2'] = (df['col1']
              .astype(float).astype(int)
              .apply('{:0>2}'.format))

df['col3'] = (df['col1']
              .astype(float).astype(int).astype(str)
              .str.zfill(2))
print(df)

   col1 col2 col3
0    01   01   01
1    03   03   03
2   1.0   01   01
3  10.0   10   10
4   7.0   07   07
5   3.0   03   03
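If you want the "return the existing value when it is already correct" logic to be explicit, a minimal sketch with Series.where might look like this. It assumes that a "correct" value is exactly two digits and that every other value parses as a number; the names is_clean and converted are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['01', '03', '1.0', '10.0', '7.0', '3.0']})

# Assumption: a value is already correct if it is exactly two digits.
is_clean = df['col1'].str.fullmatch(r'\d{2}')

# Convert every value (assumes all of them parse as numbers) ...
converted = df['col1'].astype(float).astype(int).astype(str).str.zfill(2)

# ... then keep the original wherever it was already clean,
# and use the converted value everywhere else.
df['col2'] = df['col1'].where(is_clean, converted)
print(df)
```

For this sample the result matches col2 above; the difference only matters if a clean value should be kept as-is rather than round-tripped through float.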

CodePudding user response:

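This is the style format approach, where you give each column its own display format. Note that it only changes how the frame is rendered (for example in a notebook): the columns are converted to float, and the output has no leading zeros.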

Code:

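df['col2'] = df['col1']
df = df.astype(float)  # both columns become floats, so '01' is now 1.0
# Styler.format only affects display; keep the Styler separate from df
styled = df.style.format({'col1': "{:.1f}", 'col2': "{:,.0f}"})
styled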

Output:

    col1    col2
 0  1.0      1
 1  3.0      3
 2  1.0      1
 3  10.0    10
 4  7.0      7
 5  3.0      3