Data cleaning in Pandas

Time:02-02

I have an age column with values such as 10, <9, or >45. I need to clean this data and get it ready for EDA. What logic can I use to clean it?

CodePudding user response:

Hopefully this works for your case: use str.extract to keep only the digits from each string.

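import pandas as pd

# Sample data: lengths stored as free-form strings
df = pd.DataFrame(
    data=[
        {'emp_length': '10 years'},
        {'emp_length': '3 years'},
        {'emp_length': '<1 year'},
    ]
)
# Capture the first run of digits; expand=False returns a Series.
# The extracted values are still strings at this point.
df['emp_length'] = df['emp_length'].str.extract(r'(\d+)', expand=False)
df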
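The same idea could be applied to the age column from the question. The sketch below is only an illustration: the column names `age`, `age_value`, and `age_bound` and the sample values are assumptions based on the question's description, not the asker's actual data. It extracts the number and also keeps the `<` or `>` sign in a separate column, since dropping the sign would treat `<9` as exactly 9.

```python
import pandas as pd

# Hypothetical sample data modeled on the values described in the question.
ages = pd.DataFrame({'age': ['10', '<9', '>45']})

# Capture an optional comparison sign and the digits as two named groups.
parts = ages['age'].str.extract(r'^\s*(?P<bound>[<>]?)\s*(?P<value>\d+)')

# Numeric value for EDA; pd.to_numeric converts the digit strings to numbers.
ages['age_value'] = pd.to_numeric(parts['value'])

# Keep the sign so that '<9' is not silently treated as exactly 9.
# Rows without a sign get the (assumed) label 'exact'.
ages['age_bound'] = parts['bound'].replace('', 'exact')
```

Whether to keep the bound as its own column, bin the ages, or drop the censored rows depends on the analysis; this version just preserves the information so the choice can be made later.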