I have an age column with values such as 10, <9, or >45. I need to clean this data and get it ready for EDA. What logic can I use to clean it?
CodePudding user response:
Hopefully this works for your case. Use str.extract with a regular expression to keep only the digits from each string:
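import pandas as pd

# Sample data with mixed text-and-number values
df = pd.DataFrame(
    data=[
        {'emp_length': '10 years'},
        {'emp_length': '3 years'},
        {'emp_length': '<1 year'},
    ]
)
# Keep the first run of digits in each value (the result is still a string)
df['emp_length'] = df['emp_length'].str.extract(r'(\d+)')
df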
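
For the age column from the question, a similar approach might look like the sketch below. The sample values and the column names age_bound and age_value are assumptions made for illustration. Keeping the comparison sign in its own column is one option, since "<9" and ">45" are bounds rather than exact ages.

```python
import pandas as pd

# Hypothetical sample mirroring the values mentioned in the question
df = pd.DataFrame({'age': ['10', '<9', '>45']})

# Capture an optional leading '<' or '>' and the digits that follow it.
# Named groups become the column names of the extracted frame.
parts = df['age'].str.extract(r'^\s*(?P<bound>[<>])?\s*(?P<value>\d+)')

# Rows without a sign get an empty string instead of NaN
df['age_bound'] = parts['bound'].fillna('')
# Convert the extracted digit strings to numbers for EDA
df['age_value'] = pd.to_numeric(parts['value'])
```

How to treat the bounded rows (drop them, cap them, or bin them into ranges) depends on the analysis, so the sketch only separates the pieces.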