41-45 93
46-50 81
36-40 73
51-55 71
26-30 67
21-25 62
31-35 61
56-70 29
56-60 26
61 or older 23
15-20 10
Name: age, dtype: int64
pd.to_numeric(combined['age'], errors='coerce')
I used the code above to convert my DataFrame column to numeric, but it converts every value to NaN.
Here is my output:
3 NaN
5 NaN
8 NaN
9 NaN
11 NaN
..
696 NaN
697 NaN
698 NaN
699 NaN
701 NaN
Name: age, Length: 651, dtype: float64
CodePudding user response:
Try the following:
import pandas as pd
df = pd.DataFrame({"age": ["41-45", "46-50","61 or older"], "Col2": [93, 81, 23]})
Cols = ["Lower_End_Age", "Higher_End_Age",] # list of column names for later
# replacing whitespace by delimiter and splitting only once `n=1` using the same delimiter
df[Cols] = df["age"].str.replace(' ', '-').str.split("-", n=1, expand = True)
print(df)
           age  Col2  Lower_End_Age  Higher_End_Age
0        41-45    93             41              45
1        46-50    81             46              50
2  61 or older    23             61        or-older
Then convert the lower bound to numeric:
df['Lower_End_Age'] = pd.to_numeric(df['Lower_End_Age'], errors='coerce')
df.dtypes
age object
Col2 int64
Lower_End_Age int64
Higher_End_Age object
If you also want to get rid of or-older, simply repeat the conversion for Higher_End_Age:
df['Higher_End_Age'] = pd.to_numeric(df['Higher_End_Age'], errors='coerce')
print(df)
           age  Col2  Lower_End_Age  Higher_End_Age
0        41-45    93             41            45.0
1        46-50    81             46            50.0
2  61 or older    23             61             NaN
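For context, the original call returned only NaN because a string such as "41-45" cannot be parsed as a single number, so errors='coerce' turns each one into NaN. As a possible alternative to the replace/split steps, here is a minimal sketch using str.extract with a regular expression. It assumes every label starts with the lower bound and may be followed by a hyphen and an upper bound; the pattern and the name bounds are illustrative choices, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({"age": ["41-45", "46-50", "61 or older"], "Col2": [93, 81, 23]})

# Capture the leading number and, if present, a second number after a hyphen.
# Labels without a second number (e.g. "61 or older") yield NaN for that group.
bounds = df["age"].str.extract(r"^(\d+)(?:-(\d+))?")

# The extracted pieces are still strings, so convert them explicitly.
df["Lower_End_Age"] = pd.to_numeric(bounds[0], errors="coerce")
df["Higher_End_Age"] = pd.to_numeric(bounds[1], errors="coerce")

print(df)
```

Because the upper bound is missing for open-ended labels, Higher_End_Age ends up as a float column containing NaN, the same as in the split-based approach.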