Suppose I have a dataframe x that has a column terms. Terms are supposed to be of type string, but some contain numbers, so I want to delete the rows of the dataframe where the corresponding terms values are integers/floats. I tried the following but received a KeyError:
x = x.drop(x[type(x['terms']) is int].index)
How should I change the code?
CodePudding user response:
Use pd.to_numeric:
import pandas as pd

df = pd.DataFrame({'terms': [13, 0.23, 'hello', 'world', '12', '0.45']})
df = df[pd.to_numeric(df['terms'], errors='coerce').isna()]
print(df)
# Output:
   terms
2  hello
3  world
Details:
>>> df
   terms
0     13
1   0.23
2  hello
3  world
4     12
5   0.45
>>> pd.to_numeric(df['terms'], errors='coerce')
0    13.00
1     0.23
2      NaN
3      NaN
4    12.00
5     0.45
Name: terms, dtype: float64
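As for why the attempt in the question fails: type(x['terms']) is the type of the whole column (a Series), never int, so the comparison collapses to a single False, and x[False] is then looked up as a column label. The sketch below reproduces that and contrasts it with the element-wise pd.to_numeric mask. The names mask_attempt, raised, numeric_like and cleaned are only illustrative, and the sample data is the one from this answer.

```python
import pandas as pd

x = pd.DataFrame({'terms': [13, 0.23, 'hello', 'world', '12', '0.45']})

# type() is applied to the Series object itself, not to each value,
# so this is a plain Python False rather than a boolean mask.
mask_attempt = type(x['terms']) is int

# Indexing with a scalar False is a column lookup for a label named False.
try:
    x[mask_attempt]
    raised = False
except KeyError:
    raised = True

# pd.to_numeric with errors='coerce' works value by value:
# anything that cannot be parsed as a number becomes NaN.
numeric_like = pd.to_numeric(x['terms'], errors='coerce').notna()
cleaned = x[~numeric_like]

print(raised)
print(cleaned)
```

Note that this mask also treats the strings '12' and '0.45' as numeric, so those rows are dropped as well.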
CodePudding user response:
Say you have a dataframe like this:
import pandas as pd

df = pd.DataFrame({'a':[1,'sd','sf',2,5,'13','s','143f','d234f','z24']})
# notice that '13' is a string here, not an integer
       a
0      1
1     sd
2     sf
3      2
4      5
5     13
6      s
7   143f
8  d234f
9    z24
If you want to get rid of everything that looks like a whole number, including strings such as '13', use this:
df = df[~df['a'].astype(str).str.isdigit()]
Output:
>>> df
       a
1     sd
2     sf
6      s
7   143f
8  d234f
9    z24
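One caveat with str.isdigit(): it is only True when every character is a digit, so strings with a decimal point or a minus sign are not caught. A small sketch of the difference, using a made-up Series s (not from the question) and comparing against a pd.to_numeric mask:

```python
import pandas as pd

# Hypothetical sample values chosen to show the edge cases.
s = pd.Series([1, '13', '0.45', '-7', 'z24'])

# True only where the string form consists solely of digits.
digit_mask = s.astype(str).str.isdigit()

# True wherever the value can be parsed as a number at all.
numeric_mask = pd.to_numeric(s, errors='coerce').notna()

print(digit_mask.tolist())
print(numeric_mask.tolist())
print(s[~numeric_mask].tolist())
```

Which mask is appropriate depends on whether values like '0.45' and '-7' should count as numbers in your data.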
If you only want to get rid of items that are not actually strings, use this:
df = df[df['a'].transform(type).eq(str)]
Output:
>>> df
       a
1     sd
2     sf
5     13  <--- Notice how the string '13' is kept
6      s
7   143f
8  d234f
9    z24
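Applied to the dataframe from the question, the same per-element idea can keep the original x.drop(...) form. This is a minimal sketch, assuming the column holds plain Python or NumPy numbers mixed with strings. The mask name is_number is illustrative, and numbers.Number also matches bool values, which may or may not be what you want.

```python
import numbers

import pandas as pd

x = pd.DataFrame({'terms': [13, 0.23, 'hello', 'world', '12', '0.45']})

# Check the type of each value individually instead of the whole column.
is_number = x['terms'].map(lambda v: isinstance(v, numbers.Number))

# Drop the rows whose 'terms' value is a real int/float object.
x = x.drop(x[is_number].index)

print(x)
```

Here the strings '12' and '0.45' survive, because only the Python type is checked, not whether the text could be parsed as a number.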