import numpy as np
import pandas as pd
numbers = ["1", "2", "3", "4", "5", "6", "Missing Value", "7"]
df = pd.DataFrame(numbers, columns=["Numbers"])
new_df = df.query("Numbers > 4")
print(new_df)
When you run the above code, you get the following error message: TypeError: '>' not supported between instances of 'str' and 'int'
. This is because the numbers in the dataframe are all strings, but in the query, we are checking if the number string in the column is greater than the integer 4.
How would you deal with a situation like this given that you are required to use the query method and don't want to change all the numbers to integers. I.e. something like this:
new_df = df.query("int(Numbers) > 4")
which doesn't work by the way. Also, it would need to handle occassional strings that can't be converted to an integer like "Missing Value"
.
CodePudding user response:
We have to_numeric
df['Numbers'] = pd.to_numeric(df['Numbers'], errors = 'coerce')
new_df = df.query("Numbers > 4")
new_df
Out[10]:
Numbers
4 5.0
5 6.0
7 7.0
CodePudding user response:
import numpy as np
import pandas as pd
numbers = ["1", "2", "3", "4", "5", "6", "Missing Value", "7"]
df = pd.DataFrame(numbers, columns=["Numbers"])
df.Numbers = df.Numbers.apply(
lambda n: np.NaN if n == 'Missing Value' else float(n)
)
new_df = df.query("Numbers > 4")
print(new_df)
prints
index | Numbers |
---|---|
4 | 5.0 |
5 | 6.0 |
7 | 7.0 |
CodePudding user response:
You're going to have issues if you keep the column as str
. For example:
>>> '14' > '4'
False
If you really want to, you can specify a string '4'
in your query:
>>> df.query("Numbers > '4'")
Numbers
4 5
5 6
6 Missing Value
7 7
However, I think you should just accept that Missing Value
should be NaN
and follow BENY's answer.
If you're not forced to use query
due to some school requirement or something... there are other ways as well, where you can do the casting without actually applying it to the column:
>>> out = df[pd.to_numeric(df.Numbers, errors='coerce').gt(4)]
>>> out
Numbers
4 5
5 6
7 7
# Your "Numbers" are still strings:
>> out.dtypes
Numbers object
dtype: object