I am running a process that checks to see if a row value is numeric. If it is not numeric, a column called 'Flag' is updated with the keyword 'Error'
I can do this in SQL using the ISNUMERIC function. I attempted to do this with pandasql's sqldf function but received the error message no such function: ISNUMERIC
This is a sample df:
import pandas as pd
data = {'P1': [0, 'ABC', 1.1]
}
originaldf = pd.DataFrame(data)
This is my desired output
P1 | Flag
---|------
0 |
ABC| Error
1.1|
This is what is throwing the no such function: ISNUMERIC error
import pandasql as ps
newdf= ps.sqldf('''
SELECT *,
CASE
WHEN
ISNUMERIC(P1) < 0 THEN 'ERROR'
END
AS `Flag`
FROM originaldf
''')
After researching this I decided to move on from trying to use the ISNUMERIC function in this ps.sqldf query and use this instead. However, while this did not throw an error, it did not work either.
import pandasql as ps
newdf= ps.sqldf('''
SELECT *,
CASE
WHEN
P1 LIKE '%[A-Za-z]%' THEN 'ERROR'
END
AS `Flag`
FROM originaldf
''')
#newdf
P1 | Flag
---|------
0 |
ABC|
1.1|
After this, I also tried using df.iterrows() I did this by doing this:
for idx, row in originaldf.iterrows():
if str(row['P1']) == '%[A-Za-z]%':
row['Flag'] = 'Error'
This did not return an error but sadly did not work as well. I suspect that it is because I am not using this '%[A-Za-z]%' correctly.
Any tips to get going in the right direction are appreciated.
CodePudding user response:
You can apply an alphabetical check as:
originaldf['Flag'] = originaldf.P1.astype(str).apply(str.isalpha).map({True: "Error"})
originaldf
Which will give you:
P1 | Flag | |
---|---|---|
0 | 0 | nan |
1 | ABC | Error |
2 | 1.1 | nan |
CodePudding user response:
You can write your own is_numeric()
function and apply it to the P1
column like below.
data = {'P1': [0, 'ABC', 1.1]}
df = pd.DataFrame(data)
def is_numeric(x):
result = 'error'
if type(x) in (int, float):
result = ''
return result
df['Flag'] = df['P1'].apply(is_numeric)
df.head()