I have a dataframe:
Region Value ID
Asia N/A XYAZZ
Asia '0' ZSJDS
EMEA '2' KKKKK
USME '0.50' KLJSA
APAC NODATA JKUUS
APAC '1.00' JKUUC
The 'Value' column holds strings that may be a number stored as a string, N/A, or any other string. I need code that returns the rows whose values are NOT in the range 0 < x <= 1, like this:
Region Value ID
Asia N/A XYAZZ
Asia '0' ZSJDS
EMEA '2' KKKKK
APAC NODATA JKUUS
I tried something like this:
data_types_dict = {'Value': float}
df = df.astype(data_types_dict, errors='ignore')
df = df[(df['Value'] <= 0) | (df['Value'] > 1)]
but it only works when the column contains nothing but numbers and numeric strings; it fails when converting NODATA to float. I thought I could first filter out all rows whose strings cannot be converted to float, then perform the above operation on the rest, then concatenate the two, like this:
df1 = df1['Value'].StringsNonFloat
df2 = df[(df['Value'] <= 0) | (df['Value'] > 1)]
final_df = concat([df1, df2])
But is this pythonic? Is there such a function?
CodePudding user response:
import pandas as pd
# Convert once; values that cannot be parsed become NaN
numeric = pd.to_numeric(df['Value'], errors='coerce')
df = df[~(numeric.notnull() & numeric.between(0, 1, inclusive='right'))]
print(df)
output:
Region Value ID
0 Asia NaN XYAZZ
1 Asia 0 ZSJDS
2 EMEA 2 KKKKK
4 APAC NODATA JKUUS
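To make the mechanics of this approach easier to follow, here is a minimal self-contained sketch. It assumes the sample data from the question, with N/A kept as a plain string, and it spells the range check out as two comparisons, which should be equivalent to between(0, 1, inclusive='right'). The variable names numeric, in_range and result are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question (assumption: N/A is a plain string here).
df = pd.DataFrame({
    "Region": ["Asia", "Asia", "EMEA", "USME", "APAC", "APAC"],
    "Value": ["N/A", "0", "2", "0.50", "NODATA", "1.00"],
    "ID": ["XYAZZ", "ZSJDS", "KKKKK", "KLJSA", "JKUUS", "JKUUC"],
})

# Strings that cannot be parsed become NaN instead of raising an error.
numeric = pd.to_numeric(df["Value"], errors="coerce")

# Comparisons with NaN are False, so unparseable rows are never "in range".
in_range = (numeric > 0) & (numeric <= 1)

# Keep everything that is not a number inside 0 < x <= 1.
result = df[~in_range]
print(result)
```

Because the mask is built from a separate numeric Series, the original strings in 'Value' stay untouched in the result.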
CodePudding user response:
I believe you may be looking for something like this:
df = df[(df["Value"].astype(float) > 1) | (df["Value"].astype(float) <= 0)]
This keeps the rows where the value, cast to float, is greater than 1 or less than or equal to 0.
It is very similar to your code, but calls .astype(float) on the column instead.
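As a small illustration of where a plain float cast becomes fragile, the sketch below tries .astype(float) on a column that contains one unparseable string and compares it with pd.to_numeric and errors='coerce'. The sample values and the names values, failed and coerced are assumptions made for the example.

```python
import pandas as pd

# A column with three numeric strings and one string that is not a number.
values = pd.Series(["0", "2", "0.50", "NODATA"])

try:
    values.astype(float)
    failed = False
except ValueError as exc:
    # The cast stops at the first value that cannot be parsed.
    failed = True
    print("astype(float) failed:", exc)

# errors="coerce" turns the unparseable entry into NaN instead of raising.
coerced = pd.to_numeric(values, errors="coerce")
print(coerced.isna().tolist())
```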
Edit (following comments)
I would do the following:
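Start by converting each value to float where possible (a bare isdigit() check would skip decimal strings such as '0.50'):
df["Value"] = [float(value) if str(value).replace(".", "", 1).isdigit() else value for value in df["Value"]]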
Then subset by only those that are float:
df = df[df["Value"].apply(lambda x: isinstance(x,float))]
Then subset again by those that fall outside the range 0 < x <= 1:
df = df[(df["Value"] > 1) | (df["Value"] <= 0)]
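The steps above keep only the numeric rows. If the non-numeric rows should stay in the result, as in the expected output in the question, one possible sketch combines the conversion with the split-and-concat idea from the question. The helper to_float_or_nan is hypothetical, and the sample data assumes N/A is stored as a plain string.

```python
import pandas as pd


def to_float_or_nan(value):
    """Hypothetical helper: float(value), or NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


df = pd.DataFrame({
    "Region": ["Asia", "Asia", "EMEA", "USME", "APAC", "APAC"],
    "Value": ["N/A", "0", "2", "0.50", "NODATA", "1.00"],
    "ID": ["XYAZZ", "ZSJDS", "KKKKK", "KLJSA", "JKUUS", "JKUUC"],
})

# Parse into a separate Series so the original strings are preserved.
parsed = df["Value"].map(to_float_or_nan)

# Rows whose value could not be converted to float.
non_numeric = df[parsed.isna()]

# Rows whose numeric value lies outside 0 < x <= 1.
out_of_range = df[(parsed <= 0) | (parsed > 1)]

# Concatenate both parts and restore the original row order.
final_df = pd.concat([non_numeric, out_of_range]).sort_index()
print(final_df)
```

The two masks cannot overlap, since comparisons against NaN are False, so no row appears twice after the concat.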