Pandas DataFrame Not Dropping Rows based on String Value

Time:10-04

I am having some trouble with filtering a dataset based on string values. I have tried multiple methods and none of them seem to work. I have data which looks like the following:

[image: screenshot of the dataset]

Some of the "CountryNames" in this dataset are "Unknown", like the following:

[image: screenshot of rows from the dataset]

I would like to filter out the rows with the "Unknown" value in CountryNames. I have tried multiple methods and none of them seem to work; they all produce exactly the same dataset as before.

Here's a snippet of my code:

data = pd.read_excel(r"C:\Users\DylanNdengu\Downloads\combined_table.xlsx", index_col=False)

located_data = data[~data["CountryNames"].isin(["Unknown"])]

data and located_data have exactly the same shape, and the rows with Unknown are still there. Please note that I have also tried the following commands:

located_data = data[~data["CountryNames"].isin(["Unknown"])==True]
located_data = data[data["CountryNames"].isin(["Unknown"])==False]
located_data = data[data["CountryNames"]!="Unknown"]

None of these work either. Please tell me what I am doing wrong and how to fix it. The dtype of the CountryNames column is "object", if that helps.

CodePudding user response:

From your image examples it looks like the values in your data are actually misspelled as Uknown rather than the correct Unknown. Otherwise your code seems correct.

Try this:

located_data = data[~data["CountryNames"].isin(["Uknown"])]
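To confirm which spelling the column actually contains, it can help to list the distinct values before filtering. Below is a minimal sketch that uses a small made-up frame in place of the Excel file; the sample rows and the Value column are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data; the values are illustrative only.
data = pd.DataFrame({
    "CountryNames": ["Kenya", "Uknown", "Ghana", "Uknown"],
    "Value": [1, 2, 3, 4],
})

# Count each distinct value so misspellings or stray whitespace stand out.
counts = data["CountryNames"].value_counts()
print(counts)

# Filtering on the correct spelling matches nothing, so the shape is unchanged.
unchanged = data[data["CountryNames"] != "Unknown"]

# Filtering on the spelling actually present in the data drops the rows.
located_data = data[~data["CountryNames"].isin(["Uknown"])]
print(unchanged.shape, located_data.shape)
```

If the counts show a value that only looks like "Unknown", the filter string has to match it exactly, character for character.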

Alternatively, you could fix the data by replacing the misspelled values with the correct one, for example:

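# Assign through .loc so that only the CountryNames column is changed, not the whole row.
data.loc[data["CountryNames"] == "Uknown", "CountryNames"] = "Unknown"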
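As a rough end-to-end sketch of the correction, the example below fixes the spelling in the CountryNames column only and then applies the original filter. The frame and its Value column are made up for illustration. The assignment is restricted to one column with .loc so that the other columns of the matching rows are left untouched; Series.replace is shown as an equivalent alternative.

```python
import pandas as pd

# Hypothetical sample data; not the asker's actual spreadsheet.
data = pd.DataFrame({
    "CountryNames": ["Kenya", "Uknown", "Ghana"],
    "Value": [1, 2, 3],
})

# Overwrite only the CountryNames cell of the matching rows.
data.loc[data["CountryNames"] == "Uknown", "CountryNames"] = "Unknown"

# Equivalent alternative: replace exact matches in the column.
# data["CountryNames"] = data["CountryNames"].replace("Uknown", "Unknown")

# With the spelling fixed, the original filter works as intended.
located_data = data[data["CountryNames"] != "Unknown"]
print(located_data)
```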