I am having some trouble with filtering a dataset based on string values. I have tried multiple methods and none of them seem to work. I have data which looks like the following:
Some of the "CountryNames" in this dataset are "Unknown", like the following:
I would like to filter out the rows with the "Unknown" value in CountryNames. I have tried multiple methods, but none of them seem to work for some odd reason: they just produce the exact same dataset as before.
Here's a snippet of my code:
import pandas as pd

data = pd.read_excel(r"C:\Users\DylanNdengu\Downloads\combined_table.xlsx", index_col=False)
located_data = data[~data["CountryNames"].isin(["Unknown"])]
data and located_data have the exact same shape, and the rows with Unknown are still there. Please note that I have also tried the following commands:
located_data = data[~data["CountryNames"].isin(["Unknown"])==True]
located_data = data[data["CountryNames"].isin(["Unknown"])==False]
located_data = data[data["CountryNames"]!="Unknown"]
None of these work either. Please tell me what I am doing wrong and how to fix this. The dtype of the CountryNames column is "object", if that helps.
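For reference, the four filters above are equivalent. The sketch below runs all of them on a small hypothetical DataFrame (the country values are invented for illustration; the real file's contents are not shown). When the cell is spelled exactly "Unknown", every variant drops the same row:

```python
import pandas as pd

# Hypothetical toy data; not taken from the original spreadsheet.
data = pd.DataFrame({"CountryNames": ["Kenya", "Unknown", "France"]})

# The four filters from the question; each builds the same boolean mask.
filters = [
    data[~data["CountryNames"].isin(["Unknown"])],
    data[~data["CountryNames"].isin(["Unknown"]) == True],
    data[data["CountryNames"].isin(["Unknown"]) == False],
    data[data["CountryNames"] != "Unknown"],
]

for located_data in filters:
    # Each result keeps only the two known countries.
    print(located_data["CountryNames"].tolist())
```

So if the filters leave the data unchanged, the likely cause is the cell values themselves rather than the filtering syntax.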
From your image examples, it looks like the values in your data are actually misspelled as Uknown rather than the correct Unknown. Otherwise, your code seems correct.
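One way to confirm what the column really contains, rather than trusting how it looks in a spreadsheet view, is to list its distinct values. A minimal sketch on hypothetical data that reproduces the suspected typo:

```python
import pandas as pd

# Hypothetical data mimicking the suspected misspelling "Uknown".
data = pd.DataFrame({"CountryNames": ["Kenya", "Uknown", "France", "Uknown"]})

# Count each distinct value; a misspelling appears as its own entry.
print(data["CountryNames"].value_counts())

# repr() makes spelling and stray whitespace visible.
print([repr(v) for v in data["CountryNames"].unique()])

# The filter from the question removes nothing, since no cell equals "Unknown".
located_data = data[~data["CountryNames"].isin(["Unknown"])]
print(located_data.shape == data.shape)  # True
```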
Try this:
located_data = data[~data["CountryNames"].isin(["Uknown"])]
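If the column might contain both spellings (an assumption; the images only suggest "Uknown"), isin accepts a list, so both placeholders can be excluded in one step. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data containing both the misspelled and the correct placeholder.
data = pd.DataFrame({"CountryNames": ["Kenya", "Uknown", "Unknown", "France"]})

# Exclude every placeholder spelling at once.
placeholders = ["Unknown", "Uknown"]
located_data = data[~data["CountryNames"].isin(placeholders)]
print(located_data["CountryNames"].tolist())  # ['Kenya', 'France']
```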
Alternatively, you should fix your data by renaming the misspelled names to the correct ones, for example by using:
# Change only the CountryNames column in the matching rows
data.loc[data["CountryNames"] == "Uknown", "CountryNames"] = "Unknown"
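A sketch of the rename-then-filter approach on hypothetical data. Passing the column label to .loc limits the assignment to CountryNames; assigning to data[mask] without a column label would overwrite every column in the matching rows. The "Sales" column is invented here only to show that other columns are left alone, and Series.replace is shown as an equivalent alternative:

```python
import pandas as pd

# Hypothetical data; "Sales" is an invented extra column.
data = pd.DataFrame({
    "CountryNames": ["Kenya", "Uknown", "France"],
    "Sales": [10, 20, 30],
})

# Fix the spelling in the CountryNames column only.
data.loc[data["CountryNames"] == "Uknown", "CountryNames"] = "Unknown"
# Equivalent alternative:
# data["CountryNames"] = data["CountryNames"].replace("Uknown", "Unknown")

# The filter from the question now drops the placeholder row.
located_data = data[~data["CountryNames"].isin(["Unknown"])]
print(located_data)
```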