I need to trim the 'My Pet' column inside the query string. How can that be done?
# Note that one cat literal has a trailing space.
testDF = pd.DataFrame([{"My Pet":"cat ", "Cost":"$10 ", "Weight":"10 pounds", "Name":"Violet"},
{"My Pet":"cat", "Cost":"$10 ", "Weight":"15 pounds", "Name":"Sirius"},
{"My Pet":"dog", "Cost":"$0 ", "Weight":"50 pounds", "Name":"Sam"},
{"My Pet":"turtle", "Cost":"$5 ", "Weight":"20 ounces", "Name":"Tommy"},
])
# We try to filter on cat.
catDF = testDF.query("`My Pet` == 'cat'") # This yields only one row because one cat cell has a trailing space
catDF.head()
Output is only one row but I would like to get both rows with cat in them
My Pet Cost Weight Name
1 cat $10 15 pounds Sirius
CodePudding user response:
Use Series.str.strip
, but need engine="python"
:
catDF = testDF.query("`My Pet`.str.strip() == 'cat'", engine="python")
print (catDF)
My Pet Cost Weight Name
0 cat $10 10 pounds Violet
1 cat $10 15 pounds Sirius
CodePudding user response:
You can access string methods on string series. This means that you can use the Series.str.strip
function to clean remove whitespace.
testDF["My Pet"] = testDF["My Pet"].str.strip()
If you want to only clean trailing spaces then use rstrip
(and lstrip
for leading spaces)
You can optionally specify the set of characters you want to remove explicitly. E.g. to remove only spaces and tabs:
testDF["My Pet"] = testDF["My Pet"].str.strip(" \t")