I have a very large dataset, and I am applying multiple filters on many columns. To make the code more readable, I assign the filters to variables. However, I noticed that after the values in the dataframe change, the filter does not seem to take the new values into account.
This is my dataframe:
import pandas as pd

data = {'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25-25', '59-59'],
'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165-171', '175-182'],
'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85-90', '90-95'],
'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19-21', '20-22'],
'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12-15', '12-15'],
'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']}
df = pd.DataFrame(data)
And I want to apply this filter:
`df['weight'] = df['weight'].astype(str)
filter1 = (df['weight'].str.slice(stop=2)=='65') & (df['country'].str.slice(stop=2)=='Au')`
Initially, I get what I want using the filter:
df.loc[filter1]
Later, I change the filtered rows as follows:
df.loc[filter1,'weight'] = '100'
When I use the filter again, I expect no result, but it returns the same rows, even though the filter should now evaluate to False for them.
CodePudding user response:
`filter1` doesn't magically update to match values that you set after it is created. It is a boolean Series that was computed once, from the values the dataframe held at that moment. Build it again after your changes and you'll see that it works as expected:
def get_filter1(df):
    # Recompute the mask from the dataframe's current values on every call
    return df['weight'].str[:2].eq('65') & df['country'].str[:2].eq('Au')
print(df.loc[get_filter1(df)])
df.loc[get_filter1(df), 'weight'] = '100'
print(df.loc[get_filter1(df)])
Output:
   id    age   height  weight  education  employment     country        city
1  84  22-22  165-167   65-70      11-13        8-11  Austria-EU  Vienna-AUS
Empty DataFrame
Columns: [id, age, height, weight, education, employment, country, city]
Index: []
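If you'd rather keep a named filter around without calling it by hand each time, one option is to store the filter as a function and pass that function to `df.loc`, which calls it on the dataframe at indexing time. Below is a minimal sketch of the difference between a stored mask and a stored function. The shortened dataframe and the name `is_light_austrian` are made up for illustration and are not from the question.

```python
import pandas as pd

# Shortened, hypothetical version of the question's dataframe
df = pd.DataFrame({
    'weight': ['65-70', '65-70', '80-85'],
    'country': ['France-EU', 'Austria-EU', 'Netherland-EU'],
})

def is_light_austrian(d):
    # Hypothetical helper: builds the boolean mask from d's current values
    return d['weight'].str[:2].eq('65') & d['country'].str[:2].eq('Au')

# A stored mask is a snapshot: it is computed once, right here
stale_mask = is_light_austrian(df)

# Change the rows that the mask selected
df.loc[stale_mask, 'weight'] = '100'

# The old mask still selects the same row positions
stale_rows = df.loc[stale_mask]

# Passing the function makes .loc re-evaluate it against the current values
fresh_rows = df.loc[is_light_austrian]

print(stale_rows)
print(fresh_rows)
```

Here `stale_rows` still contains the Austria row (now with weight '100'), while `fresh_rows` is empty, because the function is evaluated only when `.loc` is called.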