Filter variable not working properly in Pandas

Time:10-29

I have a very large dataset, and I am applying multiple filters on many columns. To make the code more readable, I assign the filters to variables, but I noticed that after the values in the dataframe change, the filter does not seem to take the new values into account.

This is my dataframe:

import pandas as pd

data = {'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
       'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25-25', '59-59'],
       'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165-171', '175-182'],
       'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85-90', '90-95'],
       'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19-21', '20-22'],
       'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12-15', '12-15'],
       'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
       'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']}

df = pd.DataFrame(data)

And I want to apply this filter:

df['weight'] = df['weight'].astype(str)
filter1 = (df['weight'].str.slice(stop=2)=='65') & (df['country'].str.slice(stop=2)=='Au')

Initially, I get what I want using the filter:

df.loc[filter1]

Later, I change the filtered rows as follows:

df.loc[filter1,'weight'] = '100'

When I apply the filter again I expect no rows, but it returns the same rows, even though the filter should now evaluate to False for them.

Answer:

filter1 is a boolean Series computed once, from the values the dataframe held when the filter was created; it does not update when you later change those values. Build it again after your changes and you'll see that it works as expected:

def get_filter1(df):
    return df['weight'].str[:2].eq('65') & df['country'].str[:2].eq('Au')


print(df.loc[get_filter1(df)])

df.loc[get_filter1(df), 'weight'] = '100'

print(df.loc[get_filter1(df)])

Output:

   id    age   height weight education employment     country        city
1  84  22-22  165-167  65-70     11-13       8-11  Austria-EU  Vienna-AUS

Empty DataFrame
Columns: [id, age, height, weight, education, employment, country, city]
Index: []
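
To see why the stored filter goes stale, it may help to compare it with a deferred version side by side. The sketch below uses a small stand-in frame trimmed from the question's data, and the names mask, is_light_austrian, stale_rows and fresh_rows are made up for illustration. It relies on .loc accepting a callable, which pandas calls with the dataframe at lookup time.

```python
import pandas as pd

# Small stand-in frame (hypothetical, trimmed from the question's data).
df = pd.DataFrame({
    'weight': ['65-70', '65-70', '80-85'],
    'country': ['France-EU', 'Austria-EU', 'Netherland-EU'],
})

# A boolean mask is an ordinary Series, computed once from the current values.
mask = df['weight'].str[:2].eq('65') & df['country'].str[:2].eq('Au')

# A function defers the comparison until it is applied to a frame.
def is_light_austrian(frame):
    return frame['weight'].str[:2].eq('65') & frame['country'].str[:2].eq('Au')

# Change the matching row; the mask itself is not touched by this assignment.
df.loc[mask, 'weight'] = '100'

stale_rows = df.loc[mask]               # the old mask still selects row 1
fresh_rows = df.loc[is_light_austrian]  # .loc calls the function on df now

print(mask.tolist())    # [False, True, False]
print(len(stale_rows))  # 1
print(len(fresh_rows))  # 0
```

Keeping the condition in a function (or a lambda) preserves the readability of a named filter while making sure each use reflects the current values.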