Home > database >  Pandas select match multiple columns
Pandas select match multiple columns

Time:11-09

I have data like this:

category = ['Car','Car','Car','Car','Truck','Truck','Truck']
name = ['Camry','Camry','Camry','Camry','Tacoma','Tundra','Tundra']
year = ['2007','2007','2008','2009','2010','2010','2011']
vals = [0.1,0.5,0.2,0.9,0.8,0.4,0.9]
df = pd.DataFrame({'Category': category,
                   'Name': name,
                   'Year': year,
                   'Vals': vals})
index Category Name Year Vals
0 Car Camry 2007 0.1
1 Car Camry 2007 0.5
2 Car Camry 2008 0.2
3 Car Camry 2009 0.9
4 Truck Tacoma 2010 0.8
5 Truck Tundra 2010 0.4
6 Truck Tundra 2011 0.9

I then have a set of combinations of (Category, Name, Year) that I want to filter the data frame for. They could be in whatever format, but here they are in a data frame.

combinations_i_want = pd.DataFrame()
# (Car, Camry, 2007)
combinations_i_want = combinations_i_want.append({'Category':'Car', 'Name':'Camry','Year':'2007'},ignore_index=True) # 2 matches in df
# (Truck, Tundra, 2010)
combinations_i_want = combinations_i_want.append({'Category':'Truck', 'Name':'Tundra','Year':'2010'},ignore_index=True) # 1 match in df

I want to extract the rows in df that exactly match these two combinations. Those would be rows 0, 1, and 5. The resulting table would look like this:

index Category Name Year Vals
0 Car Camry 2007 0.1
1 Car Camry 2007 0.5
5 Truck Tundra 2010 0.4

Note: I don't need to old indices, they are just for help visualizing.

How do I do this?

CodePudding user response:

You can simply right join on the columns you want.

result = df.merge(combinations_i_want, how='right', on=['Category', 'Name', 'Year'])

CodePudding user response:

You should use .loc and .isin instead of .append

Your sentence could be something like:

df.loc[(df['Category'].isin(['Car', 'Truck'])) & (df['Name'].isin(['Camry', 'Tundra'])) & (df['Year'].isin(['2007', '2010']))]

That should yield the results you're expecting.

You can assign it to a variable if you want such as

combinations_i_want = df.loc[(df['Category'].isin(['Car', 'Truck'])) &
         (df['Name'].isin(['Camry', 'Tundra'])) &
         (df['Year'].isin(['2007', '2010']))]
    
print(combinations_i_want)

CodePudding user response:

use dataframe query it will give you a perfect match based on boolean logic

print(df.query("(Category=='Car' and Name=='Camry' and Year=='2007') or (Category=='Truck' and Name=='Tundra' and Year=='2010')"))

output:

     Category    Name  Year  Vals
   0      Car   Camry  2007   0.1
   1      Car   Camry  2007   0.5
   5    Truck  Tundra  2010   0.4
  • Related