How to select values from one column based on the values of multiple other columns

Here is the original data:

     Name       Wine      Year
0    Mark     Volnay      1983
1    Mark     Volnay      1979
3    Mary     Volnay      1979
4    Mary     Volnay      1999
5    Mary  Champagne      1993
6    Mary  Champagne      1989
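
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the variable name `df` is an assumption (it is the name the answers use), and the index deliberately skips 2 to match the data as shown.

```python
import pandas as pd

# Rebuild the sample data; the index skips 2, as in the table above.
df = pd.DataFrame(
    {
        "Name": ["Mark", "Mark", "Mary", "Mary", "Mary", "Mary"],
        "Wine": ["Volnay", "Volnay", "Volnay", "Volnay", "Champagne", "Champagne"],
        "Year": [1983, 1979, 1979, 1999, 1993, 1989],
    },
    index=[0, 1, 3, 4, 5, 6],
)
print(df)
```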

I would like to get the values of Year based on the values of Name and Wine: given a (Name, Wine) pair, it should return every value in the Year column for the rows whose Name and Wine match that pair.

For example: with the key ['Mark', 'Volnay'] I would get the values [1983, 1979]

I tried manipulating the data and here is the best I could get.

Keep one instance of each key:

     Name       Wine      Year
1    Mark     Volnay      1979
4    Mary     Volnay      1999
6    Mary  Champagne      1989

Remove the Year column:

     Name       Wine
1    Mark     Volnay
4    Mary     Volnay
6    Mary  Champagne

Get the values in a list:

[['Mark', 'Volnay'], ['Mary', 'Volnay'], ['Mary', 'Champagne']]
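
The three steps above could be written in pandas roughly like this. It is a sketch under assumptions: the names `df`, `unique_keys` and `keys` are illustrative, and `keep="last"` is a guess based on the surviving row indices (1, 4, 6) in the tables above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Mark", "Mark", "Mary", "Mary", "Mary", "Mary"],
        "Wine": ["Volnay", "Volnay", "Volnay", "Volnay", "Champagne", "Champagne"],
        "Year": [1983, 1979, 1979, 1999, 1993, 1989],
    },
    index=[0, 1, 3, 4, 5, 6],
)

# Step 1: keep one row per (Name, Wine) pair; keep="last" leaves rows 1, 4 and 6.
unique_keys = df.drop_duplicates(subset=["Name", "Wine"], keep="last")

# Steps 2 and 3: remove the Year column, then turn the rows into [Name, Wine] lists.
keys = unique_keys.drop(columns="Year").values.tolist()
print(keys)
```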

I now have the keys I need, but I can't work out how to get the matching Year values from the original dataframe for a given key.

CodePudding user response：

You could use set_index and then loc:

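key = ['Mark', 'Volnay']
# Pass the key as a tuple: a list is read as several labels of the first index level (Name).
lst = df.set_index(['Name', 'Wine']).loc[tuple(key), 'Year'].tolist()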

Output:

>>> lst
[1983, 1979]

CodePudding user response：

You can also use groupby with get_group:

def getyear(dataframe, keys: list):
    # Select the rows of the (Name, Wine) group and keep only their Year values
    values = dataframe.groupby(['Name', 'Wine']).get_group(tuple(keys))['Year']
    dedupvalues = [*dict.fromkeys(values)]  # drop duplicate years, keeping first-seen order
    return dedupvalues

keys = ['Mark', 'Volnay']
print(getyear(df,keys))
[1983, 1979]
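
If the years are needed for every key at once rather than one key at a time, a similar groupby can build a lookup dictionary in a single pass. This is a sketch, not part of the answer above: the name `years_by_key` is made up for illustration, and `sort=False` is only there to keep the groups in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Mark", "Mark", "Mary", "Mary", "Mary", "Mary"],
        "Wine": ["Volnay", "Volnay", "Volnay", "Volnay", "Champagne", "Champagne"],
        "Year": [1983, 1979, 1979, 1999, 1993, 1989],
    },
    index=[0, 1, 3, 4, 5, 6],
)

# Map each (Name, Wine) tuple to the list of its Year values.
years_by_key = (
    df.groupby(["Name", "Wine"], sort=False)["Year"]
    .apply(list)
    .to_dict()
)

# Look up a key given as a list by converting it to a tuple first.
key = ["Mark", "Volnay"]
print(years_by_key[tuple(key)])
```

Unlike getyear, this keeps duplicate years within a group; wrap each list in dict.fromkeys if they should be dropped.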