I have a .csv like this, with these columns:
Receipt ID | Name | Quantity | Category Type |
---|---|---|---|
135135 | Croissant | 1.0 | Food |
135135 | Cappucino | 1.0 | Drink |
143143 | Salad | 1.0 | Food |
154134 | Americano | 1.0 | Drink |
178781 | Cappucino | 1.0 | Drink |
169071 | Muffin | 1.0 | Food |
169071 | Latte | 1.0 | Drink |
169071 | Brownie | 1.0 | Food |
I want to get the 'Receipt ID's where the 'Category Type' is 'Food'
I have tried a few methods but none of them works
df1 = df.query('Category Type == Food')['Receipt ID'].unique()
Does not work
I also tried setting Category Type as index
df1 = df.set_index('Category Type').eq('Food')
print (df1.index[df1['Receipt ID']].tolist())
Which gave me an empty list
The Receipt IDs are not necessarily unique, although I want the outputs to be unique, and the final goal is to find the Receipt ID that contains both food and drink. Could any expert please give me some help please? Thank you!
CodePudding user response:
df.where(df['Category Type'] == 'Food')['Receipt ID'].dropna().values.tolist()
if you want unique:
df.where(df['Category Type'] == 'Food')['Receipt ID'].dropna().unique().astype(int).tolist()
or
df.loc[df['Category Type'] == 'Food', 'Receipt ID'].unique().tolist()
for all types:
df.groupby('Category Type').agg({'Receipt ID': 'unique'}).to_dict()
CodePudding user response:
import pandas as pd
from io import StringIO
data_str = """
Receipt ID Name Quantity Category Type
135135 Croissant 1.0 Food
135135 Cappucino 1.0 Drink
143143 Salad 1.0 Food
154134 Americano 1.0 Drink
178781 Cappucino 1.0 Drink
169071 Muffin 1.0 Food
169071 Latte 1.0 Drink
169071 Brownie 1.0 Food
"""
# This is myself organizing the data, you can skip it here
io_str = StringIO(data_str)
df = pd.read_csv(io_str, header=0, sep='\t')
# start here
# method 1
filter_df = df[df['Category Type'] == 'Food']
unique_list = filter_df['Receipt ID'].unique().tolist()
print(unique_list)
# method 2 use loc function
unique_list=df.loc[df['Category Type'] == 'Food', 'Receipt ID'].unique().tolist()
print(unique_list)
"""
[135135, 143143, 169071]
"""