Home > Mobile >  Pandas/ Python list values of one column based on string value of another column
Pandas/ Python list values of one column based on string value of another column

Time:07-30

I have a .csv like this, with these columns:

Receipt ID Name Quantity Category Type
135135 Croissant 1.0 Food
135135 Cappucino 1.0 Drink
143143 Salad 1.0 Food
154134 Americano 1.0 Drink
178781 Cappucino 1.0 Drink
169071 Muffin 1.0 Food
169071 Latte 1.0 Drink
169071 Brownie 1.0 Food

I want to get the 'Receipt ID's where the 'Category Type' is 'Food'

I have tried a few methods but none of them works

df1 = df.query('Category Type == Food')['Receipt ID'].unique()

Does not work

I also tried setting Category Type as index

df1 = df.set_index('Category Type').eq('Food')

print (df1.index[df1['Receipt ID']].tolist())

Which gave me an empty list

The Receipt IDs are not necessarily unique, although I want the outputs to be unique, and the final goal is to find the Receipt ID that contains both food and drink. Could any expert please give me some help please? Thank you!

CodePudding user response:

df.where(df['Category Type'] == 'Food')['Receipt ID'].dropna().values.tolist()

if you want unique:

df.where(df['Category Type'] == 'Food')['Receipt ID'].dropna().unique().astype(int).tolist()

or

df.loc[df['Category Type'] == 'Food', 'Receipt ID'].unique().tolist()

for all types:

df.groupby('Category Type').agg({'Receipt ID': 'unique'}).to_dict()

CodePudding user response:


import pandas as pd
from io import StringIO

data_str = """
Receipt ID  Name    Quantity    Category Type
135135  Croissant   1.0 Food
135135  Cappucino   1.0 Drink
143143  Salad   1.0 Food
154134  Americano   1.0 Drink
178781  Cappucino   1.0 Drink
169071  Muffin  1.0 Food
169071  Latte   1.0 Drink
169071  Brownie 1.0 Food
"""
# This is myself organizing the data, you can skip it here
io_str = StringIO(data_str)
df = pd.read_csv(io_str, header=0, sep='\t')

# start here
# method 1 
filter_df = df[df['Category Type'] == 'Food']
unique_list = filter_df['Receipt ID'].unique().tolist()
print(unique_list)

# method 2 use loc function
unique_list=df.loc[df['Category Type'] == 'Food', 'Receipt ID'].unique().tolist()
print(unique_list)
"""
[135135, 143143, 169071]
"""


  • Related