I have a CSV file with a list of items that I want to categorize based on a different category rather than color. The CSV file is:
ITEM PRICE QUANTITY CATEGORY
0 Carrot 5 10 Orange
1 Potato 3 5 brown
2 Beans 2 6 red
3 Pizza 2 7 red
4 Salad 3 1 green
5 Burger 1 4 brown
6 Carrot 0 0 orange
7 Carrot 0 0 orange
8 Potato 0 0 brown
9 Beans 0 0 red
The code I wrote is:
import pandas as pd
path = 'C:\\Users\\[username]\\.spyder-py3\\TestFileCSV.csv
df = pd.read_csv(path)
if df.loc[index, 'ITEM'] == 'Carrot':
df.loc[index, 'CATEGORY'] == 'VEGETABLE'
elif df.loc[index, 'ITEM'] == 'Beans':
df.loc[index, 'CATEGORY'] == 'Legumes'
else:
df.loc[index, 'CATEGORY'] == 'Check'
df.to_csv('TestFileCSV1.csv')
The result is I get a new file, TestFileCSV1, with the same exact dataframe as the original TestFileCSV. None of the categories are to vegetables or legumes.
Thank you!
EDIT: To clarify, I want to iterate over the list rather than assign the category one at a time. The actual dataset I want to categorize is thousands of items at a time. Thank you again!
CodePudding user response:
First of all, you're using == for assignment instead of =.
You can use this to assign CATEGORY based on ITEM value:
df.loc[df['ITEM'] == 'Carrot', 'CATEGORY'] = 'VEGETABLE'
CodePudding user response:
Try this:
df['CATEGORY']= (df['ITEM'].apply(lambda x: 'VEGETABLE' if x=='Carrot'
else( 'Legumes' if x=='Beans' else 'Check')))
df
index | ITEM | PRICE | QUANTITY | CATEGORY |
---|---|---|---|---|
0 | Carrot | 5 | 10 | VEGETABLE |
1 | Potato | 3 | 5 | check |
2 | Beans | 2 | 6 | Legumes |
3 | Pizza | 2 | 7 | check |
4 | Salad | 3 | 1 | check |
5 | Burger | 1 | 4 | check |
6 | Carrot | 0 | 0 | VEGETABLE |
7 | Carrot | 0 | 0 | VEGETABLE |
8 | Potato | 0 | 0 | check |
9 | Beans | 0 | 0 | Legumes |
CodePudding user response:
This is a bit cleaner, I think
# here we just map the values to the categories
def map(item):
mapping = {'Carrot':"Vegetable",'Beans':"Legumes"}
return map.get(item,"Check")
# apply, applies the map function to each value in x
df['CATEGORY'] = df.apply(lambda x: map(item = x['YEAR']), axis=1)