Home > Enterprise >  Using .loc to categorize data and iterate over a dataframe using python to a CSV file
Using .loc to categorize data and iterate over a dataframe using python to a CSV file

Time:05-26

I have a CSV file with a list of items that I want to categorize based on a different category rather than color. The CSV file is:

     ITEM  PRICE  QUANTITY CATEGORY
0  Carrot      5        10   Orange
1  Potato      3         5    brown
2   Beans      2         6      red
3   Pizza      2         7      red
4   Salad      3         1    green
5  Burger      1         4    brown
6  Carrot      0         0   orange
7  Carrot      0         0   orange
8  Potato      0         0    brown
9   Beans      0         0      red

The code I wrote is:

import pandas as pd
path = 'C:\\Users\\[username]\\.spyder-py3\\TestFileCSV.csv

df = pd.read_csv(path)

if df.loc[index, 'ITEM'] == 'Carrot':
    df.loc[index, 'CATEGORY'] == 'VEGETABLE'
elif df.loc[index, 'ITEM'] == 'Beans':
    df.loc[index, 'CATEGORY'] == 'Legumes'
else:
    df.loc[index, 'CATEGORY'] == 'Check'
df.to_csv('TestFileCSV1.csv')

The result is I get a new file, TestFileCSV1, with the same exact dataframe as the original TestFileCSV. None of the categories are to vegetables or legumes.

Thank you!

EDIT: To clarify, I want to iterate over the list rather than assign the category one at a time. The actual dataset I want to categorize is thousands of items at a time. Thank you again!

CodePudding user response:

First of all, you're using == for assignment instead of =.

You can use this to assign CATEGORY based on ITEM value:

df.loc[df['ITEM'] == 'Carrot', 'CATEGORY'] = 'VEGETABLE'

CodePudding user response:

Try this:

df['CATEGORY']= (df['ITEM'].apply(lambda x: 'VEGETABLE' if x=='Carrot' 
                                  else( 'Legumes' if x=='Beans' else 'Check')))

df
index ITEM PRICE QUANTITY CATEGORY
0 Carrot 5 10 VEGETABLE
1 Potato 3 5 check
2 Beans 2 6 Legumes
3 Pizza 2 7 check
4 Salad 3 1 check
5 Burger 1 4 check
6 Carrot 0 0 VEGETABLE
7 Carrot 0 0 VEGETABLE
8 Potato 0 0 check
9 Beans 0 0 Legumes

CodePudding user response:

This is a bit cleaner, I think

# here we just map the values to the categories
def map(item):
  mapping = {'Carrot':"Vegetable",'Beans':"Legumes"}
  return map.get(item,"Check")


# apply, applies the map function to each value in x
df['CATEGORY'] = df.apply(lambda x: map(item = x['YEAR']), axis=1)
  • Related