I have this excel sheet with hierarchies like this: excel snippet
Item Category Price
**Electronics** 1
Laptop 1 1000
Kindle 1 200
Mobile 1 500
**HouseItems** 2
VacuumCleaner 2 200
Clock 2 50
How could I get the items by category? For example get the electronics like laptop and their prices and in a separate list get the house items. In the excel sheet I have more categories, this is just a snippet.
df = pd.read_excel('items.xlsx',
['itemSheet'], engine='openpyxl')
df['items'] = pd.Series()
item_list= ['Electronics', 'HouseItems']
for item in df['itemSheet']['Item']:
if item in cost_entry_group:
df['items'].add(item)
print(df['items'])
How could I link the itemCategory(electronics) to the laptop, kindle and mobile and to their respective prices and do the same for the houseitems?
CodePudding user response:
Isn't the category already in your df?
use
df[df['Category'] == 1]
to get items where the category equals 1 aka 'HouseItems'
You could also do something like:
categories = {'Electronics': 0, 'HouseItems': 1}
dfs = {}
for category_name, category_number in categories.items():
dfs[category_name] = df[df['Category'] == category_number]
to get multiple DataFrames containing only one category.
To extract the categories from the DataFrame you could check for a 'nan' value in the price:
categories = {}
for index, row in df.iterrows():
if row.isnull().values.any():
categories[row['Item']] = row['Category']
This would be my quick and dirty solution. Alternatively you could go through the excel sheet with openpyxl and check for bold text:
from openpyxl import load_workbook
wb = load_workbook(path, data_only=True)
sh = wb[wb.sheetnames[0]]
categories = {}
for i in range(sh.min_row, sh.max_row): # go through rows
if sh['A{}'.format(i)].font.b == True: # font.b gives True if bold otherwise False
name = sh['A{}'.format(i)].value
number = sh['B{}'.format(i)].value
categories[name] = number
Probably there are 'better' solutions, but it works.