How to link hierarchical data in pandas from excel?-CodePudding

I have this excel sheet with hierarchies like this: excel snippet

Item Category Price
**Electronics** 1 
Laptop 1 1000
Kindle 1 200
Mobile 1 500
**HouseItems** 2
VacuumCleaner 2 200
Clock 2 50

How could I get the items by category? For example get the electronics like laptop and their prices and in a separate list get the house items. In the excel sheet I have more categories, this is just a snippet.


df = pd.read_excel('items.xlsx',
                   ['itemSheet'], engine='openpyxl')
df['items'] = pd.Series()
item_list= ['Electronics', 'HouseItems']
for item in df['itemSheet']['Item']:
    if item in cost_entry_group:
        df['items'].add(item)

print(df['items'])

How could I link the itemCategory(electronics) to the laptop, kindle and mobile and to their respective prices and do the same for the houseitems?

CodePudding user response：

Isn't the category already in your df?

use df[df['Category'] == 1]to get items where the category equals 1 aka 'HouseItems'

You could also do something like:

categories = {'Electronics': 0, 'HouseItems': 1}
dfs  = {}
for category_name, category_number in categories.items():
     dfs[category_name] = df[df['Category'] == category_number]

to get multiple DataFrames containing only one category.

To extract the categories from the DataFrame you could check for a 'nan' value in the price:

categories = {}
for index, row in df.iterrows():
    if row.isnull().values.any():
        categories[row['Item']] = row['Category']

This would be my quick and dirty solution. Alternatively you could go through the excel sheet with openpyxl and check for bold text:

from openpyxl import load_workbook

wb = load_workbook(path, data_only=True)
sh = wb[wb.sheetnames[0]]
categories = {}
for i in range(sh.min_row, sh.max_row): # go through rows
    if sh['A{}'.format(i)].font.b == True: # font.b gives True if bold otherwise False
        name = sh['A{}'.format(i)].value
        number = sh['B{}'.format(i)].value
        categories[name] = number

Probably there are 'better' solutions, but it works.