Home > Blockchain >  How to link hierarchical data in pandas from excel?
How to link hierarchical data in pandas from excel?

Time:10-15

I have this excel sheet with hierarchies like this: excel snippet

Item Category Price
**Electronics** 1 
Laptop 1 1000
Kindle 1 200
Mobile 1 500
**HouseItems** 2
VacuumCleaner 2 200
Clock 2 50

How could I get the items by category? For example get the electronics like laptop and their prices and in a separate list get the house items. In the excel sheet I have more categories, this is just a snippet.


df = pd.read_excel('items.xlsx',
                   ['itemSheet'], engine='openpyxl')
df['items'] = pd.Series()
item_list= ['Electronics', 'HouseItems']
for item in df['itemSheet']['Item']:
    if item in cost_entry_group:
        df['items'].add(item)

print(df['items'])

How could I link the itemCategory(electronics) to the laptop, kindle and mobile and to their respective prices and do the same for the houseitems?

CodePudding user response:

Isn't the category already in your df?

use df[df['Category'] == 1]to get items where the category equals 1 aka 'HouseItems'

You could also do something like:

categories = {'Electronics': 0, 'HouseItems': 1}
dfs  = {}
for category_name, category_number in categories.items():
     dfs[category_name] = df[df['Category'] == category_number]

to get multiple DataFrames containing only one category.

To extract the categories from the DataFrame you could check for a 'nan' value in the price:

categories = {}
for index, row in df.iterrows():
    if row.isnull().values.any():
        categories[row['Item']] = row['Category']

This would be my quick and dirty solution. Alternatively you could go through the excel sheet with openpyxl and check for bold text:

from openpyxl import load_workbook

wb = load_workbook(path, data_only=True)
sh = wb[wb.sheetnames[0]]
categories = {}
for i in range(sh.min_row, sh.max_row): # go through rows
    if sh['A{}'.format(i)].font.b == True: # font.b gives True if bold otherwise False
        name = sh['A{}'.format(i)].value
        number = sh['B{}'.format(i)].value
        categories[name] = number

Probably there are 'better' solutions, but it works.

  • Related