Home > Enterprise >  Parse .xlsx with Python and gather statistic of contents from rows and columns
Parse .xlsx with Python and gather statistic of contents from rows and columns

Time:10-02

I have .xlsx file which looks like this:

  ID      Column1    Column2    Column3   ...

  123      Free       BLUE       XX
  333       NA        GREEN      X
  445      BUSY       BLUE       XX
  665      FREE       BLUE       XXX
  332       NA        RED        X
  297      FREE       BLUE       XXXX 
  ...      ...        ...        ...

So I have to make a python script that will load this file and parse it and give me all ID's which for example have Column1 FREE. Have found out that I can use libraries like xlrd, pandas, Openpyxl etc.. but still cannot achive what I need.

My current try with xlrd is like this:

  file_location = 'location'
    workbook = xlrd.open_workbook(file_location)
    
    sheet = workbook.sheet_by_name('wanted_sheet')
    
    IDs = []
    col1 = []
    for id in sheet.col_values(0):
        IDs.append(id)
    
    for state in sheet.col_values(1):
       if state == 'FREE':
         col1.append(state)

Now need to connect somehow this state with corresponding ID... What would be best approach to do this?

CodePudding user response:

import pandas as pd

df = pd.read_excel(
    io = "R:/x.xlsx" ,
    # sheet_name = 0 , # 1st sheet ,
    sheet_name = "Sheet1" ,
    )

df[ ( df["Column1"]=="Free" ) | ( df["Column1"]=="FREE" ) ]

Adjust your filepath and sheetname as required.

CodePudding user response:

Option 1

Use pandas. You already have an answer by @wenyongzhou.

Option 2

If for any reasons you have to do without pandas, just use openpyxl or another library and read the rows in a dict. In order to have some filtering options we may define a little function:

wb=load_workbook("yourfile.xlsx")
ws=wb.active

def filter_rows(ws, filtercolumn, filtervalue):
    headers = [h.value for h in next(ws.rows)]
    filterindex = headers.index(filtercolumn)
    return {r[0].value : r[filterindex].value for r in ws.rows if r[filterindex].value == filtervalue}

filter_rows(ws,"Column1","FREE")
{665: 'FREE', 297: 'FREE'}

filter_rows(ws,"Column2","BLUE")
{123: 'BLUE', 445: 'BLUE', 665: 'BLUE', 297: 'BLUE'}
  • Related