I have .xlsx file which looks like this:
ID Column1 Column2 Column3 ...
123 Free BLUE XX
333 NA GREEN X
445 BUSY BLUE XX
665 FREE BLUE XXX
332 NA RED X
297 FREE BLUE XXXX
... ... ... ...
So I have to make a python script that will load this file and parse it and give me all ID's which for example have Column1 FREE. Have found out that I can use libraries like xlrd, pandas, Openpyxl etc.. but still cannot achive what I need.
My current try with xlrd is like this:
file_location = 'location'
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_name('wanted_sheet')
IDs = []
col1 = []
for id in sheet.col_values(0):
IDs.append(id)
for state in sheet.col_values(1):
if state == 'FREE':
col1.append(state)
Now need to connect somehow this state with corresponding ID... What would be best approach to do this?
CodePudding user response:
import pandas as pd
df = pd.read_excel(
io = "R:/x.xlsx" ,
# sheet_name = 0 , # 1st sheet ,
sheet_name = "Sheet1" ,
)
df[ ( df["Column1"]=="Free" ) | ( df["Column1"]=="FREE" ) ]
Adjust your filepath and sheetname as required.
CodePudding user response:
Option 1
Use pandas
. You already have an answer by @wenyongzhou.
Option 2
If for any reasons you have to do without pandas
, just use openpyxl
or another library and read the rows in a dict
. In order to have some filtering options we may define a little function:
wb=load_workbook("yourfile.xlsx")
ws=wb.active
def filter_rows(ws, filtercolumn, filtervalue):
headers = [h.value for h in next(ws.rows)]
filterindex = headers.index(filtercolumn)
return {r[0].value : r[filterindex].value for r in ws.rows if r[filterindex].value == filtervalue}
filter_rows(ws,"Column1","FREE")
{665: 'FREE', 297: 'FREE'}
filter_rows(ws,"Column2","BLUE")
{123: 'BLUE', 445: 'BLUE', 665: 'BLUE', 297: 'BLUE'}