Python - openpyxl - Use openpyxl to get number of rows that contain a specific value-CodePudding

I'm newer to Python. I'm using openpyxl for a SEO project for my brother and I'm trying to get a number of rows that contain a specific value in them.

I have a spreadsheet that looks something like this:

I want to write a program that will get the keywords and parse them to a string by state, so like: Missouri = "search item 1, search item 2, search item 5, search item 6" Illinois = "search item 3, search item 4"

I have thus far created a program like this:

    #first, import openpyxl
    import openpyxl

    #next, give location of file
    path = "testExcel.xlsx"

    #Open workbook by creating object
    wb_object = openpyxl.load_workbook(path)

    #Get workbook active sheet object
    sheet_object = wb_object.active

    #Getting the value of maximum rows
    #and column
    row = sheet_object.max_row
    column = sheet_object.max_column
    print("Total Rows:", row)
    print("Total Columns:", column)

    #printing the value of forth column, state
    #Loop will print all values
    #of first column
    print("\nValue of fourth column")
    for i in range(4, row   1):
        cell_object = sheet_object.cell(row=i, column=4)
        split_item_test = cell_object.value.split(",")
        split_item_test_result = split_item_test[0]
        state = split_item_test_result
        print(state)
        if (state == 'Missouri'):
            print(state.count('Missouri'))
    print("All good")

The problem is after doing this, I see that it prints 1 repeatedly, but not a total number for Missouri. I would like a total number of mentions of the state, and then eventually get it to a string with each search criteria.

Is this possible with openpyxl? Or will I need a different library?

CodePudding user response：

ranemirusG is right, there are several ways to obtain the same result. Here's another option...I attempted to preserve your thought process, good luck.

print("\nValue of fourth column")

missouri_list = [] # empty list
illinois_list = [] # empty list

for i in range(2, row 1): # It didn't look like "4, row 1" captured the full sheet, try (2, row 1)
    cell_object = sheet_object.cell(row=i, column=4)
    keyword = sheet_object.cell(row=i, column=1)
    keyword_fmt = keyword.value # Captures values in Keyword column
    split_item_test = cell_object.value.split(",")
    split_item_test_result = split_item_test[1] # 1 captures states
    state = split_item_test_result
    print(state)

    # simple if statement to capture results in a list
    if 'Missouri' in state:
        missouri_list.append(keyword_fmt)
    if 'Illinois' in state:
        illinois_list.append(keyword_fmt)
print(missouri_list)
print(len(missouri_list)) # Counts the number of occurances
print(illinois_list)
print(len(illinois_list)) # Counts the number of occurances
print("All good")

CodePudding user response：

Yes, it's possible to do it with openpyxl. To achieve your real goal try something like this:

states_and_keywords  = {}
for i in range(4, row   1):
    cell_object = sheet_object.cell(row=i, column=4)
    split_item_test = cell_object.value.split(",")
    split_item_test_result = split_item_test[1] #note that the element should be 1 for the state
    state = split_item_test_result.strip(" ") #trim whitespace (after comma)
    keyword = cell_object.offset(0,-3).value #this gets the value of the keyword for that row
    if state not in states_and_keywords:
        states_and_keywords[state] = [keyword]
    else:
        states_and_keywords[state].append(keyword) 
print(states_and_keywords)