Home > OS >  How to extract values in a specific row range from Excel with Python
How to extract values in a specific row range from Excel with Python

Time:11-05

enter image description here

I want to print out all row values corresponding to a specific month in Excel using Python. Please look at the picture.

If I put "2021.10" in the function value using the function, I want the values [2021.10.03", "2021.10.03", "2021.10.03", "2021.10.03","2021.10.03", None "2021.10.15", "2021.10.15", None] to be entered in the list.

Simply, if I put the value "2021.10" into the function, I would like to extract the values from rows 5 to 13 of column A.

What should I do?


I'm reading the sheet in openpyxl now.

import openpyxl as oxl

load_excel = oxl.load_workbook('C:/Users/Homework.xlsx',data_only = True)
load_sheet = load_excel['Sheet']
<iframe name="sif1" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>

CodePudding user response:

This can extract rows as what you want

# dataframe from the excel file
             A     B  C
0   2021.09.23     E  1
1   2021.09.23     A  1
2   2021.09.23     E  1
3         None  None  3
4   2021.10.03     A  1
5   2021.10.03     A  2
6   2021.10.03     B  2
7   2021.10.03     E  1
8   2021.10.03     A  1
9         None  None  7
10  2021.10.15     A  2
11  2021.10.15     B  3
12        None  None  5
13  2021.11.03     C  2
14  2021.11.03     B  1
15  2021.11.03     F  2
def extract(df, value):
    df = df.reset_index()  # to make index column
    first_index = df[(df['A'].str.startswith(value))].iloc[0]['index']  # to get index of first 2021.10 value
    last_index = df[(df['A'].str.startswith(value))].iloc[-1]['index']  # to get index of last 2021.10 value

    sub_df = df.iloc[first_index:last_index 1,]  # to make dataframe from first index to last index

    for i, row in df[last_index 1:].iterrows():
        # to add all None right after the last value
        if row['A'] == 'None':
            sub_df = sub_df.append(row)
        else:
            break
    print(sub_df['A'].to_list())


import pandas as pd
df = pd.read_excel('C:/Users/Homework.xlsx')  # I read xlsx file using pandas instead of `openpyxl`
extract(df, '2021.10')

This will be printed by the function:

['2021.10.03', '2021.10.03', '2021.10.03', '2021.10.03', '2021.10.03', 'None', '2021.10.15', '2021.10.15', 'None']

CodePudding user response:

you can declare a list first and then write a loop which will traverse the rows which value you need. After every iteration you can append the value to the list. after completing the loop you will get your desired extracted values.

list = []

for i,j in range of(3,10):
    list.append(sheet.cell(row=i, col=j).value = list[i][j])
  • Related