Home > Blockchain >  Iterating excel sheets with python
Iterating excel sheets with python

Time:08-06

I have a big excel (137MB), where I have 50 consumers and each of them have a separate excel sheet. Trying not to create 50 variables I would like to know how I can take from each sheet a specific column. For example, I want to save in a different variable the values in row 3. Or if it is easier save them in a variable and then taking into account the number of rows dividing it into 50 variables.

CodePudding user response:

you can read the entire dataframe without specifying the sheet which will give you a dictionary dataframe.

you can then use a list comprehension and loop.

MCVE

import pandas as pd 

df1 = pd.get_dummies(list('ABCD'))

df2 = df1
df2['B'] = 5

writer = pd.ExcelWriter('workbook.xlsx')
df1  = pd.to_excel(writer,sheet_name='test1')
df2 = pd.to_excel(writer,sheet_name='test2')
writer.save()

print(df1)

   A  B  C  D
0  1  0  0  0
1  0  1  0  0
2  0  0  1  0
3  0  0  0  1

print(df2)

   A  B  C  D
0  1  5  0  0
1  0  5  0  0
2  0  5  1  0
3  0  5  0  1

df = pd.read_excel('workbook.xlsx',sheet_name=None)

print(df)

{'test1':    Unnamed: 0  A  B  C  D
0           0  1  5  0  0
1           1  0  5  0  0
2           2  0  5  1  0
3           3  0  5  0  1, 'test2':    Unnamed: 0  A  B  C  D
0           0  1  5  0  0
1           1  0  5  0  0
2           2  0  5  1  0
3           3  0  5  0  1}


new = pd.concat([frame[['A','C']] for frame in df.values()])

print(new)

   A  C
0  1  0
1  0  0
2  0  1
3  0  0
0  1  0
1  0  0
2  0  1
3  0  0

CodePudding user response:

Someone recommended Pandas, but I have used openpyxl to do similar things.

First you can import openpyxl and setup the worksheet like this:

import openpyxl #openpyxl will need to be installed via pip
workbook = openpyxl.load_workbook(PATHTOYOURWORKSHEET)

to access specific sheets of the workbook, as you mentioned each consumer has a seperate sheet, you do this:

worksheet1 = workbook["THE NAME OF YOUR SHEET"]
#you can call each whatever you want, this is how you get into a specific sheet.

Then to select a specific column you could do something like this although this is for a row, not a column, but if you read openpyxl docs i think its just iter_col instead, which just iterates every single cell in the row under your parameters:

sheet1Row2 = worksheet1.iter_rows(min_row=2, max_row=2, min_col=1, max_col=sheet1.max_column, values_only=True)

then using this, you could do something like for each 'sheet' in the workbook, target the row you want and then do whatever you want with it, add it to an array, save each to a new variable, etc.

Hope it helps

  • Related