Home > Software engineering >  Concatenate excel sheets using Pandas
Concatenate excel sheets using Pandas

Time:09-18

enter image description hereMy first time using pandas. I am sure the answer is something along the lines of storing the worksheet names in a list, then looping through the list based on the name I am looking for. I'm just not experienced enough to know how to do that.

The goal is to use pandas to extract and concatenate data from multiple worksheets from a user selected workbook. The final output being a single worksheet excel containing all data extracted from the various worksheets.

The excel workbook consist of approximately 100 worksheets. The qty of visible sheets will always vary, with the qty of sheets occurring before 'Main Frames BUP1' being variable as well.

I currently have the portion of code checking for page visibility working. I can not seem to figure out how to start at a specific worksheet when that worksheets position in the workbook could vary (i.e. not always the 3rd worksheet starting from 0 it could be the 5th in a users excel). It will however, always be the sheet that data should start being pulled from. Everything I find are examples of specifying specific sheets to read.

Any help/direction would be appreciated.

# user selected file from GUI
xl = values["-xl_file-"]
loc = os.path.dirname(xl)

xls = pd.ExcelFile(xl)
sheets = xls.book.worksheets

for x in sheets:
    print(x.title, x.sheet_state)
    if x.sheet_state == 'visible':
        df = pd.concat(pd.read_excel(xls, sheet_name=None, header=None,
             skiprows=5, nrows=32, usecols='M:AD'), ignore_index=True)
writer = pd.ExcelWriter(f'{loc}/test.xlsx')
df.to_excel(writer, 'bananas')
writer.save()

*******Additional clarification on final goal: Exclude all sheets occurring before 'Main Frames BUP 1', only consider visible sheets, pull data from 'M6:AD37', if entire row is blank do not add(or at least remove) from data frame, stop pulling data at the sheet just before a worksheet who's name has a partial match to 'panel'

If I create a dictionary of visible sheets, how do you create a new dictionary useing that dictionary only consisting of 'Main Frames BUP 1' to whatever sheet occurs just before a partial match of 'panel'? Then I can use that dictionary for my data pull.

CodePudding user response:

It is a bit hard without actually see what is going on with your data. I believe that what you are missing is that you need to create one dataframe first and after concat the others. Also you need to pass a sheet(x) in order to pandas be able to read it, otherwise it will become a dictionary. In case it does not work, get the first sheet and create a df, then you concat.

# user selected file from GUI
xl = values["-xl_file-"]
loc = os.path.dirname(xl)


xls = pd.ExcelFile(xl)
sheets = xls.book.worksheets


df = pd.DataFrame()
for x in sheets:
    print(x.title, x.sheet_state)
    if x.sheet_state == 'visible':
        df = pd.concat(pd.read_excel(xls, sheet_name=x, header=None,
             skiprows=5, nrows=32, usecols='M:AD'), ignore_index=True)
writer = pd.ExcelWriter(f'{loc}/test.xlsx')
df.to_excel(writer, 'bananas')
writer.save()

You can also put all the dfs in a dictionary, again it is difficult without knowing what you are working with.

xl = pd.ExcelFile('yourFile.xlsx')

#collect all sheet names
sheets = xl.sheet_names

#build dictionaries from all sheets passing None to sheet_name
diDF = pd.read_excel('yourFile.xlsx', sheet_name=None)

di = {k : diDF[k] for k in diDF if k in sheets}


for x in sheets:
    if x.sheet_state == 'visible':
        dfs = {x: pd.DataFrame(di[x])}
        

CodePudding user response:

I created a minimal sample myself and worked it out for you.

xls = pd.ExcelFile('data/Test.xlsx')
sheets = xls.book.worksheets
sList = [x.title for x in sheets if x.sheet_state == 'visible']
        
dfs = [pd.read_excel('data/Test.xlsx', sheet_name=s, skiprows=5, nrows=32, usecols='M:AD') for s in sList]
dfconcat = pd.concat(dfs)

Now you need adjust the columns, headers and so on as you did in your question. I hope that it works out for you. From my side here it worked like a charm.

  • Related