I have 3 xls files that have 3 sheets per file. All sheets have same column headers but order as you see below is different
1.xls
Name Address Date City State Zip
2.xls
Address Date City Zip Name State
3.xls
City Zip Name Address Date State
I want my final xls file to concatenate all 3 files and sheets
Output.xls
Name Address Date City State Zip RowNumber SheetName
The rownumber should be the specific row number from each file and sheet the data comes from before concatenation.Sheetname should be the sheet it comes from within the xls file.
My attempt-
import os
import pandas as pd
#set src directory
os.chdir('C:/Users/hhh/Desktop/python/Concat')
def read_sheets(filename):
result = []
sheets = pd.read_excel(filename, sheet_name=None)
for name, sheet in sheets.items():
sheet['Sheetname'] = name
sheet['Row'] = sheet.index
result.append(sheet)
return pd.concat(result, ignore_index=True)
files = [file for file in os.listdir(folder_path) if file.endswith(".xls")]
dfoo = read_sheets(files)
But nothing happens and i just receive an assertion error saying assert content_or_path is not None. Is this because column orders don't match? is there a workaround? Number of columns are same in all files and sheets. Within each file sheets have same order. But if you compare 1.xls sheets to 2.xls the column order is different as you can see in my reprex above
CodePudding user response:
I believe your question is asking to take 9 different sheets (3 each in 3 different .xls files) and combine them into a single sheet in a new spreadsheet Output.xls.
A few comments to start:
- The different column order for different input files shouldn't be a problem.
- You may want to consider having the output file be a .xlsx file instead of .xls, since the xlwt package required to work with .xls files raises a warning:
FutureWarning:
As the xlwt package is no longer maintained, the xlwt engine will be removed in a future version of pandas.
This is the only engine in pandas that supports writing in the xls format.
Install openpyxl and write to an xlsx file instead.
You can set the option io.excel.xls.writer to 'xlwt' to silence this warning.
While this option is deprecated and will also raise a warning, it can be globally set and the warning suppressed.
writer = pd.ExcelWriter('Output.xls')
- The sample code in your question sends a list of files to the function read_sheets(), so this function needs to be changed to expect this list rather than a single file.
- The code needs to loop over input files, then over the sheets in each file.
Here is a modification of your code which does what I think you are asking (with a different argument for os.chdir() to match my test environment):
import os
import pandas as pd
#set src directory
#os.chdir('C:/Users/hhh/Desktop/python/Concat')
os.chdir('./Concat')
def read_sheets(files):
result = []
for filename in files:
sheets = pd.read_excel(filename, sheet_name=None)
for name, sheet in sheets.items():
sheet['Sheetname'] = name
sheet['Row'] = sheet.index
result.append(sheet)
return pd.concat(result, ignore_index=True)
folder_path = '.'
files = [file for file in os.listdir(folder_path) if file.endswith(".xls")]
dfCombined = read_sheets(files)
writer = pd.ExcelWriter('Output.xls')
dfCombined.to_excel(writer, index=None, sheet_name='Combined')
writer.save()
writer.close()
The sample output looks like this:
Name Address Date City State Zip Sheetname Row
Alice 1 Main St 11 Nome Alaska 11111 Sheet1 0
Bob 1 Main St 12 Providence Rhode Island 22222 Sheet1 1
Candace 1 Main St 13 Denver Colorado 33333 Sheet1 2
Dirk 1 Main St 14 Wilmington Delaware 44444 Sheet1 3
Edward 1 Marvin Gardens 11 Nome Alaska 11111 Sheet2 0
Fran 1 Marvin Gardens 12 Providence Rhode Island 22222 Sheet2 1
George 1 Marvin Gardens 13 Denver Colorado 33333 Sheet2 2
Hannah 1 Marvin Gardens 14 Wilmington Delaware 44444 Sheet2 3
Irvin 1 St Marks Place 11 Nome Alaska 11111 Sheet3 0
Jasmine 1 St Marks Place 12 Providence Rhode Island 22222 Sheet3 1
Kirk 1 St Marks Place 13 Denver Colorado 33333 Sheet3 2
Lana 1 St Marks Place 14 Wilmington Delaware 44444 Sheet3 3
Alice 2 Main St 11 Nome Alaska 11111 Sheet1 0
Bob 2 Main St 12 Providence Rhode Island 22222 Sheet1 1
Candace 2 Main St 13 Denver Colorado 33333 Sheet1 2
Dirk 2 Main St 14 Wilmington Delaware 44444 Sheet1 3
Edward 2 Marvin Gardens 11 Nome Alaska 11111 Sheet2 0
Fran 2 Marvin Gardens 12 Providence Rhode Island 22222 Sheet2 1
George 2 Marvin Gardens 13 Denver Colorado 33333 Sheet2 2
Hannah 2 Marvin Gardens 14 Wilmington Delaware 44444 Sheet2 3
Irvin 2 St Marks Place 11 Nome Alaska 11111 Sheet3 0
Jasmine 2 St Marks Place 12 Providence Rhode Island 22222 Sheet3 1
Kirk 2 St Marks Place 13 Denver Colorado 33333 Sheet3 2
Lana 2 St Marks Place 14 Wilmington Delaware 44444 Sheet3 3
Alice 3 Main St 11 Nome Alaska 11111 Sheet1 0
Bob 3 Main St 12 Providence Rhode Island 22222 Sheet1 1
Candace 3 Main St 13 Denver Colorado 33333 Sheet1 2
Dirk 3 Main St 14 Wilmington Delaware 44444 Sheet1 3
Edward 3 Marvin Gardens 11 Nome Alaska 11111 Sheet2 0
Fran 3 Marvin Gardens 12 Providence Rhode Island 22222 Sheet2 1
George 3 Marvin Gardens 13 Denver Colorado 33333 Sheet2 2
Hannah 3 Marvin Gardens 14 Wilmington Delaware 44444 Sheet2 3
Irvin 3 St Marks Place 11 Nome Alaska 11111 Sheet3 0
Jasmine 3 St Marks Place 12 Providence Rhode Island 22222 Sheet3 1
Kirk 3 St Marks Place 13 Denver Colorado 33333 Sheet3 2
Lana 3 St Marks Place 14 Wilmington Delaware 44444 Sheet3 3