I am using Openpyxl
library to read xlsx
file and extract few contents and add more strings to my txt output file.The excel file I am currently using contain sheets with name Summary and Employee. My below code is working fine for my current excel file. Now the issue is I would to use the same code for reading another excel file containing more sheets whose sheetnames I am not sure of. So in my code line ws = wb['Employee']
. The sheetname will change all the time. However, One thing I am sure about is I don't want to read any data from sheet1. All the data extraction will occur from sheet2 onwards in all the xlsx
files. I am not sure how to proceed from here so any help will be appreciated.
Thanks in advance for your time and efforts!
Code:
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Manually adding sheet name here
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
'String': 'VARCHAR(256)',
'Numeric': 'NUMBER',
'Date': 'NUMBER(4,0)',
'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
for i in range(6, ws.max_row 1):
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
theString = " "
for i in range(len(mylines['Column_name'])):
theString = mylines['Column_name'][i] " " mylines['Column_Type'][i]
if i < len(mylines['Column_name'])-1:
theString = ", "
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) "\n")
outputFile.close() #Closing file
Updated Code based on SO User comment:
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
#ws = wb['Employee'] #Manually adding sheet name here
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
'String': 'VARCHAR(256)',
'Numeric': 'NUMBER',
'Date': 'NUMBER(4,0)',
'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row 1):
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
theString = " "
for i in range(len(mylines['Column_name'])):
theString = mylines['Column_name'][i] " " mylines['Column_Type'][i]
if i < len(mylines['Column_name'])-1:
theString = ", "
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) "\n")
outputFile.close() #Closing file
Excel data
<Sheet 1 Name -> Summary Sheet: Empty
<Sheet 2 Name -> Employee Sheet
File Name: Employee
Sheet Name: Employee
File Type: csv
Field Name Type
Name String
Salary Numeric
Date Date
Phone Int
<Sheet 3 Name-> Employee1 Sheet
File Name: Employee
Sheet Name: Employee1
File Type: csv
Field Name Type
Employee Name Date
Employee Salary Int
Employment Date Int
Office Phone Int
CodePudding user response:
To iterate through all worksheets in a workbook and read data in them (except the first worksheet, remove the ws = wb['Employee']
Use a for loop (insert before for i in range(5,...
as this
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row 1):
name = ws.cell(row=i, column=1).value
....
This will read each sheet and append data to mylines
, except the first sheet
Second Update As you mentioned in below comment, to add a new line with the new SQL query, please make these additional changes
- Add another entry to dictionary to indicate new line as below (careful to ensure the lines execute after all lines in a particular sheet are read) Edit the String formation so that once a NewLine is seen, that string is written to the output file. Do note that the NewFile boolean value will overwrite any file that is there. Multiple lines will be appended post that.
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row 1):
name = ws.cell(row=i, column=1).value
print(i, name)
name1=ws.cell(row=i, column=2).value
print(name1)
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
mylines["Column_name"].append('NextLine')
mylines["Column_Type"].append('NextLine')
theString = " "
NewFile = True
for i in range(len(mylines['Column_name'])):
if(mylines['Column_name'][i] != 'NextLine'):
theString = mylines['Column_name'][i] " " mylines['Column_Type'][i]
theString = ", "
else:
theString = theString[:-2]
if NewFile:
NewFile = False
outputFile = open('output.txt', 'w') # Text file Output
print("New file ", theString)
else:
outputFile = open('output.txt', 'a')
print("Not new file ", theString)
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) "\n")
outputFile.close()
theString = " "