My data looks like below, and it is in excel file with extension xlsx, so I am using openpyxl
library to read the file in python and extract the data from row 6 onwards for 2 columns,and I am appending the extracted data into dictionary with keys "Column_name" and "Column_Type". The reason for doing this way is because, I would like to add extra string between the extracted data. The string I would like to add is CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE();
So far my code and output looks like below.My expected output in txt file is also mentioned below. If you notice, I would like to enumerate value from both keys at same time, I am not sure how to achive this in python
so any help is appreciated ?
Thanks in advance for your time and effort!
Data
File Name: Employee
Sheet Name: Employee
File Type: csv
Field Name Type
Name String
Salary Numeric
Date Date
Phone Int
Code so far
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Access Sheet
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) "\n")
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
for i in range(6, ws.max_row 1):
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
theString = " "
# This relies on the lists 'Column_name' and 'Column_Type' always being the same length
# I.e., there should always be a value for each key
for i in range(len(mylines['Column_name'])):
theString = mylines['Column_name'][i] " " mylines['Column_Type'][i]
if i < ws.max_row:
theString = ", "
outputFile.close()
Output in txt file with above Code:
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE([['Name', 'Salary', 'Date', 'Phone'], ['String', 'Numeric', 'Date', 'Int']]);
Expected output in Txt file
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE( Name String, Salary Numeric, Date Date, Phone Int);
CodePudding user response:
You could take advantage of the fact that there should always be the same number of items in the "Column_name"
and "Column_Type"
lists to reduce the problem to iteration using a single variable:
# Simulated Excel Data
ws = {
'max_row': 3,
'cell': [
['Name', 'String'],
['Salary', 'Numeric'],
['Date', 'Date'],
['Phone', 'Int']
]
}
mylines={"Column_name":[],"Column_Type":[]}
for i in range(0, ws['max_row'] 1):
name = ws['cell'][i][0]
name1=ws['cell'][i][1]
mylines["Column_name"].append(name)
mylines["Column_Type"].append(name1)
theString = " "
# This relies on the lists 'Column_name' and 'Column_Type' always being the same length
# I.e., there should always be a value for each key
for i in range(len(mylines['Column_name'])):
theString = mylines['Column_name'][i] " " mylines['Column_Type'][i]
if i < ws['max_row']:
theString = ", "
# OLD
print("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format([(mylines[k]) for k,v in mylines.items()]) "\n")
# NEW
print("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) "\n")
When I ran this, I got the following output (the first line from the old version and the second line from the new version):
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE([['Name', 'Salary', 'Date', 'Phone'], ['String', 'Numeric', 'Date', 'Int']]);
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE( Name String, Salary Numeric, Date Date, Phone Int);
Of course, if you have actual numeric and other non-string data in your spreadsheet, you'll also want to cast the variable values to strings using str()
before appending them to theString
.
The only difference from your code (other than the changes I made to have the code work as a standalone without an actual spreadsheet) is replacing the format
parameter in the last line with a string built by appending the values from both lists in mylines
.