Home > Software engineering >  How to extract data from Excel and output in txt file along with modified string using Python?
How to extract data from Excel and output in txt file along with modified string using Python?

Time:04-23

I have excel file with 2 tabs. First tab is "Summary" and 2nd tab is "Employee". The format of excel looks like below. Please note that below excel is for showcase only my original file contains ALWAYS "Summary" as 1st sheet and other tabs are ALWAYS based on "Sheet Name" (cell B2). So an excel can have 10 tabs or 20 tabs with 1st one being "Summary", which we do not need.What I would like to do is extract data from 6th row on-wards from each sheet and create txt file which looks like below in "Output in TXT file" section.So the generated employee txtfile will have 2 SQL code which looks like below. If type is "STRING" then type will be replace with VARCHAR, if type is date then replace with date and so on in generated txt file.

I started working on loading the file in Python using this enter image description here

Output in TXT file:

enter image description here

Code so far

    from openpyxl import load_workbook

data_file='\test.xlsx'

# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee']




    for i in range(6, ws.max_row 1):  
        name = ws.cell(row=i, column=1).value  
        outputFile = open('C:/Jupyter Notebook/{}.txt'.format(name), 'w')  
        for j in range(1, ws.max_column   1): 
            outputFile.write(ws.cell(row=i, column=j).value   '\n')  
    outputFile.close() 

CodePudding user response:

This is the type of question that's a little difficult to answer, because I'm afraid the answer is more or less just "go and learn how to actually program in Python". It seems like you're somewhat just blindly copying code you see in tutorials - if you were fluent in Python it would be obvious how to do what you're trying to do and why what you're doing right now won't work.

Still, since solving many individual problems is one of the way you learn something, let me see if I can give you some pointers here.

Your code currently looks like this (note that I shortened the output file path for the sake of readability in this answer):

for i in range(6, ws.max_row 1):  
    name = ws.cell(row=i, column=1).value
    outputFile = open('{}.txt'.format(name), 'w')  
    for j in range(1, ws.max_column   1): 
        outputFile.write(ws.cell(row=i, column=j).value   '\n')  
outputFile.close()

I can't run this code because I don't have any Excel files to hand, but I imagine it should produce four files called Name.txt, Salary.txt, Date.txt, and Phone.txt. Each file should contain the values from the corresponding row of the worksheet, separated by newlines.

Your questions are: (1) why is this outputting to four files instead of one, and (2) how can you write the SQL commands you want to that file instead of just the values from the worksheet.

For (1), the script is writing four files because that's exactly what you're telling it to do. You call open() four times, with four different filenames, so it creates four files. If you want to create just one file and write to that, try something like:

outputFile = open('output.txt', 'w')  
for i in range(6, ws.max_row 1):  
    name = ws.cell(row=i, column=1).value
    for j in range(1, ws.max_column   1): 
        outputFile.write(ws.cell(row=i, column=j).value   '\n')  
outputFile.close()

To write the output that you want, you should... write the output that you want. For example, to write the line "CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE(" to the file, you write

outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE("   "\n")

To write "Hello", you run outputFile.write("Hello"), and so on. ws.cell(row=i, column=j).value gets you the contents of the (i, j)-th cell in the worksheet, which is why passing it to write() causes that value to be written to the file. Just call write() with what you want to be written to the file.

CodePudding user response:

A possible solution you could do is export the excel document over to a csv file, then use python3's built-in csv reader to load up the data from the exported csv file. Hopefully this could help you out!

A bit more info on Python3's csv library: https://docs.python.org/3/library/csv.html

  • Related