I have the following code:
import xlsxwriter
import numpy as np
import pandas as pd
import random
import uuid
LIMIT=1000
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
for row in range(0, LIMIT):
    worksheet.write_column(row, 0, uuid.uuid4())
    row = 1
workbook.close()
I get the following error:
TypeError: 'UUID' object is not iterable
I tried converting the UUID to a string using str(), but that just puts a single digit or letter in each cell of the Excel sheet for some reason.
CodePudding user response:
The issue is that you are mixing up two different functions: worksheet.write() and worksheet.write_column(). If you want to use worksheet.write_column(), the data needs to be an iterable. The function goes through the iterable's elements one by one and fills the column until it is done. A bare uuid.uuid4() returns a UUID object, which is not iterable, hence the TypeError. Once you convert it with str(), it becomes a string, and a string is an iterable of characters, so the function puts one character per row. If you want a full UUID per row, you need to create a list of UUIDs. See the example below.
import uuid
import xlsxwriter

LIMIT = 1000
workbook = xlsxwriter.Workbook('test2.xlsx')
worksheet = workbook.add_worksheet()
# Build a list of string UUIDs, then write the whole column in one call
uuid_list = [str(uuid.uuid4()) for _ in range(LIMIT)]
worksheet.write_column(0, 0, uuid_list)
workbook.close()
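To see why the original code ended up with one character per cell, note what iterating over a string UUID actually yields. A quick standard-library check (no spreadsheet involved):

```python
import uuid

u = uuid.uuid4()
s = str(u)

# A UUID object itself is not iterable, which is what raised the TypeError.
# Its string form is iterable, one character at a time -- exactly what
# write_column() consumes when handed a single str(uuid.uuid4()).
print(len(s))       # 36 characters: 32 hex digits plus 4 hyphens
print(list(s)[:5])  # the first five characters, one per would-be row
```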
If the looping makes more sense to you, since that is how you originally wrote it, use worksheet.write() instead. It will most probably be slower, but for 1000 strings, or even many more, you will probably not notice the difference.
import uuid
import xlsxwriter

LIMIT = 1000
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
for row in range(LIMIT):
    # write() takes a single value, so one call per cell
    worksheet.write(row, 0, str(uuid.uuid4()))
workbook.close()
You can find more information directly in the xlsxwriter package documentation.
PERFORMANCE:
To address the comment about performance, this is what I found for a million rows:

- worksheet.write_column(): 55s overall (5s to create the million UUIDs, 50s to write the data).
- worksheet.write(): 35s overall; I am quite surprised it is faster than write_column()...
- df.to_csv(): It is much faster to create a DataFrame and write it to a CSV or Excel file afterwards (my test says 7-8s: 5s to create the million UUIDs, 2s to write the DataFrame to CSV).
Code to do it with a dataframe:
import uuid
import pandas as pd

LIMIT = 1_000_000
uuid_list = [str(uuid.uuid4()) for _ in range(LIMIT)]
df = pd.DataFrame({"column1": uuid_list})
df.to_csv("test.csv")
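If pandas is not available, the standard-library csv module can write the same file; a minimal sketch (the filename and column name mirror the DataFrame example above and are otherwise arbitrary):

```python
import csv
import uuid

LIMIT = 1000  # smaller than the million-row benchmark, for illustration

# One single-element list per row, as csv.writerows expects
rows = [[str(uuid.uuid4())] for _ in range(LIMIT)]

with open("test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["column1"])  # header row
    writer.writerows(rows)
```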