Issues with pandas and writing to csv file

Time:11-06

I am having an issue with pandas and writing to a CSV file. When I run the Python script, I either run out of memory or my computer runs slowly after the script has finished. Is there any way to split the data into chunks and write the chunks to CSV? I am a bit new to programming in Python.

import itertools, hashlib, pandas as pd,time
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
chunksize = 1_000_000
rows = []
for combination in itertools.combinations_with_replacement(chars, 10):
    for A in numbers_list:
        pure = str(A) + ':' + str(combination)
        B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "")
        C = hashlib.sha256(B.encode('utf-8')).hexdigest()
        rows.append([A, B, C])
t0 = time.time()
df = pd.DataFrame(data=rows, columns=['A', 'B', 'C'])
df.to_csv('data.csv', index=False)
tdelta = time.time() - t0
print(tdelta)

I would really appreciate the help! Thank you!

CodePudding user response:

Since you are only using the dataframe to write to a file, skip it completely. You build the full data set in memory as a Python list and then again as the dataframe, needlessly eating RAM. The csv module in the standard library lets you write line by line.
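To get a rough sense of why memory runs out, the number of rows can be counted up front. This is only a sketch: the variable names are made up for illustration, and it relies on the standard formula for combinations with replacement, C(n + k - 1, k).

```python
import math

n_chars = 16     # hex digits in `chars`
length = 10      # size of each combination
n_numbers = 25   # values produced by range(0, 25)

# Combinations with replacement of k items from n: C(n + k - 1, k)
n_combinations = math.comb(n_chars + length - 1, length)
n_rows = n_combinations * n_numbers

print(n_combinations)  # 3268760
print(n_rows)          # 81719000
```

With roughly 82 million rows, each holding a 64-character hash, keeping every row in a list and then again in a dataframe is what exhausts RAM.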

import itertools, hashlib, time, csv
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
chunksize = 1_000_000
with open('test.csv', 'w', newline='') as fileobj:
    writer = csv.writer(fileobj)
    for combination in itertools.combinations_with_replacement(chars, 10):
        for A in numbers_list:
            pure = str(A) + ':' + str(combination)
            B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "")
            C = hashlib.sha256(B.encode('utf-8')).hexdigest()
            writer.writerow([A , B, C])

This will go fast until you've filled up the RAM cache that fronts your storage, and then will go at whatever speed the OS can get data to disk.
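If you still want explicit chunks, as the question asks, one possible approach is to generate rows lazily and hand them to the writer in batches. The sketch below is not part of the answer above: `generate_rows` and `write_in_chunks` are hypothetical helper names, and `''.join(combination)` is swapped in for the chain of `replace` calls, which should give the same string here because every element is a single character.

```python
import csv
import hashlib
import itertools


def generate_rows(chars, length, numbers):
    """Yield [A, B, C] rows one at a time instead of building a list."""
    for combination in itertools.combinations_with_replacement(chars, length):
        joined = ''.join(combination)
        for a in numbers:
            b = f"{a}:{joined}"
            c = hashlib.sha256(b.encode('utf-8')).hexdigest()
            yield [a, b, c]


def write_in_chunks(path, rows, chunksize):
    """Write rows to a CSV file in batches of at most `chunksize` rows."""
    rows = iter(rows)
    written = 0
    with open(path, 'w', newline='') as fileobj:
        writer = csv.writer(fileobj)
        while True:
            # Pull the next batch; only this batch is held in memory.
            chunk = list(itertools.islice(rows, chunksize))
            if not chunk:
                break
            writer.writerows(chunk)
            written += len(chunk)
    return written
```

For the data in the question, the call would look like `write_in_chunks('test.csv', generate_rows('0123456789abcdef', 10, range(25)), 1_000_000)`. Memory use is then bounded by one chunk rather than the whole data set, though the total run time is still limited by hashing and disk speed.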
