Good day,
I have a problem where a CSV file, sent to a View and later passed into a new thread for processing, sometimes closes prematurely and I can't figure out why. The behaviour is intermittent and only started happening after I switched to using a new thread to process the file.
This is the original way I was processing the file and it worked, but for large files it caused time-out issues on the client:
import csv
import io

from rest_framework import status
from rest_framework.parsers import FormParser, MultiPartParser
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.views import APIView

class LoadCSVFile(APIView):
    permission_classes = (IsAuthenticated,)
    parser_classes = [FormParser, MultiPartParser]

    def post(self, request):
        file = request.FILES['file']
        data_set = file.read().decode('utf-8')
        io_string = io.StringIO(data_set)
        for row_data in csv.reader(io_string, delimiter=',', quotechar='"'):
            print('row_data:', row_data)
        return Response({ 'message': 'File received and is currently processing.', 'status_code': status.HTTP_200_OK }, status.HTTP_200_OK)
So I now process the file in a new thread like so:
import threading

class LoadCSVFile(APIView):
    permission_classes = (IsAuthenticated,)
    parser_classes = [FormParser, MultiPartParser]

    def post(self, request):
        request_handler = RequestHandler(request)
        csv_handler = CSVHandler(request.FILES['file'])
        # Fire and forget the file processing.
        t = threading.Thread(target=request_handler.resolve, kwargs={ 'csv_handler': csv_handler })
        t.start()
        return Response({ 'message': 'File received and is currently processing.', 'status_code': status.HTTP_200_OK }, status.HTTP_200_OK)
from rest_framework.request import Request

class RequestHandler(object):
    def __init__(self, request: Request):
        self.request = request

    def resolve(self, **kwargs):
        csv_handler = kwargs['csv_handler']
        try:
            print('Processing file...')
            csv_handler.process_file()
        except Exception as e:
            print('process_file level error:', e)
class CSVHandler(object):
    def __init__(self, file):
        self.file = file

    def get_reader(self):
        # Error is raised at the following line: "I/O operation on closed file."
        data_set = self.file.read().decode('utf-8')
        io_string = io.StringIO(data_set)
        return csv.reader(io_string, delimiter=',', quotechar='"')

    def process_file(self, **kwargs):
        for row_data in self.get_reader():
            print('row_data:', row_data)
For a while it was great, but then I started to notice occasional I/O errors.
- This happens with large (5000 lines) and small (2 lines) files.
- I can go 50 uploads without seeing the error, then it will happen 2 or 3 times in a row. Or anywhere in between.
- Both the request (in the RequestHandler) and the file (in the CSVHandler) are saved before the thread is started, and I don't know how else to keep the InMemoryUploadedFile alive until I need it in csv_handler.get_reader().
Any suggestions?
Thank you for your time.
CodePudding user response:
The issue was caused by the main thread returning from the POST request before the CSV file was opened by csv_handler.get_reader(). I'm still not sure how the file gets lost while the RequestHandler and CSVHandler hold references to the request and file objects. Maybe it's a Django thing.
I fixed it by moving the reader logic up into the CSVHandler constructor and using a lock to prevent the race condition.
class CSVHandler(object):
    def __init__(self, file):
        self.lock = threading.Lock()
        with self.lock:
            # Read the upload immediately, on the request thread, while the file is still open.
            self.file = file
            data_set = self.file.read().decode('utf-8')
            io_string = io.StringIO(data_set)
            self.reader = csv.reader(io_string, delimiter=',', quotechar='"')

    def process_file(self, **kwargs):
        for row_data in self.reader:
            print('row_data:', row_data)
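For completeness, the view from the question works unchanged with the reworked handler; the only thing that moves is when the read happens. Here is a minimal sketch (reusing the same imports and the RequestHandler class shown above):
class LoadCSVFile(APIView):
    permission_classes = (IsAuthenticated,)
    parser_classes = [FormParser, MultiPartParser]

    def post(self, request):
        # The CSVHandler constructor reads the upload here, on the request thread,
        # before the worker thread starts and before the response is returned.
        csv_handler = CSVHandler(request.FILES['file'])
        request_handler = RequestHandler(request)
        # The worker thread only iterates rows that are already buffered in memory.
        t = threading.Thread(target=request_handler.resolve, kwargs={ 'csv_handler': csv_handler })
        t.start()
        return Response({ 'message': 'File received and is currently processing.', 'status_code': status.HTTP_200_OK }, status.HTTP_200_OK)
The key point is that CSVHandler(request.FILES['file']) now performs the read before t.start() and before the response goes back, so the worker thread never touches the uploaded file object itself.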