I am retrieving a CSV file from an HTML form and decoding it as UTF-8. I need two instances of this file in my program, but when I call .decode('utf-8'), both instances of the file appear to be consumed by the .decode() call.
Python code:
if request.method == 'POST':
    # get the uploaded file
    file = request.files['file']
    file_copy = file
    bank_selection = request.form['banks']
    line_num = banks.get_line_no(file_copy)
For some reason, the .decode() call in get_line_no consumes both file and file_copy:
def get_line_no(file):
    file_data = file.read().decode('utf-8')
    for line_num, row in enumerate(file_data.split('\n')):
        if ',' in row and row.split(',')[0] == 'Date':
            break
    print(line_num)
    return line_num
This does not work, because when I then try
dataframe = pd.read_csv(file, skiprows=(line_num))
pandas raises an error saying the file is empty, since both the original file and file_copy seem to have been consumed by .decode().
The only way I've gotten it to work is by having the user send two files in the HTML form and retrieving them into different variables:
file = request.files['file']
file_copy = request.files['file2']
Why is this happening? Is there a way to send two copies of the file from the HTML form without the user having to select it twice?
CodePudding user response:
file_copy = file doesn't create a new copy of the file or of the file handle; it just binds the name file_copy to the same handle that file refers to. It is also .read(), not .decode(), that consumes the stream: once you call file.read(), the handle's position is at the end of the data, so reading through file_copy returns nothing, since it's the same handle. Instead, read the file once and keep a copy of the file data, e.g.:
if request.method == 'POST':
    # get the uploaded file and read its raw bytes once
    file = request.files['file']
    file_data = file.read()
    bank_selection = request.form['banks']
    line_num = get_line_no(file_data)
    # after the function returns, file_data will still be available (and undecoded)

def get_line_no(file_data):
    # decode the bytes to text before splitting into lines
    decoded = file_data.decode('utf-8')
    for line_num, row in enumerate(decoded.split('\n')):
        if ',' in row and row.split(',')[0] == 'Date':
            break
    print(line_num)
    return line_num
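To see why the second name does not help, here is a minimal sketch that uses io.BytesIO as a stand-in for the uploaded file (an assumption for illustration; the sample bytes are made up, and the real object in a Flask request is a different file-like type). It shows that assignment only creates an alias, while the bytes returned by .read() can be reused freely:

```python
import io

# Stand-in for the uploaded file; the sample contents are hypothetical.
upload = io.BytesIO(b"Bank export\nDate,Amount\n2021-01-01,10\n")

alias = upload                 # a second name for the same object, not a copy
same_object = alias is upload  # True: both names point at one handle

first_read = upload.read()     # moves the shared read position to the end
second_read = alias.read()     # nothing left to read through either name

# The bytes object itself is independent of the handle and can be decoded
# or wrapped as many times as needed.
text_once = first_read.decode("utf-8")
text_twice = first_read.decode("utf-8")
```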
If you need to read the same file with pd.read_csv(), you can create a StringIO object to allow it to be read as a file:
from io import StringIO
dataframe = pd.read_csv(StringIO(file_data.decode('utf-8')), skiprows=line_num)
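Another option, sketched here under the assumption that the upload object supports seek() the way ordinary binary file objects do, is to rewind the handle after the first read so it can be read again. The example uses io.BytesIO as a stand-in for the upload, and find_header_line is a hypothetical helper that mirrors get_line_no but returns None when no header row is found:

```python
import io

def find_header_line(text, first_column="Date"):
    """Return the index of the first line whose first field is first_column, or None."""
    for line_num, row in enumerate(text.split("\n")):
        if "," in row and row.split(",")[0] == first_column:
            return line_num
    return None

# Stand-in for the uploaded file; the sample contents are hypothetical.
upload = io.BytesIO(b"Bank export\n\nDate,Amount\n2021-01-01,10\n")

# First pass: read everything to locate the header row.
line_num = find_header_line(upload.read().decode("utf-8"))

# Rewind to the start so the same handle can be read a second time.
upload.seek(0)
second_pass = upload.read()
```

After the seek(0) call, the handle could be passed to a reader such as pd.read_csv instead of being read manually. Keeping the bytes in a variable, as in the answer above, avoids depending on the handle being seekable.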