Edit: using utf-16 seems to get me closer to working, but I have CSV values that include commas, such as "one example value is a description, which is long and can include commas, and quotes".
So with my current code:
filepath="csv_input/frups.csv"
rows = []
with open(filepath, encoding='utf-16') as f:
    for line in f:
        print('line=', line)
        formatted_line = line.strip().split(",")
        print('formatted_line=', formatted_line)
        rows.append(formatted_line)
        print('')
Lines get formatted incorrectly:
line= "FRUPS" "11111112" "Paahou 11111112, 11111112,11111112" "Bar, Achal" "Iagress" "Unassigned" "Normal" "GaWu , Suaair center will not be able to repair 3 couch part 11111112, 11111112,11111112 . Pleasa to repair .
formatted_line= ['"FRUPS"\t"11111112"\t"Parts not able to repair in Suzhou 11111112', ' 11111112', '11111112"\t"Baaaaaar', ' Acaaaal"\t"In Progress"\t"Unassigned"\t"Normal"\t"Got coaow Wu ', ' Suar cat 11111112', ' 11111112', '11111112. Pleasa to repair .']
line= 11111112
formatted_line= ['11111112']
So in this example, the printed line appears to be separated by long spaces, but splitting on commas does not reliably break it into the correct fields.
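A quick way to see what those long spaces actually are (a diagnostic sketch, not one of my original attempts) is to print the repr() of a raw line, which shows control characters such as tabs explicitly:

# Diagnostic sketch: repr() makes whitespace visible, so a tab prints as \t
# instead of blank space.
with open(filepath, encoding='utf-16') as f:
    first_line = next(f)
print(repr(first_line))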
I am trying to read a CSV line by line in Python, but each solution leads to a different error.
- Using pandas:
import pandas as pd

filepath="csv_input/frups.csv"
data = pd.read_csv(filepath, encoding='utf-16')
for thing in data:
    print(thing)
    print('')
read_csv fails on the file with the error: Error tokenizing data. C error: Expected 7 fields in line 16, saw 8
- Using csv.reader:
from csv import reader

# open file in read mode
with open(filepath, 'r') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj)
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)
Fails at the for row in csv_reader line with the error: line contains NUL
I've tried to figure out what these NUL characters are, but trying to investigate using code leads to different errors:
data = open(filepath, 'rb').read()
print(data.find('\x00'))
error: argument should be integer or bytes-like object, not 'str'
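(Looking at that error again, it seems to come from searching bytes with a str needle; a minimal sketch of the corrected check:)

data = open(filepath, 'rb').read()
print(data.find(b'\x00'))   # pass bytes, not str, when searching in bytes
print(data.count(b'\x00'))  # how many NUL bytes there are in total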
- Another read solution, trying to strip certain characters:
with open(filepath,'rb') as f:
    contents = f.read()
    contents = contents.rstrip("\n").decode("utf-16")
    contents = contents.split("\r\n")
error: TypeError: a bytes-like object is required, not 'str'
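(Similarly, this TypeError appears to come from calling rstrip with a str argument on bytes; a minimal sketch that decodes first and then works on the resulting str:)

with open(filepath, 'rb') as f:
    contents = f.read()
text = contents.decode('utf-16')  # decode the bytes first
lines = text.splitlines()         # then split into lines as str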
It seems like my CSV has some weird characters that cause Python to error out. I can open and view the CSV just fine in Excel, so how can I read it line by line? Such as:
row[0]=['col1','col2','col3']
row[1]=['val1','val2','val3']
etc...
CodePudding user response:
What you have shown for line and formatted_line is a hint that:
- your file is utf-16 encoded
- it uses tabs (\t) as delimiters
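A quick byte-level check can confirm both (my own sketch, not required for the fix): a UTF-16 little-endian file normally starts with the byte-order mark b'\xff\xfe', and the NUL bytes the csv module complained about are just the high bytes of UTF-16 code units.

with open(filepath, 'rb') as f:
    head = f.read(2)
print(head == b'\xff\xfe')  # True for a UTF-16 little-endian BOM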
So you should use:
with the csv module:
# open file in read mode
with open(filepath, 'r', encoding='utf-16') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj, delimiter='\t')
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)
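If you want the rows structure from the question, the same reader can fill it directly (a small variation on the code above):

from csv import reader

rows = []
with open(filepath, 'r', encoding='utf-16') as read_obj:
    csv_reader = reader(read_obj, delimiter='\t')
    for row in csv_reader:
        rows.append(row)  # each row is already a list of field values

# rows[0] holds the header, rows[1] the first data line, and so on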
with Pandas:
data = pd.read_csv(filepath, encoding='utf-16', sep='\t')
for thing in data:
    print(thing)
    print('')
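One note on the Pandas version (my addition, not part of the answer above): iterating a DataFrame directly yields the column names; to walk the data line by line you can iterate the rows instead, for example:

import pandas as pd

data = pd.read_csv(filepath, encoding='utf-16', sep='\t')
for row in data.itertuples(index=False):
    print(row)  # one namedtuple per CSV line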
CodePudding user response:
You can always read the file manually to build such a structure:
rows = []
with open(filepath) as f:
    for line in f:
        rows.append(line.strip().split(","))
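For the file in the question this only works if you also pass the right encoding and delimiter (my adaptation of the snippet above, combining it with the first answer's findings; a plain split still will not handle quoted fields the way the csv module does):

rows = []
with open(filepath, encoding='utf-16') as f:
    for line in f:
        rows.append(line.strip().split('\t'))  # split on tabs, not commas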