Using read_csv(), I am getting the following result:
import pandas as pd

df = pd.read_csv('data.csv')
df.head()
Output:
|col_1|col_2|col_3|
|-----|-----|-----|
|1,2,3| NaN | NaN |
|6,5,4| NaN | NaN |
I tried passing sep=',' but it didn't work. After some investigation, I found out that the rows are enclosed in string quotes, like this:
1, 2, 3 is stored as '1, 2, 3'
I tried my luck with vectorized string methods:
df['col_1'].str.replace("'", "").astype(float)
But it throws an error: ValueError: invalid literal for int() with base 10
My desired output is simply:
|col_1|col_2|col_3|
|-----|-----|-----|
| 1 | 2 | 3 |
| 6 | 5 | 4 |
Thanks
EDIT: this is a sample of the data I took from the CSV:
"6,148,72,35,0,33.6,0.627,50,1" "1,85,66,29,0,26.6,0.351,31,0" "8,183,64,0,0,23.3,0.672,32,1" "1,89,66,23,94,28.1,0.167,21,0" "0,137,40,35,168,43.1,2.288,33,1" "5,116,74,0,0,25.6,0.201,30,0" "3,78,50,32,88,31.0,0.248,26,1"
CodePudding user response:
Pass the quote character via the quotechar parameter of pd.read_csv and it will be fine! ;)
Like this:
pd.read_csv('data.csv', quotechar="'")
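For reference, a minimal self-contained sketch of the parameter in action (the inline data here is hypothetical; it assumes each field is individually wrapped in single quotes):
import io
import pandas as pd

# Hypothetical data where every field is quoted with a single quote
raw = "col_1,col_2,col_3\n'1','2','3'\n'6','5','4'\n"
df = pd.read_csv(io.StringIO(raw), quotechar="'")
print(df)
#    col_1  col_2  col_3
# 0      1      2      3
# 1      6      5      4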
CodePudding user response:
Quotes are used in CSV files to signal that comma-separated values should be kept together in a single column. You could create a new CSV file: write a script that reads every line of the original file (with the quotes) into a string, removes the leading and trailing quotes, and writes the result to the new file. You could then call read_csv on the new CSV file. This should fix the issue.
write_text = ''
with open('data.csv', 'r') as infile:
    for line in infile:
        if line[-1] == '\n':
            line = line[:-1]      # drop the newline character
        line = line[1:-1]         # remove the leading and trailing quotes
        write_text += line + '\n'
write_text = write_text[:-1]      # remove the trailing newline
with open('data_without_quotes.csv', 'w') as outfile:
    outfile.write(write_text)
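You could then read the cleaned file back in. A minimal sketch, assuming the cleaned file has no header row (as in the sample data):
import pandas as pd

# data_without_quotes.csv was produced by the script above; header=None because
# the sample data has no header row (pass names=[...] if you want column labels).
df = pd.read_csv('data_without_quotes.csv', header=None)
print(df.head())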