Using read_csv(), I am getting the following result:
import pandas as pd

df = pd.read_csv('data.csv')
df.head()
Output:
|col_1|col_2|col_3|
|-----|-----|-----|
|1,2,3| NaN | NaN |
|6,5,4| NaN | NaN |
I tried passing sep=',' but it didn't work. After some investigation, I found out that the rows are enclosed in string quotes, like this:
1, 2, 3 is stored as '1, 2, 3'
I tried my luck with vectorized string methods:
df['col_1'].str.replace("'", "").astype(float)
But it throws an error: ValueError: invalid literal for int() with base 10
My desired output is simply:
|col_1|col_2|col_3|
|-----|-----|-----|
| 1 | 2 | 3 |
| 6 | 5 | 4 |
Thanks
EDIT: this is a sample of the data I took from the CSV:
"6,148,72,35,0,33.6,0.627,50,1" "1,85,66,29,0,26.6,0.351,31,0" "8,183,64,0,0,23.3,0.672,32,1" "1,89,66,23,94,28.1,0.167,21,0" "0,137,40,35,168,43.1,2.288,33,1" "5,116,74,0,0,25.6,0.201,30,0" "3,78,50,32,88,31.0,0.248,26,1"
CodePudding user response:
Pass the quote character via the quotechar parameter of pd.read_csv and it will be fine! ;)
Like this:
pd.read_csv('data.csv', quotechar="'")
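For reference, a minimal self-contained sketch of the parameter in action (the inline data here is hypothetical; it assumes each field is individually wrapped in single quotes):
import io
import pandas as pd

# Hypothetical data where every field is quoted with a single quote
raw = "col_1,col_2,col_3\n'1','2','3'\n'6','5','4'\n"
df = pd.read_csv(io.StringIO(raw), quotechar="'")
print(df)
#    col_1  col_2  col_3
# 0      1      2      3
# 1      6      5      4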
CodePudding user response:
Quotes are used in CSV files to signal that comma-separated values should be kept together in a single column. You could create a new CSV file: write a script that reads every line of the original file (with the quotes) into a string, removes the leading and trailing quotes, and writes the result to the new file. You could then call read_csv on the new CSV file. This should fix the issue.
write_text = ''
with open('data.csv', 'r') as infile:
    for line in infile:
        if line[-1] == '\n':
            line = line[:-1]      # drop the newline character
        line = line[1:-1]         # remove the leading and trailing quotes
        write_text += line + '\n'
write_text = write_text[:-1]      # remove the trailing newline
with open('data_without_quotes.csv', 'w') as outfile:
    outfile.write(write_text)
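You could then read the cleaned file back in. A minimal sketch, assuming the cleaned file has no header row (as in the sample data):
import pandas as pd

# data_without_quotes.csv was produced by the script above; header=None because
# the sample data has no header row (pass names=[...] if you want column labels).
df = pd.read_csv('data_without_quotes.csv', header=None)
print(df.head())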