pandas read_csv file type with double quotes and no-double quotes

Time:03-26

Hi, I have a CSV file in this format:

Headers: SKU, Product_Name, product_id

3735,[Freebies PC] - Holyshield! Sunscreen Comfort Corrector Serum SPF 50 PA 5 mL,154674

4568,"Consumables Mika furit 500 gr @250 (16x12x11) packaging grape, orange)",202737

2403,Laurier Active Day Super Maxi 30 Pcs,8992727002714

I want to read this CSV into a DataFrame. The problem is that some product names contain a comma (","), so those rows are not parsed correctly. I looked at other answers that suggest using sep, but only some product names are wrapped in double quotes and others are not. How can I read the file properly?

I tried using

productList = pd.read_csv('products/products.csv', encoding='utf-8', engine='python')

It returns:

sku Product_Name product_id
3735 [Freebies PC] - Holyshield! Sunscreen Comfort Corrector Serum SPF 50 PA 5 mL 154674
4568,"Consumables Mika furit 500 gr @250 (16x12x11) packaging grape, orange)",202737 nan nan
42403 Laurier Active Day Super Maxi 30 Pcs 8992727002714

The expected output is:

sku Product_Name product_id
3735 [Freebies PC] - Holyshield! Sunscreen Comfort Corrector Serum SPF 50 PA 5 mL 154674
4568 Consumables Mika furit 500 gr @250 (16x12x11) packaging grape, orange) 202737
42403 Laurier Active Day Super Maxi 30 Pcs 8992727002714

How can I do so?

CodePudding user response:

Content of sample.csv file:

product_id,product_name,sku_number
2168,Sanjin Watermelon Frost Obat Sariawan Powder/Bubuk,6903193004029
3798,Common Grounds Cloak & Dagger Instant Coffee 1 Sachets,313166
3799,Common Grounds Ethiopia Guji Instant Coffee 1 Sachets,175744
3580,Emina Glossy Stain Lip Tint Autumn Bell 3gr,8993137707220
"3795,""Hansaplast Kasa Steril - 7,5 x 7,5cm"",8999777016043"
"2997,""Panda GP71 2,5mm"",616920"

It seems that the export process from the database produces malformed rows for some reason: the affected lines are wrapped entirely in double quotes, with the inner quotes doubled. If you cannot correct the export process, a possible solution is the following:

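import pandas as pd
from io import StringIO

# Rewrite the doubled quotes so that each field is quoted on its own:
# ',""' becomes '","' and '"",' becomes '","'
with open('sample.csv', 'r') as f:
    data = f.read().replace(',""', '","').replace('"",', '","')

# Parse the repaired text as a normal CSV
df = pd.read_csv(StringIO(data))
df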

This returns a DataFrame in which all six rows are split into the three columns product_id, product_name and sku_number.
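
An alternative to the string replacement is to treat each malformed line as what it formally is: a single CSV field whose content is itself a CSV line. The sketch below parses every line with the standard csv module and re-parses any row that comes back as a single field. The `parse_rows` helper and the in-memory `raw` sample are hypothetical names used for illustration only, and the sketch assumes that no legitimate row has just one column.

```python
import csv
from io import StringIO

# In-memory copy of a few rows from sample.csv (illustrative only).
raw = '''product_id,product_name,sku_number
2168,Sanjin Watermelon Frost Obat Sariawan Powder/Bubuk,6903193004029
"3795,""Hansaplast Kasa Steril - 7,5 x 7,5cm"",8999777016043"
"2997,""Panda GP71 2,5mm"",616920"
'''


def parse_rows(text):
    """Parse CSV text, unwrapping rows that were quoted as one whole field."""
    rows = []
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) == 1:
            # The whole line was exported as one quoted field:
            # parse its content a second time to get the real fields.
            row = next(csv.reader([row[0]]))
        rows.append(row)
    return rows


rows = parse_rows(raw)
for row in rows:
    print(row)
```

With pandas, `pd.DataFrame(rows[1:], columns=rows[0])` would then build the frame. All values arrive as strings, so numeric columns would need converting, for example with `pd.to_numeric`.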