Some rows of a CSV dataset are shifted after translation. Please suggest a solution

Time:08-18

I translated a dataset, and after translation some values shifted into neighbouring cells, changing the contents of those cells. The dataset before translation and the dataset after translation are shown below.

Brand      | Model          | Accel    | Topspeed | Range  | Efficiency | Rapid charge
Volkswagen | ID.3 Pro S     | 7.9 sec  | 160km/hr | 440km  | 175Wh/km   | Rapid charge possible
Porshe     | Taycan Turbo 5 | 2.8 sec  | 260 km/h | 375 km | 223 Wh/km  | Rapid charging possible
Volkswagen | e-up!          | 11.9 sec | 130km/hr | 420km  | 11Wh/km    | Rapid charge possible
Volkswagen | ID.3 Pure      | 10.0 sec | 160 km/h | 270 km | 167 Wh/km  | Rapid charging possible

Translated Dataset

Brand      | Model          | Accel    | Topspeed | Range    | Efficiency | Rapid charge
Volkswagen | ID.3 Pro S     | 7.9 sec  | 160km/hr | 440km    | 175Wh/km   | Charge Rapide possible
Porshe     | Taycan Turbo 5 | 2.8 sec  | 260 km/h | 375 km   | 223 Wh/km  | Charge rapide possible
Volkswagen | e-up!          | 11       | 9 sec    | 130km/hr | 420km      | 11Wh/km
Volkswagen | ID.3 Pure      | 10.0 sec | 160 km/h | 270 km   | 167 Wh/km  | Rapid charging possible

Here you can see that, after translation, 11.9 sec in the 3rd row was split into 11 and 9 sec. Below is the code I am using for translation:

from googletrans import Translator

# Read the raw CSV lines; each line is later translated as one whole string
with open("ElectricCarData_Norm.csv") as myfile:
    f = myfile.readlines()

translator = Translator()
with open("Electric.csv", 'w', encoding="utf-8") as op:
    translation = translator.translate(f, src='en', dest='fr')
    for trans in translation:
        op.write(trans.text)
        op.write('\n')

Please explain why this is happening and how to solve it.

CodePudding user response:

If the translation doesn't preserve the commas (or whatever delimiter you use), your dataset won't keep the same structure. I suggest translating only the text column:

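import pandas as pd
from googletrans import Translator

translator = Translator()

df = pd.read_csv("your_data.csv", delimiter=",")
# Translate only the free-text column, value by value; the other columns stay untouched
df["Rapid charge"] = df["Rapid charge"].apply(
    lambda text: translator.translate(text, src='en', dest='fr').text
)

df.to_csv("Electric.csv", index=False)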
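
The same idea can be sketched without pandas, using only the standard csv module. This is a minimal sketch, not a definitive implementation: `translate_column` and `fake_translate` are hypothetical names, the sample data is a shortened version of the question's table, and `fake_translate` is an offline stand-in for a real translator call.

```python
import csv
import io


def translate_column(in_file, out_file, column, translate_fn):
    """Copy a CSV, passing only the values of `column` through `translate_fn`."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = translate_fn(row[column])
        writer.writerow(row)


def fake_translate(text):
    # Hypothetical stand-in for a translator call, so the sketch runs offline.
    return {"Rapid charge possible": "Charge rapide possible"}.get(text, text)


source = io.StringIO(
    "Brand,Model,Accel,Rapid charge\n"
    "Volkswagen,e-up!,11.9 sec,Rapid charge possible\n"
)
result = io.StringIO()
translate_column(source, result, "Rapid charge", fake_translate)
print(result.getvalue())
```

Because only one column is ever handed to the translator, values such as 11.9 sec never get a chance to be rewritten.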

CodePudding user response:

You should open your raw CSV file in a plain text editor (like Notepad, or even your Python IDE) and see what's wrong. Your translator probably translates 11.9 as 11,9 (French uses a decimal comma), and the extra comma is what moves everything one column over. I'm not sure why it didn't break the previous lines, though.
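
To see the effect without calling the translator, here is a small sketch that simulates it. The raw CSV lines are an assumption reconstructed from the table in the question, and the `replace` call only imitates a translator that localises the decimal point.

```python
import csv
import io

header = "Brand,Model,Accel,Topspeed,Range,Efficiency,Rapid charge"
original = "Volkswagen,e-up!,11.9 sec,130km/hr,420km,11Wh/km,Rapid charge possible"
# Simulate a translator that turns the decimal point into a decimal comma.
translated = original.replace("11.9", "11,9")


def parse(line):
    """Parse a single CSV line into a list of fields."""
    return next(csv.reader(io.StringIO(line)))


columns = parse(header)
before = parse(original)
after = parse(translated)

# The unquoted comma adds one field, so every later value moves one column right.
print(len(before), len(after))
print(dict(zip(columns, after)))
```

The second print shows 9 sec under Topspeed and 11Wh/km under Rapid charge, which matches the shifted row in the question.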

Use the csv module to read and write your CSV files instead of simply translating entire lines. The writer quotes any field that contains a comma, so you won't see the same issue.

For example:

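import csv
from googletrans import Translator

translator = Translator()

# newline="" lets the csv module handle line endings itself
with open("ElectricCarData_Norm.csv", newline="") as in_file, \
        open("Electric.csv", 'w', encoding="utf-8", newline="") as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    for row in reader:
        # Translate the fields of one row as a list of separate strings
        translation = translator.translate(row, src='en', dest='fr')
        # The writer quotes any translated field that contains a comma
        writer.writerow([t.text for t in translation])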

which gives:

Marque,Modèle,Accél,Vitesse de pointe,Intervalle,Efficacité,Charge rapide
Volkswagen,ID.3 Pro S,"7,9 s",160km/h,440km,175Wh/km,Charge rapide possible
Porsche,Taycan Turbo 5,"2,8 s",260 km/h,375 kilomètres,223 Wh/km,Charge rapide possible
Volkswagen,e-up !,"11,9 s",130km/h,420km,11Wh/km,Charge rapide possible
Volkswagen,ID.3 pur,"10,0 s",160km/h,270 kilomètres,167 Wh/km,Charge rapide possible
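
The quotes around "11,9 s" come from csv.writer, not from the translator. A short offline sketch, using one row copied from the output above, shows the writer quoting that field and the reader recovering the same seven values:

```python
import csv
import io

row = ["Volkswagen", "e-up !", "11,9 s", "130km/h", "420km", "11Wh/km",
       "Charge rapide possible"]

# With the default QUOTE_MINIMAL setting, only fields that need it are quoted.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()
print(line, end="")

# Reading the line back yields the original fields, comma included.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)
```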

Another possible solution is to rewrite your input CSV using the quoting=csv.QUOTE_ALL option, so that every element is enclosed in quotes. Then, you can translate entire lines. I do not know if Google Translate translates quotes.

import csv
import io
from googletrans import Translator

translator = Translator()

# Rewrite the input in memory with every field enclosed in quotes
with open("ElectricCarData_Norm.csv", newline="") as in_file, io.StringIO() as temp_file:

    reader = csv.reader(in_file)
    writer = csv.writer(temp_file, quoting=csv.QUOTE_ALL)

    for row in reader:
        writer.writerow(row)

    csv_temp = temp_file.getvalue()

f = csv_temp.splitlines()

# Translate the fully quoted lines as whole strings
with open("Electric.csv", 'w', encoding="utf-8") as op:
    translation = translator.translate(f, src='en', dest='fr')
    for trans in translation:
        op.write(trans.text)
        op.write('\n')

which writes an output csv that contains quotes around all elements:

"Marque","Modèle","Accél","Vitesse maximale","Autonomie","Efficacité","Charge rapide"
"Volkswagen","ID.3 Pro S","7.9 sec","160km/h","440km","175Wh/km","Charge rapide possible"
"Porsche","Taycan Turbo 5","2.8 sec","260 km/h","375 km","223 Wh/km","Recharge rapide possible"
"Volkswagen", "e-up !", "11,9 sec", "130 km/h", "420 km", "11 Wh/km", "Recharge rapide possible"
"Volkswagen","ID.3 Pure","10.0 sec","160 km/h","270 km","167 Wh/km","Charge rapide possible"
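
Note that the e-up row came back with a space after each comma. If you read this file again with the csv module, a sketch like the following suggests passing skipinitialspace=True, since otherwise the quotes are not recognised as field delimiters. The line is copied from the output above.

```python
import csv
import io

line = ('"Volkswagen", "e-up !", "11,9 sec", "130 km/h", "420 km", '
        '"11 Wh/km", "Recharge rapide possible"')

# Default settings: a field that starts with a space is not treated as quoted,
# so the comma inside "11,9 sec" splits the field again.
strict = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True ignores the space after each delimiter.
lenient = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(len(strict), len(lenient))
print(lenient[2])
```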