I have translated a dataset, and after translation some values are shifted into the neighbouring cell, which changes the contents of the cells in that row. (1) Dataset before translation (2) Dataset after translation
Brand | Model | Accel | Topspeed | Range | Efficiency | Rapid charge |
---|---|---|---|---|---|---|
Volkswagen | ID.3 Pro S | 7.9 sec | 160km/hr | 440km | 175Wh/km | Rapid charge possible |
Porshe | Taycan Turbo 5 | 2.8 sec | 260 km/h | 375 km | 223 Wh/km | Rapid charging possible |
Volkswagen | e-up! | 11.9 sec | 130km/hr | 420km | 11Wh/km | Rapid charge possible |
Volkswagen | ID.3 Pure | 10.0 sec | 160 km/h | 270 km | 167 Wh/km | Rapid charging possible |
Translated Dataset
Brand | Model | Accel | Topspeed | Range | Efficiency | Rapid charge |
---|---|---|---|---|---|---|
Volkswagen | ID.3 Pro S | 7.9 sec | 160km/hr | 440km | 175Wh/km | Charge Rapide possible |
Porshe | Taycan Turbo 5 | 2.8 sec | 260 km/h | 375 km | 223 Wh/km | Charge rapide possible |
Volkswagen | e-up! | 11 | 9 sec | 130km/hr | 420km | 11Wh/km |
Volkswagen | ID.3 Pure | 10.0 sec | 160 km/h | 270 km | 167 Wh/km | Rapid charging possible |
Here you can see that after translation, 11.9 sec was split into 11 and 9 sec in the 3rd row.
Below is the code I am using for the translation:
from googletrans import Translator
myfile=open("ElectricCarData_Norm.csv")
f=myfile.readlines()
translator=Translator()
with open("Electric.csv", 'w',encoding="utf-8") as op:
    #for line in f:
        #print(line)
    translation = translator.translate(f, src='en',dest='fr')
    for trans in translation:
        #print(line," ",translation)
        op.write(trans.text)
        op.write('\n')
Please explain why it is happening and how to solve it.
CodePudding user response:
If the translation doesn't keep the commas (or whatever the delimiter is), your dataset won't have the same structure afterwards. I suggest translating only the text column instead:
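import pandas as pd
from googletrans import Translator

translator = Translator()
df = pd.read_csv("your_data.csv", delimiter=",")
# Translate only the text column, one cell at a time, and keep the translated string (.text)
df["Rapid charge"] = df["Rapid charge"].apply(lambda s: translator.translate(s, src='en', dest='fr').text)
df.to_csv("Electric.csv", index=False, encoding="utf-8")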
CodePudding user response:
You should open your raw CSV file in a plain text editor (like Notepad, or even your Python IDE) and see what's wrong: your translator probably translates 11.9 as 11,9, which is what moves everything one column over. I'm not sure why it didn't break the previous lines though.
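To make the column shift concrete, here is a minimal sketch that uses only Python's standard csv module. The `translated` line is an assumption about what the translator returns for that row (the decimal point turned into a decimal comma); it is not captured translator output, and `count_fields` is a helper introduced just for this illustration.

```python
import csv
import io

header = "Brand,Model,Accel,Topspeed,Range,Efficiency,Rapid charge"
original = "Volkswagen,e-up!,11.9 sec,130km/hr,420km,11Wh/km,Rapid charge possible"
# Assumed translator output: "11.9" has become "11,9", adding an unquoted comma.
translated = "Volkswagen,e-up !,11,9 sec,130km/h,420km,11Wh/km,Charge rapide possible"


def count_fields(line):
    """Parse one CSV line and return how many fields it contains."""
    return len(next(csv.reader(io.StringIO(line))))


print(count_fields(header))      # 7
print(count_fields(original))    # 7
print(count_fields(translated))  # 8
```

The extra field is what pushes every later value in that row one column to the right.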
Use the csv module to read and write your CSV files instead of simply translating entire lines. This will correctly escape any commas and you won't see the same issue.
For example:
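import csv
from googletrans import Translator

translator = Translator()
# newline='' lets the csv module handle line endings itself
with open("ElectricCarData_Norm.csv", newline='') as in_file, open("Electric.csv", 'w', encoding="utf-8", newline='') as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    for row in reader:
        # Translate each cell separately; the writer quotes any cell that ends up containing a comma
        translation = translator.translate(row, src='en', dest='fr')
        writer.writerow(t.text for t in translation)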
which gives:
Marque,Modèle,Accél,Vitesse de pointe,Intervalle,Efficacité,Charge rapide
Volkswagen,ID.3 Pro S,"7,9 s",160km/h,440km,175Wh/km,Charge rapide possible
Porsche,Taycan Turbo 5,"2,8 s",260 km/h,375 kilomètres,223 Wh/km,Charge rapide possible
Volkswagen,e-up !,"11,9 s",130km/h,420km,11Wh/km,Charge rapide possible
Volkswagen,ID.3 pur,"10,0 s",160km/h,270 kilomètres,167 Wh/km,Charge rapide possible
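The quotes around "11,9 s" in that output come from csv.writer itself: with its default settings it quotes any field that contains the delimiter. A small self-contained check of that behaviour, with the row values copied from the output above and no translator involved:

```python
import csv
import io

# One translated row, already split into cells (values copied from the output above).
row = ["Volkswagen", "e-up !", "11,9 s", "130km/h", "420km", "11Wh/km", "Charge rapide possible"]

# Write the row to an in-memory buffer with the default dialect.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()
print(line.strip())  # only the cell containing a comma is quoted

# Reading the line back gives the same seven cells.
parsed = next(csv.reader(io.StringIO(line)))
print(len(parsed))  # 7
```

Because the comma now sits inside a quoted field, any CSV reader that follows the usual quoting rules keeps "11,9 s" in a single column.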
Another possible solution is to rewrite your input CSV using the quoting=csv.QUOTE_ALL option, so that every element is enclosed in quotes. Then, you can translate entire lines. I do not know if Google Translate translates quotes.
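import csv
import io
from googletrans import Translator

translator = Translator()
# Rewrite the input into an in-memory buffer with every field quoted
with open("ElectricCarData_Norm.csv", newline='') as in_file, io.StringIO() as temp_file:
    reader = csv.reader(in_file)
    writer = csv.writer(temp_file, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)
    # Read the buffer back before the with-block closes it
    temp_file.seek(0)
    csv_temp = temp_file.read()
f = csv_temp.splitlines()
with open("Electric.csv", 'w', encoding="utf-8") as op:
    # Translate whole (fully quoted) lines at once
    translation = translator.translate(f, src='en', dest='fr')
    for trans in translation:
        op.write(trans.text)
        op.write('\n')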
which writes an output csv that contains quotes around all elements:
"Marque","Modèle","Accél","Vitesse maximale","Autonomie","Efficacité","Charge rapide"
"Volkswagen","ID.3 Pro S","7.9 sec","160km/h","440km","175Wh/km","Charge rapide possible"
"Porsche","Taycan Turbo 5","2.8 sec","260 km/h","375 km","223 Wh/km","Recharge rapide possible"
"Volkswagen", "e-up !", "11,9 sec", "130 km/h", "420 km", "11 Wh/km", "Recharge rapide possible"
"Volkswagen","ID.3 Pure","10.0 sec","160 km/h","270 km","167 Wh/km","Charge rapide possible"
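Note that one row in this output came back with a space after every comma. As a hedged aside: Python's csv.reader only treats a quote as a quoting character when it is the first character of a field, so with default settings that row parses into the wrong number of fields. Passing skipinitialspace=True is one way to read it back correctly. The sketch below uses that row exactly as it appears in the output above.

```python
import csv
import io

# Row copied from the translated output: note the space after every comma.
line = '"Volkswagen", "e-up !", "11,9 sec", "130 km/h", "420 km", "11 Wh/km", "Recharge rapide possible"'

# Default settings: the leading space makes the quotes literal, so "11,9 sec" is split.
default_fields = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True ignores whitespace right after a delimiter, so quoting works again.
clean_fields = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(len(default_fields))  # 8
print(len(clean_fields))    # 7
print(clean_fields[2])      # 11,9 sec
```

If you go with the QUOTE_ALL approach, it is worth reading the translated file back this way and checking that every row still has the same number of fields as the header.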