I have a *.csv file that has 2 columns with 4 rows of data. I want to delete the rows that contain English-like but non-English words (Hinglish words, e.g. kya haal hai). An example is given in the image.
Thinking about the above problem, I want to solve it for the list below first.
a = [ "This is not good so mai yah row hatana chahta hun.", "Nice!, kya haal pyare friend"]
Output should be:
This is not good so row.
Nice! friend
Note: this data is for copy-paste purposes only.
This is not good so mai yah row hatana chahta hun. ok
Nice!, kya haal pyare friend thik hu
Please help Me Definitely
Google is a comPaNY yes it is
CodePudding user response:
You will need an English dictionary library here. The enchant Python library is one option.
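import re
import enchant

d = enchant.Dict("en_US")

def all_english(s):
    # Strip punctuation from each word, then check every remaining word against the dictionary
    words = [re.sub(r'[!@#$?:;,.]', '', w.lower()) for w in s.split()]
    return all(d.check(w) for w in words if w)

# Keep only the rows whose column A is entirely English
df = df[df["A"].map(all_english)]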
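The function above drops whole rows. For the list example in the question, where only the non-English words should be removed, a minimal sketch could filter word by word instead. The name `keep_english_words` and the small `SAMPLE_WORDS` set are hypothetical: the set only stands in for a real dictionary so the snippet runs without extra packages, and with enchant installed you could pass `enchant.Dict("en_US").check` as `is_english` instead.

```python
import re

# Hypothetical stand-in for a real dictionary (assumption, not part of enchant)
SAMPLE_WORDS = {"this", "is", "not", "good", "so", "row", "nice", "friend"}

def keep_english_words(sentence, is_english):
    """Return the sentence with every word that fails is_english removed."""
    kept = []
    for token in sentence.split():
        # Look up the bare word: lowercased, with punctuation stripped
        bare = re.sub(r"\W", "", token.lower())
        if bare and is_english(bare):
            kept.append(token)  # keep the original token, punctuation included
    return " ".join(kept)

a = ["This is not good so mai yah row hatana chahta hun.",
     "Nice!, kya haal pyare friend"]
cleaned = [keep_english_words(s, SAMPLE_WORDS.__contains__) for s in a]
print(cleaned)
```

Punctuation stays attached to whichever word it was written on, so the result will not match the desired output character for character: the final period is dropped together with "hun." and the comma after "Nice!" is kept. With a real dictionary the result also depends on which words that dictionary happens to accept.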
CodePudding user response:
I got the correct output. Thanks to Tim Biegeleisen and tripleee
import re

import enchant
import pandas as pd

df = pd.read_csv(r'C:\Users\Mini-PC\Desktop\data.csv')
# print(df.head())

d = enchant.Dict("en_US")

def all_english(s):
    # Strip punctuation from each word, then check every remaining word against the dictionary
    words = [re.sub(r'[!@#$?:;,.]', '', w.lower()) for w in s.split()]
    return all(d.check(w) for w in words if w)

# Keep only the rows whose column A is entirely English
df = df[df["A"].map(all_english)]
print(df)
Output:
A B
Please help Me Definitely
Google is a comPaNY yes it is
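To try the row filter without enchant, pandas, or a file on disk, a rough sketch using only the standard library might look like the following. Two things here are assumptions: the split of the sample data into columns A and B is a guess, because the pasted data lost its delimiters, and `SAMPLE_WORDS` is a hypothetical stand-in for a real dictionary check such as `enchant.Dict("en_US").check`.

```python
import csv
import io
import re

# Assumed layout of the sample data; the A/B split is a guess
CSV_TEXT = '''A,B
"This is not good so mai yah row hatana chahta hun.",ok
"Nice!, kya haal pyare friend",thik hu
Please help Me,Definitely
Google is a comPaNY,yes it is
'''

# Hypothetical stand-in for a real dictionary lookup
SAMPLE_WORDS = {"this", "is", "not", "good", "so", "row", "nice", "friend",
                "please", "help", "me", "google", "a", "company"}

def all_english(text, is_english):
    """True if every word, lowercased and stripped of punctuation, passes is_english."""
    bare = (re.sub(r"\W", "", w.lower()) for w in text.split())
    return all(is_english(w) for w in bare if w)

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
# Keep only the rows whose column A is entirely English
english_rows = [r for r in rows if all_english(r["A"], SAMPLE_WORDS.__contains__)]
for r in english_rows:
    print(r["A"], "|", r["B"])
```

With this word set, only the "Please help Me" and "Google is a comPaNY" rows survive, which agrees with the output shown above.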