I would like to detect the language of each entry in a pandas column in Python, then write the detected language code to a new column of the DataFrame. Below is the code I tried, but it raises an error. Please help.
Thank you.
data = {'text': ["It is a good option","Better to have this way","es un portal informático para geeks","は、ギーク向けのコンピューターサイエンスポータルです"]}
# Create DataFrame
df = pd.DataFrame(data)
#get the language
for i in df['text']:
    # Language Detection
    df['lang'] = TextBlob(i)
CodePudding user response:
You can use the langdetect library in Python for language detection.
pip install langdetect
import pandas as pd
from langdetect import detect
data = {'text': ["It is a good option","Better to have this way","es un portal informático para geeks","は、ギーク向けのコンピューターサイエンスポータルです"]}
df = pd.DataFrame(data)
# detect() returns a language code string for each text
df['lang'] = df['text'].apply(detect)
CodePudding user response:
I think this will be enough:
# wrap each row's text in a TextBlob (note: this stores TextBlob objects, not language codes)
df['lang'] = df.apply(lambda x: TextBlob(x['text']), axis=1)