If DataFrame records are in English, move them to another column using Python

Time:08-31

I had a list of Arabic and English elements and converted it into a DataFrame, but all the values ended up in one single column. I want to move the records that contain English words to another column. This is what I have now:

COLUMN 1
هلا
السلام
WELCOMING
شي اخر

The output I want is:

COLUMN 1 COLUMN 2
هلا welcoming
السلام others eng. words

Hope it's clear.

CodePudding user response:

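You could go through the DataFrame and use a regex to check whether each entry contains letters of the Latin alphabet:

import re

word = "WELCOMING"  # example entry (assumed); in practice, loop over the column

# search() finds a Latin letter anywhere; match() would only test the first character
reg = re.compile(r'[a-zA-Z]')

if reg.search(word):
    print("English")      # contains at least one Latin letter
else:
    print("Not English")  # contains no Latin letters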

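or use isalpha() on the encoded bytes, which is true only when every byte is an ASCII letter (so entries containing spaces, digits, or punctuation count as non-English):

if word.encode().isalpha():
    print("English")      # every byte is an ASCII letter
else:
    print("Not English")  # at least one byte is not an ASCII letter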

Depending on that, you could create a new dataframe and populate the appropriate columns.
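A minimal sketch of that last step, assuming the data sits in a column named `COLUMN 1` as in the question. The sample rows and the names `has_latin`, `arabic`, and `english` are illustrative only:

```python
import pandas as pd

# Sample data modelled on the question (assumed, not the asker's real data)
df = pd.DataFrame(
    {"COLUMN 1": ["هلا", "السلام", "WELCOMING", "شي اخر", "others eng. words"]}
)

# True where the entry contains at least one Latin letter
has_latin = df["COLUMN 1"].str.contains(r"[a-zA-Z]", regex=True)

# Split the column, renumber each part from 0, and place them side by side
arabic = df.loc[~has_latin, "COLUMN 1"].reset_index(drop=True)
english = df.loc[has_latin, "COLUMN 1"].reset_index(drop=True)
result = pd.concat([arabic, english], axis=1, keys=["COLUMN 1", "COLUMN 2"])

print(result)
```

Because the two parts can differ in length, the shorter column is padded with NaN.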

CodePudding user response:

For each entry, you can check whether the first character is ASCII. If it is, move the entry to the new column.

Disclaimer: this only works if one language contains no ASCII characters at all and the other language contains only ASCII characters.
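As a rough sketch of the first-character check, assuming Python 3.7+ for `str.isascii()`. The helper name `starts_with_ascii` and the sample list are made up for illustration:

```python
def starts_with_ascii(text: str) -> bool:
    """Return True if the first non-whitespace character is ASCII."""
    stripped = text.strip()
    return bool(stripped) and stripped[0].isascii()


# Sample entries taken from the question
entries = ["هلا", "السلام", "WELCOMING", "شي اخر"]

english = [e for e in entries if starts_with_ascii(e)]
arabic = [e for e in entries if not starts_with_ascii(e)]

print(arabic, english)
```

Note that an entry starting with a digit or punctuation mark also counts as ASCII here, which is part of the limitation described in the disclaimer.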

CodePudding user response:

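You can use the langdetect library together with pandas. This works for any language that langdetect can identify, not only Arabic and English, though detection tends to be less reliable on very short strings: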

import pandas as pd
from langdetect import detect, DetectorFactory

# fix the seed so detection results are reproducible
DetectorFactory.seed = 0

# read the data
df = pd.read_csv('data.csv')

# detect the language once per row
is_ar = df['col_1'].apply(detect) == 'ar'

# split into Arabic and non-Arabic rows, renumbering each part from 0
df_ar = df[is_ar].reset_index(drop=True)
df_other_lang = df[~is_ar].reset_index(drop=True)

# place the two parts side by side
result = pd.concat([df_ar, df_other_lang], axis=1)

# check the result
print(result)

Output:

Before:

       col_1
0        هلا
1      hello
2     السلام
3  WELCOMING
4      other

After:

    col_1      col_1
0     هلا      hello
1  السلام  WELCOMING
2     NaN      other

You can then rename the columns afterwards.
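One way to do the renaming, sketched with a hand-built stand-in for the result above. The target names `COLUMN 1` and `COLUMN 2` are taken from the question. Since both columns share the name `col_1`, assigning labels by position is simpler than a name-based rename:

```python
import pandas as pd

# Hand-built stand-in for the concatenated result, with the duplicated name
result = pd.DataFrame(
    [["هلا", "hello"], ["السلام", "WELCOMING"], [None, "other"]],
    columns=["col_1", "col_1"],
)

# rename(columns={"col_1": ...}) would give both columns the same new name,
# so assign the full list of labels by position instead
result.columns = ["COLUMN 1", "COLUMN 2"]

print(result)
```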
