How to use exception handling in pandas while using a function

I have the following dataframe:

     a     b   x      y language
0  id1  id_2   3  text1
1  id2  id_4   6  text2
2  id3  id_6   9  text3
3  id4  id_8  12  text4

I am attempting to use langdetect to detect the language of the text elements in column y.

This is the code I have used for that purpose:

for i,row in df.iterrows():
    df.loc[i].at["language"] = detect(df.loc[i].at["y"])

Unfortunately, this column also contains non-textual elements (blanks, symbols, numbers, and combinations of these), so I get the following traceback:

LangDetectException                       Traceback (most recent call last)
<ipython-input-40-3b2637554e5f> in <module>
      1 df["language"]=""
      2 for i,row in df.iterrows():
----> 3     df.loc[i].at["language"] = detect(df.loc[i].at["y"])
      4 df.head()

C:\Anaconda\lib\site-packages\langdetect\detector_factory.py in detect(text)
    128     detector = _factory.create()
    129     detector.append(text)
--> 130     return detector.detect()
    131 
    132 

C:\Anaconda\lib\site-packages\langdetect\detector.py in detect(self)
    134         which has the highest probability.
    135         '''
--> 136         probabilities = self.get_probabilities()
    137         if probabilities:
    138             return probabilities[0].lang

C:\Anaconda\lib\site-packages\langdetect\detector.py in get_probabilities(self)
    141     def get_probabilities(self):
    142         if self.langprob is None:
--> 143             self._detect_block()
    144         return self._sort_probability(self.langprob)
    145 

C:\Anaconda\lib\site-packages\langdetect\detector.py in _detect_block(self)
    148         ngrams = self._extract_ngrams()
    149         if not ngrams:
--> 150             raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
    151 
    152         self.langprob = [0.0] * len(self.langlist)

LangDetectException: No features in text.

Is there a way to use exception handling so that the detect function from the langdetect library is applied only to the elements that contain actual text?

Answer:

So, given the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "a": {0: "id1", 1: "id2", 2: "id3", 3: "id4"},
        "b": {0: "id_2", 1: "id_4", 2: "id_6", 3: "id_8"},
        "x": {0: 3, 1: 6, 2: 9, 3: 12},
        "y": {0: "text1", 1: "text2", 2: "text3", 3: "text4"},
        "language": {0: "", 1: "", 2: "", 3: ""},
    }
)

And, for the purposes of this answer, the following mocked exception and function:

class LangDetectException(Exception):
    pass

def detect(x):
    if x == "text2":
        raise LangDetectException
    else:
        return "english"

You can skip the rows in which "y" has non-textual elements (row 1 here) by wrapping the call in try/except, like this:

for i, row in df.iterrows():
    try:
        df.loc[i, "language"] = detect(row["y"])
    except LangDetectException:
        continue

And so:

print(df)
# Outputs
     a     b   x      y language
0  id1  id_2   3  text1  english
1  id2  id_4   6  text2
2  id3  id_6   9  text3  english
3  id4  id_8  12  text4  english
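
As a variation on the loop above, the same try/except can be moved into a small helper and applied to the whole column with Series.apply. This is only a sketch: safe_detect is a hypothetical name, the exception and detect are mocked again so the snippet runs on its own, and the isinstance check is an assumption about how you would want to treat non-string cells such as numbers or NaN. With the real library you would, I believe, import detect from langdetect and LangDetectException from langdetect.lang_detect_exception instead of defining the mocks.

```python
import pandas as pd


class LangDetectException(Exception):
    """Mock stand-in for the exception raised by langdetect."""


def detect(text):
    """Mock detector: fails on text without any letters, else returns 'en'."""
    if not any(ch.isalpha() for ch in text):
        raise LangDetectException("No features in text.")
    return "en"


def safe_detect(value, default=""):
    """Return the detected language, or `default` when detection is not possible."""
    # Assumption: non-string cells (numbers, None/NaN) are skipped up front,
    # because they are not text and would not reach the except clause below.
    if not isinstance(value, str):
        return default
    try:
        return detect(value)
    except LangDetectException:
        # Blanks, digits, symbols: nothing to detect, keep the default.
        return default


df = pd.DataFrame({"y": ["some text", "12345", "   ", None, "more text"]})
df["language"] = df["y"].apply(safe_detect)
print(df)
```

Rows that cannot be detected keep the default empty string, so they are easy to filter out afterwards, for example with df[df["language"] != ""].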