How to apply sentiment analysis model on text column all at once in a dataframe?

Time:07-20

I am using germansentiment to classify the sentiment of German tweets (column text) in a dataframe (df).

I am using the following code to do so:

from germansentiment import SentimentModel
model = SentimentModel()

df['sentiment'] = ''
for i in range(len(df)):
    df['sentiment'][i] = model.predict_sentiment([df['text'].iloc[i]])
    print(df['sentiment'][i])

Since I am looping over all the rows, of which there are more than 130,000, it is taking forever to complete the task.

Is there a better way to do this that would take less time?

CodePudding user response:

You could check whether all your tweets are unique. If they are not, I would suggest running the model only on the unique ones and using the result as a lookup table to fill your dataframe.
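The lookup-table idea could be sketched roughly as follows. This is only an illustration: fill_from_unique and fake_predict are hypothetical names, and fake_predict stands in for model.predict_sentiment, which is assumed here to take a list of texts and return one prediction per text in the same order.

```python
def fill_from_unique(texts, predict):
    """Predict each distinct text once, then map the results back to every row."""
    unique_texts = list(dict.fromkeys(texts))  # drops duplicates, keeps first-seen order
    lookup = dict(zip(unique_texts, predict(unique_texts)))
    return [lookup[text] for text in texts]


# Hypothetical stand-in for model.predict_sentiment, for illustration only.
def fake_predict(batch):
    return ["positive" if "gut" in text else "neutral" for text in batch]


tweets = ["Das ist gut", "Hallo", "Das ist gut"]
labels = fill_from_unique(tweets, fake_predict)
```

With the real model this would presumably become df['sentiment'] = fill_from_unique(df['text'].tolist(), model.predict_sentiment). How much time it saves depends entirely on how many tweets are duplicates.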

Otherwise, you could replace your for loop with apply and a lambda. Depending on the use case, that can be quicker.

If you do not need the print, I would also suggest removing that line. If you want to track the progress of your loop, there are better ways to do so.

To be precise, I would probably do something like:

from tqdm.auto import tqdm
tqdm.pandas()
# predict_sentiment expects a list of texts, so wrap each tweet in a list (as the loop does)
df['sentiment'] = df['text'].progress_apply(lambda text: model.predict_sentiment([text]))

This should give the same output as your loop. The progress is displayed as a bar that also estimates how long it will take to finish. Dropping the print should already make things quicker, and the lambda may speed things up as well.

CodePudding user response:

Looking for half a second at the documentation you linked... this is the better solution:

df['sentiment'] = model.predict_sentiment(df['text'].tolist())

You only have to pass a list once to model.predict_sentiment to get a list of predictions back.
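If passing all 130,000 tweets in a single call turns out to use too much memory, one option is to feed the list in chunks. The sketch below is an assumption-laden illustration, not something taken from the library's documentation: predict_in_batches and fake_predict are hypothetical names, the batch size is arbitrary, and fake_predict merely stands in for model.predict_sentiment.

```python
def predict_in_batches(predict, texts, batch_size=256):
    """Call predict on consecutive slices of texts and concatenate the results."""
    results = []
    for start in range(0, len(texts), batch_size):
        results.extend(predict(texts[start:start + batch_size]))
    return results


seen_batch_sizes = []


# Hypothetical stand-in for model.predict_sentiment; it records each batch size.
def fake_predict(batch):
    seen_batch_sizes.append(len(batch))
    return ["neutral"] * len(batch)


tweets = [f"tweet {i}" for i in range(10)]
labels = predict_in_batches(fake_predict, tweets, batch_size=4)
```

With the real model the call would presumably look like df['sentiment'] = predict_in_batches(model.predict_sentiment, df['text'].tolist()). Whether chunking is needed at all depends on the available memory and on how the library handles large inputs internally.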
