How to filter with a certain condition and apply a function at the same time in pandas

I have a DataFrame like this:

text                    pred      score                              logits
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]
They didn't respond me  negative  [[], [0, 1, 0], [], []]            [0.81, 0.10, 0.18]

which you can reproduce with:

import pandas as pd

# Note: 'score' and 'logits' are stored as strings here, not as lists
df = pd.DataFrame({'text':['No thank you', 'They didnt respond me negative'],
                   'pred':['positive', 'negative'],
                   'score':['[[0, 0, 1], [1, 0, 2], [1, 0, 0]]', '[[], [0, 1, 0], [], []]'],
                   'logits':['[0.01, 0.02, 0.97]', '[0.81, 0.10, 0.18]']})

What I need to do is:

If df['pred'] == 'positive', I want to take the element at position 0 of every inner list in that row's score (here 0, 1, 1, which sums to 2) and multiply the sum by the third element of logits, logits[2] (here 0.97).

The same applies to negative, just with different positions: sum the element at position 1 of every inner list (here 0, 1, 0, 0, which sums to 1) and multiply by the first element of logits, logits[0], which is 0.81.
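As a quick check of the positive rule, here is the arithmetic for the first row written out with plain Python lists (a sketch; the variable names are illustrative and not part of the question):

```python
# Worked arithmetic for the first row (pred == "positive"), using plain lists.
score = [[0, 0, 1], [1, 0, 2], [1, 0, 0]]
logits = [0.01, 0.02, 0.97]

# Element at position 0 of each inner (per-word) list.
firsts = [v[0] for v in score]        # [0, 1, 1]
decision = sum(firsts) * logits[2]    # 2 * 0.97

print(firsts, round(decision, 2))     # [0, 1, 1] 1.94
```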

So the output would look like this:

text                    pred      score                              logits              decision
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]  1.94
They didn't respond me  negative  [[], [0, 1, 0], [], []]            [0.81, 0.10, 0.18]  0.81

Here is what I have tried (or at least the logic I need to follow). Obviously my code does not run, and I suspect the problem is sum(df['score'][0]):

df[df['pred'] == 'positive','decision'] = df[df['pred'] == 'positive', df['logits'][2] * sum(df['score'][0])]

For more clarity:

In score there is one list per word, which is why the first row has three lists and the second row has four. Each list holds the (positive, negative, neutral) scores for that word. If a list is empty, it should count as zero in the calculation.
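The empty-list rule is what matters for the second row. A minimal sketch with plain lists, treating an empty per-word list as contributing 0 (the conditional expression is one way to do it; skipping empty lists instead gives the same sum):

```python
# Second row (pred == "negative"): one list per word, some of them empty.
score = [[], [0, 1, 0], [], []]
logits = [0.81, 0.10, 0.18]

# Take position 1 of each list; an empty list contributes 0.
seconds = [v[1] if v else 0 for v in score]   # [0, 1, 0, 0]
decision = sum(seconds) * logits[0]            # 1 * 0.81

print(seconds, decision)                       # [0, 1, 0, 0] 0.81
```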

Answer:

One possible solution is to create mapping dictionaries that hold the rules for each label (e.g. for positive, sum index 0 of each inner list and multiply by index 2 of logits):

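# Assumes "score" and "logits" already hold Python lists, not strings.
m_sum = {"positive": 0, "negative": 1}  # position to sum in each per-word list
m_mul = {"positive": 2, "negative": 0}  # position of the logit to multiply by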

df["decision"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)

Prints:

                    text,      pred                              score              logits  decision
0           No thank you.  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]      1.94
1  They didn't respond me  negative            [[], [0, 1, 0], [], []]   [0.81, 0.1, 0.18]      0.81
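An alternative that stays closer to the question's own attempt is to filter with a boolean mask and assign through .loc. The attempt fails partly because plain [] indexing does not accept a (rows, column) pair; selecting rows and a column together needs .loc[mask, 'decision']. Also, df['score'][0] picks the row labelled 0 from the column rather than position 0 of each inner list. A sketch, assuming the columns are already lists; weighted_sum is a hypothetical helper name:

```python
import pandas as pd

# Assumption: "score" and "logits" are already lists (parse strings with
# ast.literal_eval first if they are not).
df = pd.DataFrame({
    "text": ["No thank you", "They didnt respond me negative"],
    "pred": ["positive", "negative"],
    "score": [[[0, 0, 1], [1, 0, 2], [1, 0, 0]], [[], [0, 1, 0], [], []]],
    "logits": [[0.01, 0.02, 0.97], [0.81, 0.10, 0.18]],
})


def weighted_sum(row, score_idx, logit_idx):
    # Hypothetical helper: sum position `score_idx` over the non-empty
    # per-word lists, then multiply by logits[logit_idx].
    total = sum(v[score_idx] for v in row["score"] if v)
    return total * row["logits"][logit_idx]


# Boolean masks select the rows; .loc[mask, "decision"] writes only to them.
pos = df["pred"] == "positive"
neg = df["pred"] == "negative"
df.loc[pos, "decision"] = df.loc[pos].apply(weighted_sum, axis=1, args=(0, 2))
df.loc[neg, "decision"] = df.loc[neg].apply(weighted_sum, axis=1, args=(1, 0))

print(df[["pred", "decision"]])
```

This needs one assignment per label, so the mapping-dictionary version scales better as labels are added; the .loc form mainly shows how "filter and assign" is spelled in pandas.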

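EDIT: if score and logits are stored as strings (as in the DataFrame constructor from the question), parse them with ast.literal_eval first: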

import pandas as pd
from ast import literal_eval


df = pd.DataFrame(
    {
        "text": ["No thank you", "They didnt respond me negative"],
        "pred": ["positive", "negative"],
        "score": [
            "[[0, 0, 1], [1, 0, 2],[1, 0, 0]]",
            "[[], [0, 1, 0], [], []]",
        ],
        "logits": ["[0.01, 0.02, 0.97]", "[0.81, 0.10, 0.18]"],
    }
)


df["score"] = df["score"].apply(literal_eval)
df["logits"] = df["logits"].apply(literal_eval)

m_sum = {"positive": 0, "negative": 1}
m_mul = {"positive": 2, "negative": 0}


df["decision"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)

Prints:

                             text      pred                              score              logits  decision
0                    No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]      1.94
1  They didnt respond me negative  negative            [[], [0, 1, 0], [], []]   [0.81, 0.1, 0.18]      0.81
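The parsing step is what makes the lambda usable with the question's constructor: while a cell is still a string, indexing it returns characters, so the first version would fail on the raw columns. A small illustration on a single cell (variable names are illustrative):

```python
from ast import literal_eval

raw = "[0.81, 0.10, 0.18]"   # a logits cell before parsing: a plain string
first_char = raw[0]           # indexing a string gives a character: "["

parsed = literal_eval(raw)    # evaluates the Python literal into a real list
first_logit = parsed[0]       # 0.81

print(repr(first_char), first_logit)   # '[' 0.81
```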