I have a DataFrame like this:
text                    pred      score                              logits
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]
They didn't respond me  negative  [[], [0, 1, 0], [], []]            [0.81, 0.10, 0.18]
You can reproduce it with this:
import pandas as pd
df = pd.DataFrame({'text': ['No thank you', 'They didnt respond me negative'],
                   'pred': ['positive', 'negative'],
                   'score': ['[[0, 0, 1], [1, 0, 2],[1, 0, 0]]', '[[], [0, 1, 0], [], []]'],
                   'logits': ['[0.01, 0.02, 0.97]', '[0.81, 0.10, 0.18]']})
What I need to do is:
If df['pred'] == 'positive', I want to sum the first element of every list in that row's score (0 + 1 + 1 = 2) and multiply the sum by the third element of that row's logits, logits[2], which is 0.97.
For negative we do the same thing, just with different positions: sum the second element of every list in score (1 + 0 + 0 + 0 = 1, counting empty lists as 0) and multiply the sum by the first element of logits, logits[0], which is 0.81.
So the output would look like this:
text                    pred      score                              logits              decision
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]  1.94
They didn't respond me  negative  [[], [0, 1, 0], [], []]            [0.81, 0.10, 0.18]  0.81
Here is what I have tried (or at least the logic I want to follow). Obviously my code does not run, and I guess the problem is in sum(df['score'][0]):
df[df['pred'] == 'positive','decision'] = df[df['pred'] == 'positive', df['logits'][2] * sum(df['score'][0])]
For more clarity:
In score we have one list per word, which is why there are three lists in the first row and four in the second. Each list holds the (positive, negative, neutral) scores for that word. If a list is empty, we treat it as zero in the calculations.
CodePudding user response:
One possible solution is to create mapping dictionaries that encode the rules (e.g. if positive, sum only the first index (0), etc.):
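# Which position of each per-word score list to sum, keyed by predicted label
m_sum = {"positive": 0, "negative": 1}
# Which logit to multiply the sum by, keyed by predicted label
m_mul = {"positive": 2, "negative": 0}

df["decision"] = df.apply(
    # "if v" skips empty lists, so they count as zero
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)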
Prints:
                     text      pred                              score              logits  decision
0           No thank you.  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]      1.94
1  They didn't respond me  negative            [[], [0, 1, 0], [], []]   [0.81, 0.1, 0.18]      0.81
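To make the arithmetic behind those two numbers easier to follow, here is a minimal plain-Python sketch of the same rule applied to each row by hand. The names `SUM_INDEX`, `LOGIT_INDEX` and `decision_for_row` are made up for this illustration and are not part of pandas or of the code above.

```python
# Hypothetical helper for illustration only: the same rule as the lambda, spelled out.
SUM_INDEX = {"positive": 0, "negative": 1}    # slot of each per-word score to add up
LOGIT_INDEX = {"positive": 2, "negative": 0}  # logit to multiply the sum by


def decision_for_row(pred, score, logits):
    # Empty per-word lists are skipped, which is the same as counting them as 0.
    picked = [word[SUM_INDEX[pred]] for word in score if word]
    return sum(picked) * logits[LOGIT_INDEX[pred]]


# Row 0: picks 0, 1, 1 -> sum 2, times logits[2] = 0.97
positive_decision = decision_for_row(
    "positive", [[0, 0, 1], [1, 0, 2], [1, 0, 0]], [0.01, 0.02, 0.97]
)
# Row 1: only [0, 1, 0] is non-empty -> sum 1, times logits[0] = 0.81
negative_decision = decision_for_row(
    "negative", [[], [0, 1, 0], [], []], [0.81, 0.10, 0.18]
)
print(round(positive_decision, 2), round(negative_decision, 2))  # 1.94 0.81
```

Keeping the two index choices in dictionaries means a rule can be changed by editing one entry rather than adding another if/else branch.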
EDIT: With ast.literal_eval (needed when score and logits are stored as strings, as in your example):
import pandas as pd
from ast import literal_eval

df = pd.DataFrame(
    {
        "text": ["No thank you", "They didnt respond me negative"],
        "pred": ["positive", "negative"],
        "score": [
            "[[0, 0, 1], [1, 0, 2],[1, 0, 0]]",
            "[[], [0, 1, 0], [], []]",
        ],
        "logits": ["[0.01, 0.02, 0.97]", "[0.81, 0.10, 0.18]"],
    }
)

# Turn the stringified lists into real Python lists
df["score"] = df["score"].apply(literal_eval)
df["logits"] = df["logits"].apply(literal_eval)

m_sum = {"positive": 0, "negative": 1}
m_mul = {"positive": 2, "negative": 0}

df["decision"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)
Prints:
                             text      pred                              score              logits  decision
0                    No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]      1.94
1  They didnt respond me negative  negative            [[], [0, 1, 0], [], []]   [0.81, 0.1, 0.18]      0.81
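One thing to watch: the lambda raises a KeyError for any pred value that is not in the mapping dictionaries. If your data can contain other labels, a named function may be easier to extend. The sketch below is only one way to do it, under two assumptions of mine: labels without a rule should give NaN, and cells may hold either real lists or their string form. `as_list` and `safe_decision` are hypothetical names, and the "neutral" row is invented test data, not from your question.

```python
import math
from ast import literal_eval

M_SUM = {"positive": 0, "negative": 1}
M_MUL = {"positive": 2, "negative": 0}


def as_list(value):
    # Parse stringified lists such as "[0.81, 0.10, 0.18]"; pass real lists through.
    return literal_eval(value) if isinstance(value, str) else value


def safe_decision(row):
    pred = row["pred"]
    if pred not in M_SUM:
        # Assumption: no rule was given for other labels, so return NaN.
        return math.nan
    score = as_list(row["score"])
    logits = as_list(row["logits"])
    # Empty per-word lists are skipped, so they count as zero.
    return sum(v[M_SUM[pred]] for v in score if v) * logits[M_MUL[pred]]


# Plain dicts stand in for DataFrame rows here; both support row["pred"] lookups.
rows = [
    {"pred": "positive", "score": "[[0, 0, 1], [1, 0, 2],[1, 0, 0]]", "logits": "[0.01, 0.02, 0.97]"},
    {"pred": "negative", "score": [[], [0, 1, 0], [], []], "logits": [0.81, 0.10, 0.18]},
    {"pred": "neutral", "score": "[[0, 0, 1]]", "logits": "[0.2, 0.3, 0.5]"},  # invented row
]
results = [safe_decision(r) for r in rows]
print(results)
```

With a DataFrame, the same function can be passed directly as df["decision"] = df.apply(safe_decision, axis=1), since apply with axis=1 hands each row to the function as a Series.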