Thank you for looking into this with me!
I am trying to use a calculated list as 2 additional columns in an existing CSV, but I am struggling to prepare the results as 2 separate columns.
MWE:
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
df = pd.read_csv('original.csv')
dtype_before = type(df["text"])
text_list = df["text"].tolist()
tokenizer = BertTokenizer.from_pretrained("daigo/bert-base-japanese-sentiment")
model = BertForSequenceClassification.from_pretrained("daigo/bert-base-japanese-sentiment")
sentiment_analyzer = pipeline("sentiment-analysis",model=model, tokenizer=tokenizer)
results = list(map(sentiment_analyzer, text_list))
Printing the list gives this:
[[{'label': 'ポジティブ', 'score': 0.7804045081138611}], [{'label': 'ポジティブ', 'score': 0.9542087912559509}], [{'label': 'ポジティブ', 'score': 0.8557115793228149}], [{'label': 'ポジティブ', 'score': 0.9135494232177734}], [{'label': 'ポジティブ', 'score': 0.86244797706604}], [{'label': 'ネガティブ', 'score': 0.8266600370407104}], [{'label': 'ポジティブ', 'score': 0.9198371767997742}], [{'label': 'ポジティブ', 'score': 0.9033421874046326}], [{'label': 'ポジティブ', 'score': 0.7705154418945312}], [{'label': 'ポジティブ', 'score': 0.8205435872077942}], [{'label': 'ポジティブ', 'score': 0.8045720458030701}], [{'label': 'ネガティブ', 'score': 0.5160148739814758}], [{'label': 'ポジティブ', 'score': 0.8745550513267517}], [{'label': 'ポジティブ', 'score': 0.941367506980896}], [{'label': 'ポジティブ', 'score': 0.899341344833374}], [{'label': 'ポジティブ', 'score': 0.9200822710990906}], [{'label': 'ポジティブ', 'score': 0.6254457235336304}], [{'label': 'ポジティブ', 'score': 0.8494048714637756}], [{'label': 'ポジティブ', 'score': 0.6723847389221191}], [{'label': 'ポジティブ', 'score': 0.9329613447189331}], [{'label': 'ポジティブ', 'score': 0.9084392786026001}], [{'label': 'ポジティブ', 'score': 0.7804917693138123}], [{'label': 'ポジティブ', 'score': 0.6737139225006104}], [{'label': 'ネガティブ', 'score': 0.5254362225532532}], [{'label': 'ネガティブ', 'score': 0.7653219103813171}], [{'label': 'ネガティブ', 'score': 0.7342881560325623}], [{'label': 'ポジティブ', 'score': 0.8476402163505554}]]
I would like to get 'label' as one column header and 'score' as the second column header, so that the final 2 columns look something like this:
label score
ポジティブ 0.7804045081138611
ポジティブ 0.9542087912559509
ポジティブ 0.8557115793228149
...
ネガティブ 0.5160148739814758
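For reference, here is a minimal sketch of that reshaping, assuming the pipeline output is kept in a list named `results` (a hypothetical name; the code above only prints it). Only three entries from the printed output are used. Since each element is a one-element list holding a dict, taking `[0]` of each and passing the dicts to `pd.DataFrame` should turn the keys 'label' and 'score' into the two column headers.

```python
import pandas as pd

# Three entries copied from the printed pipeline output: each element is a
# one-element list holding a {'label': ..., 'score': ...} dict.
results = [
    [{'label': 'ポジティブ', 'score': 0.7804045081138611}],
    [{'label': 'ポジティブ', 'score': 0.9542087912559509}],
    [{'label': 'ネガティブ', 'score': 0.8266600370407104}],
]

# Pull the single dict out of each inner list; pandas uses the dict keys
# ('label' and 'score') as column headers, in insertion order.
label_score = pd.DataFrame([r[0] for r in results])
print(label_score)
```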
Once I have that, I think I could use pandas to add these columns to the CSV, right? So something like:
import csv
import re
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
df = pd.read_csv('original.csv')
dtype_before = type(df["text"])
text_list = df["text"].tolist()
tokenizer = BertTokenizer.from_pretrained("daigo/bert-base-japanese-sentiment")
model = BertForSequenceClassification.from_pretrained("daigo/bert-base-japanese-sentiment")
sentiment_analyzer = pipeline("sentiment-analysis",model=model, tokenizer=tokenizer)
results = list(map(sentiment_analyzer, text_list))
<some magic to prepare results to a proper list>
df['label','score'] = <some magic to prepare results to a proper list>
df.to_csv("filepath.csv", index=False)
Thank you!
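A minimal sketch of what the "magic" step could look like. It assumes a hypothetical `fake_sentiment_analyzer` standing in for the transformers pipeline (so it runs without the model) and a small in-memory DataFrame instead of original.csv. It runs the analyzer once, keeps the results, and assigns the two columns separately. As far as I know, `df['label','score'] = ...` would instead create a single column whose name is the tuple ('label', 'score').

```python
import io

import pandas as pd

# Hypothetical stand-in for the transformers pipeline: like the real call on a
# single string, it returns a one-element list containing one dict.
def fake_sentiment_analyzer(text):
    label = 'ネガティブ' if 'bad' in text else 'ポジティブ'
    return [{'label': label, 'score': 0.9}]

# Hypothetical in-memory data standing in for original.csv
df = pd.DataFrame({'text': ['good day', 'bad day', 'fine']})

# Run the analyzer once per text and keep the results
results = list(map(fake_sentiment_analyzer, df['text'].tolist()))

# Split each result's single dict into two new columns
df['label'] = [r[0]['label'] for r in results]
df['score'] = [r[0]['score'] for r in results]

# Written to a buffer here; with a real file: df.to_csv("filepath.csv", index=False)
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```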
Answer:
Could you try this:
# Each pipeline call returns a one-element list of dicts; flatten them into one list of dicts
label_score_flat = [dc for ls in map(sentiment_analyzer, text_list) for dc in ls]
df['label'] = [dc['label'] for dc in label_score_flat]
df['score'] = [dc['score'] for dc in label_score_flat]
df.to_csv("filepath.csv", index=False)
I have not tested this, so there might be bugs.
Answer:
Please try this and see if it helps:
import pandas as pd
x = [[{'label': 'ポジティブ', 'score': 0.7804045081138611}], [{'label': 'ポジティブ', 'score': 0.9542087912559509}], [{'label': 'ポジティブ', 'score': 0.8557115793228149}],
[{'label': 'ポジティブ', 'score': 0.9135494232177734}], [{'label': 'ポジティブ', 'score': 0.86244797706604}], [{'label': 'ネガティブ', 'score': 0.8266600370407104}],
[{'label': 'ポジティブ', 'score': 0.9198371767997742}], [{'label': 'ポジティブ', 'score': 0.9033421874046326}], [{'label': 'ポジティブ', 'score': 0.7705154418945312}],
[{'label': 'ポジティブ', 'score': 0.8205435872077942}], [{'label': 'ポジティブ', 'score': 0.8045720458030701}], [{'label': 'ネガティブ', 'score': 0.5160148739814758}],
[{'label': 'ポジティブ', 'score': 0.8745550513267517}], [{'label': 'ポジティブ', 'score': 0.941367506980896}], [{'label': 'ポジティブ', 'score': 0.899341344833374}],
[{'label': 'ポジティブ', 'score': 0.9200822710990906}], [{'label': 'ポジティブ', 'score': 0.6254457235336304}], [{'label': 'ポジティブ', 'score': 0.8494048714637756}],
[{'label': 'ポジティブ', 'score': 0.6723847389221191}], [{'label': 'ポジティブ', 'score': 0.9329613447189331}], [{'label': 'ポジティブ', 'score': 0.9084392786026001}],
[{'label': 'ポジティブ', 'score': 0.7804917693138123}], [{'label': 'ポジティブ', 'score': 0.6737139225006104}], [{'label': 'ネガティブ', 'score': 0.5254362225532532}],
[{'label': 'ネガティブ', 'score': 0.7653219103813171}], [{'label': 'ネガティブ', 'score': 0.7342881560325623}], [{'label': 'ポジティブ', 'score': 0.8476402163505554}]]
# Each element of x is a one-element list holding a {'label': ..., 'score': ...} dict
label_vals = [item[0]['label'] for item in x]
score_vals = [item[0]['score'] for item in x]
df1 = pd.DataFrame({'label': label_vals, 'score': score_vals})
print(df1)
df1.to_csv('StackOverflow.csv', index=False)
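The answer above writes only the two new columns. To attach them to the rows of the original CSV instead, one option is `pd.concat` along `axis=1`. This is a minimal sketch assuming a hypothetical three-row `df` in place of original.csv and a shortened `df1`; since concat aligns on the index, both indexes are reset so rows pair up by position.

```python
import pandas as pd

# Hypothetical stand-in for the rows of original.csv
df = pd.DataFrame({'text': ['first', 'second', 'third']})

# df1 as built in the answer above, shortened to three rows
df1 = pd.DataFrame({
    'label': ['ポジティブ', 'ポジティブ', 'ネガティブ'],
    'score': [0.7804045081138611, 0.9542087912559509, 0.8266600370407104],
})

# concat with axis=1 aligns on the index, so reset both to match rows by position
combined = pd.concat([df.reset_index(drop=True), df1.reset_index(drop=True)], axis=1)
print(combined)
# combined.to_csv('filepath.csv', index=False) would write text, label and score
```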