So, I now have this code:
import pandas as pd
diccionario = pd.read_csv(dict, header=0).set_index("nombre")["valor"].to_dict()
lista = ["felicidad", "paz", "día", "estoy", "isla", "alivio", "-"]
print(sum([diccionario[i] for i in lista if i in diccionario]))
This allows me to compare the words in "lista" against a key:value dictionary (diccionario), and I get 8 as the result.
But now I would like to read "lista" from the same Google Sheet as well, so that I can add more lists.
So, I added the new lists here:
b = f"https://docs.google.com/spreadsheets/d/1-odw996EIUB9mo2Ad1fNh0y9QiXv7GU81COMj6g1Z-A/gviz/tq?tqx=out:csv&sheet=b"
and read it
import pandas as pd
text = pd.read_csv(b, header=0)
Then I tokenize the phrases with NLTK:
from nltk.tokenize import RegexpTokenizer

regexp = RegexpTokenizer(r'\w+')
text['text_token'] = text['frases'].apply(regexp.tokenize)
But when I use the same approach, I don't receive a new column with the evaluation of each phrase:
text['suma']=(diccionariob[diccionariob['nombre'].isin(text['frases'])]['valor'].sum())
print(sum([diccionario[i] for i in lista if i in diccionario]))
Instead, all I get is zeros:
| frases | text_token | suma |
|---|---|---|
| hola la casa es bonita paz felicidad | [hola, la, casa, es, bonita, paz, felicidad] | 0 |
| pasos de gigante feliz alegria mejor paz | [pasos, de, gigante, feliz, alegria, mejor] | 0 |
| estás muy bien paz | [estás, muy, bien, paz] | 0 |
| mucha felicidad paz | [mucha, felicidad, paz] | 0 |
What am I missing? Thanks!
CodePudding user response:
You can wrap your first method in a small function and apply it to each row's token list (assuming diccionariob is a plain key:value dictionary built the same way as diccionario). Your current line gives zeros because isin(text['frases']) compares the dictionary's single-word keys against whole phrases, so nothing ever matches.
# pass the dictionary and the tokenized word list as parameters
def somma(dictionary, lista):
    somma = sum([dictionary[i] for i in lista if i in dictionary])
    return somma

# apply the function to each row's token list (not the raw phrase string)
text['suma'] = text.apply(lambda x: somma(diccionariob, x['text_token']), axis=1)
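For reference, here is a minimal end-to-end sketch of how the pieces fit together, assuming diccionariob is built from another tab of the same spreadsheet exactly like diccionario was. The sheet name "diccionario" and the variable name dict_url below are placeholders, so adjust them to whatever your spreadsheet actually uses:
import pandas as pd
from nltk.tokenize import RegexpTokenizer

# placeholder URLs: change the sheet names to match your spreadsheet tabs
dict_url = "https://docs.google.com/spreadsheets/d/1-odw996EIUB9mo2Ad1fNh0y9QiXv7GU81COMj6g1Z-A/gviz/tq?tqx=out:csv&sheet=diccionario"
b = "https://docs.google.com/spreadsheets/d/1-odw996EIUB9mo2Ad1fNh0y9QiXv7GU81COMj6g1Z-A/gviz/tq?tqx=out:csv&sheet=b"

# word -> score dictionary
diccionariob = pd.read_csv(dict_url, header=0).set_index("nombre")["valor"].to_dict()

# phrases to evaluate
text = pd.read_csv(b, header=0)

# tokenize each phrase into a list of words
regexp = RegexpTokenizer(r'\w+')
text['text_token'] = text['frases'].apply(regexp.tokenize)

def somma(dictionary, lista):
    # sum the score of every token that exists in the dictionary
    return sum(dictionary[i] for i in lista if i in dictionary)

# score each row from its token list
text['suma'] = text.apply(lambda x: somma(diccionariob, x['text_token']), axis=1)
print(text[['frases', 'suma']])
Each row's suma is then the sum of the scores of its tokens that appear in diccionariob, which is the same computation that returned 8 for the original lista.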