How do I evaluate sentences from a dataframe instead of a single list

Time:01-04

So, I now have this code:

import pandas as pd

diccionario = pd.read_csv(dict, header=0).set_index("nombre")["valor"].to_dict()

lista = ["felicidad", "paz", "día", "estoy", "isla", "alivio", "-"]
print(sum([diccionario[i] for i in lista if i in diccionario]))

That lets me compare the words in "lista" against a key:value dictionary (diccionario), and I get 8 as the result.

But now I would also like to read "lista" from the same Google Sheet, so that I can add more lists.

So I added the new lists to a sheet here:

b = f"https://docs.google.com/spreadsheets/d/1-odw996EIUB9mo2Ad1fNh0y9QiXv7GU81COMj6g1Z-A/gviz/tq?tqx=out:csv&sheet=b"

and read it:

import pandas as pd

text = pd.read_csv(b, header=0)

Then, I tokenize the phrases with NLTK

from nltk.tokenize import RegexpTokenizer

regexp = RegexpTokenizer(r'\w+')

text['text_token']=text['frases'].apply(regexp.tokenize)

But when I try the same idea on the dataframe, I don't receive a new column with the evaluation of each phrase:

text['suma']=(diccionariob[diccionariob['nombre'].isin(text['frases'])]['valor'].sum())
print(sum([diccionario[i] for i in lista if i in diccionario]))

Instead, all I get are zeros.

frases                                    text_token                                    suma
hola la casa es bonita paz felicidad      [hola, la, casa, es, bonita, paz, felicidad]  0
pasos de gigante feliz alegria mejor paz  [pasos, de, gigante, feliz, alegria, mejor]   0
estás muy bien paz                        [estás, muy, bien, paz]                       0
mucha felicidad paz                       [mucha, felicidad, paz]                       0

What am I missing? Thanks!

CodePudding user response:

You can wrap your first approach in a function and apply it to each row with a lambda (assuming diccionariob is a dictionary that maps each word to its value, like diccionario in your first snippet):

# pass the dictionary and the tokenized list as parameters
def somma(dictionary, lista):
    return sum(dictionary[i] for i in lista if i in dictionary)

# apply the function to each row, using the token list rather than the raw
# string (iterating over x['frases'] would yield single characters)
text['suma'] = text.apply(lambda x: somma(diccionariob, x['text_token']), axis=1)
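
To make the idea easier to try out, here is a minimal self-contained sketch of the same per-row approach. It is only an illustration under a few assumptions: the word scores in diccionario are invented (the real ones live in the "nombre"/"valor" sheet), the phrases are inlined instead of read from Google Sheets, the helper name suma_tokens is hypothetical, and pandas' own str.findall stands in for the NLTK tokenizer so nothing beyond pandas is needed.

```python
import pandas as pd

# Hypothetical word scores; the real ones come from the "nombre"/"valor" sheet.
diccionario = {"felicidad": 3, "paz": 2, "feliz": 3, "alegria": 2}

# The four phrases from the question, inlined instead of read from Google Sheets.
text = pd.DataFrame({
    "frases": [
        "hola la casa es bonita paz felicidad",
        "pasos de gigante feliz alegria mejor paz",
        "estás muy bien paz",
        "mucha felicidad paz",
    ]
})

# Same \w+ pattern as the tokenizer, applied with pandas' built-in regex support.
text["text_token"] = text["frases"].str.findall(r"\w+")


def suma_tokens(dictionary, tokens):
    """Sum the scores of the tokens that appear in the dictionary."""
    return sum(dictionary[t] for t in tokens if t in dictionary)


# One score per row, computed from that row's token list (not the raw string).
text["suma"] = text["text_token"].apply(lambda tokens: suma_tokens(diccionario, tokens))

print(text[["frases", "suma"]])
```

With these made-up scores, each row gets its own total instead of a single repeated value. The zeros in the question are consistent with isin comparing individual words from "nombre" against complete phrases: no word equals a whole phrase, so the filter selects nothing, the sum of the empty selection is 0, and that one scalar is assigned to every row.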