Counting words in one DataFrame column using weights from another DataFrame column

Time:03-31

I have a dataframe idf, shown below, and a second dataframe df.

idf

Output-

      feature_name  idf_weights
2488    kralendijk    11.221923
3059         night     0.000000
1383         ebebf     0.000000

df

Output-

     message                   Number of Words in each message  
0   night kralendijk ebebf          3 

For each message in df, I want to look up every word's idf_weights value in idf and store, in a new column, the number of words whose weight is greater than 0.

The expected output looks like this:

df

Output-

     message                   Number of Words in each message   Number of words with idf_score>0
0   night kralendijk ebebf                 3                     1

I tried the code below, but it returns the total number of words in each message instead of the number of words with idf_weights > 0.

Code-

words_weights = dict(idf[['feature_name', 'idf_weights']].values)
df['> zero'] = df['message'].apply(lambda x: count([words_weights.get(word, 11.221923) for word in x.split()]))

Output-

     message                   Number of Words in each message   Number of words with idf_score>0
0   night kralendijk ebebf                 3                     3

Thank you.

CodePudding user response:

Try using a list comprehension:

# set up a dictionary for easy feature->weight indexing
d = idf.set_index('feature_name')['idf_weights'].to_dict()
# {'kralendijk': 11.221923, 'night': 0.0, 'ebebf': 0.0}

df['> zero'] = [sum(d.get(w, 0)>0 for w in x.split()) for x in df['message']]

## OR, slightly faster alternative
# df['> zero'] = [sum(1 for w in x.split() if d.get(w, 0)>0) for x in df['message']]

output:

                  message  Number of Words in each message  > zero
0  night kralendijk ebebf                                3       1
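
To try this end to end, here is a minimal, self-contained sketch that rebuilds both frames from the sample data in the question. The helper name count_positive is my own and not from the original post. It also shows why the original attempt returned 3: taking the length of the list of looked-up weights counts every word, whether or not its weight is positive (this assumes the undefined `count` in the question behaved like `len`).

```python
import pandas as pd

# Sample data copied from the question
idf = pd.DataFrame(
    {"feature_name": ["kralendijk", "night", "ebebf"],
     "idf_weights": [11.221923, 0.0, 0.0]},
    index=[2488, 3059, 1383],
)
df = pd.DataFrame({"message": ["night kralendijk ebebf"]})
df["Number of Words in each message"] = df["message"].str.split().str.len()

# feature -> weight lookup
weights = idf.set_index("feature_name")["idf_weights"].to_dict()


def count_positive(message, weights):
    """Count the words in `message` whose weight is > 0 (hypothetical helper).

    Words missing from `weights` are treated as weight 0.
    """
    return sum(weights.get(word, 0) > 0 for word in message.split())


# The length of the weight list is just the word count, hence 3
total_words = len([weights.get(w, 0) for w in df["message"].iloc[0].split()])
print(total_words)  # 3

# Counting only the positive weights gives the wanted result
df["> zero"] = [count_positive(m, weights) for m in df["message"]]
print(df["> zero"].tolist())  # [1]
```

Note that the default passed to `.get` matters: the question used 11.221923 as the default, so any word absent from idf would have been treated as having a positive weight.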

CodePudding user response:

You can use str.findall: build a regex alternation from the feature names whose weight is greater than 0, then count how many matches it finds in each message.

pattern = fr"({'|'.join(idf.loc[idf['idf_weights'] > 0, 'feature_name'])})"
df['Number of words with idf_score>0'] = df['message'].str.findall(pattern).str.len()
print(df)

# Output
                  message  Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf                                3                                 1
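
A few caveats with the plain alternation, as far as I can tell: it also matches inside longer words (kralendijk would be found in kralendijks), feature names containing regex metacharacters would be misread as regex syntax, and if no feature has a positive weight the pattern becomes "()", which matches the empty string at every position. Here is a hedged sketch of a stricter variant that uses word boundaries, re.escape, and str.count. The second sample message is made up for illustration.

```python
import re

import pandas as pd

idf = pd.DataFrame(
    {"feature_name": ["kralendijk", "night", "ebebf"],
     "idf_weights": [11.221923, 0.0, 0.0]}
)
# The second message is invented here to show the whole-word behaviour
df = pd.DataFrame({"message": ["night kralendijk ebebf", "kralendijks night"]})

positive = idf.loc[idf["idf_weights"] > 0, "feature_name"]

if positive.empty:
    # No feature has a positive weight, so every count is 0
    df["Number of words with idf_score>0"] = 0
else:
    # \b ... \b restricts matches to whole words; re.escape neutralises
    # any regex metacharacters inside the feature names
    pattern = r"\b(?:" + "|".join(map(re.escape, positive)) + r")\b"
    df["Number of words with idf_score>0"] = df["message"].str.count(pattern)

print(df["Number of words with idf_score>0"].tolist())  # [1, 0]
```

Whether whole-word matching is what you want depends on how the feature names were tokenized in the first place; if they came from a vectorizer with its own tokenizer, the dictionary lookup from the other answer may track it more closely.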