Home > OS >  Performing df.map() or df.merge() against a single <str> value to return a <float>
Performing df.map() or df.merge() against a single <str> value to return a <float>

Time:02-14

I am trying to create a list of 5 letter words in a pandas dataframe, which splits the word into different columns and assigns a value to each letter, then performs a summation of the values. The following code imports a .json dictionary and assigns values to a letter:

import json
import pandas as pd


def split(word):
    return [char for char in word]


j = open('words_dictionary.json')
dictionary = json.load(j)

dictionary_db = pd.DataFrame(columns=['Word'])
letter_db = pd.DataFrame(columns=['Letter'])

word_count = 0
num_word_count = 0
letter_count = 0

for i in dictionary:
    word_count  = 1
    if len(i) == 5:
        dictionary1_db = pd.DataFrame({i}, columns=['Word'])
        dictionary_db = pd.concat([dictionary1_db, dictionary_db], ignore_index=True, axis=0)
        num_word_count  = 1
        split_word = split(i)
        L1 = split_word[0]
        L2 = split_word[1]
        L3 = split_word[2]
        L4 = split_word[3]
        L5 = split_word[4]

        for s in split_word:
            letter_count  = 1
            letter1_db = pd.DataFrame({s}, columns=['Letter'])
            letter_db = pd.concat([letter_db, letter1_db], ignore_index=True, axis=0)

grouped = letter_db.groupby('Letter').value_counts()
grouped_db = pd.DataFrame(grouped, columns=['Value'])
grouped_db = grouped_db.apply(lambda x: (x/num_word_count)*.2, axis=1)
grouped_dict = grouped_db.to_dict()

Resulting in a grouped_db of:

Letter Value
0 a 0.10544
1 b 0.02625
2 c 0.03448
.. .. ..

as well as a similar dictionary:

grouped_dict = {'a': 0.10544, 'b': 0.02625, 'c': 0.03448, ...}

My problems start to occur when I try to map the string value to the float value.

How would I go about either merging a float value to the specified letter key or mapping a dictionary float value to the specified letter key without causing an 'NaN' value error?

ATTEMPT 1:

df = pd.DataFrame([{'L1': 'a'}])
df['L1Val'] = df['L1'].map(grouped_dict)

df:

L1 L1Val
0 a nan

intended df output:

L1 L1Val
0 a .10544

CodePudding user response:

You can merge the both dataframes as follows:

df.merge(grouped_db, left_on="L1", right_on="Letter")

The columns L1 and Letter will be redundant but you can filter one out afterwards.

  • Related