I am trying to create a list of 5 letter words in a pandas dataframe, which splits the word into different columns and assigns a value to each letter, then performs a summation of the values. The following code imports a .json dictionary and assigns values to a letter:
import json
import pandas as pd
def split(word):
return [char for char in word]
j = open('words_dictionary.json')
dictionary = json.load(j)
dictionary_db = pd.DataFrame(columns=['Word'])
letter_db = pd.DataFrame(columns=['Letter'])
word_count = 0
num_word_count = 0
letter_count = 0
for i in dictionary:
word_count = 1
if len(i) == 5:
dictionary1_db = pd.DataFrame({i}, columns=['Word'])
dictionary_db = pd.concat([dictionary1_db, dictionary_db], ignore_index=True, axis=0)
num_word_count = 1
split_word = split(i)
L1 = split_word[0]
L2 = split_word[1]
L3 = split_word[2]
L4 = split_word[3]
L5 = split_word[4]
for s in split_word:
letter_count = 1
letter1_db = pd.DataFrame({s}, columns=['Letter'])
letter_db = pd.concat([letter_db, letter1_db], ignore_index=True, axis=0)
grouped = letter_db.groupby('Letter').value_counts()
grouped_db = pd.DataFrame(grouped, columns=['Value'])
grouped_db = grouped_db.apply(lambda x: (x/num_word_count)*.2, axis=1)
grouped_dict = grouped_db.to_dict()
Resulting in a grouped_db of:
Letter | Value | |
---|---|---|
0 | a | 0.10544 |
1 | b | 0.02625 |
2 | c | 0.03448 |
.. | .. | .. |
as well as a similar dictionary:
grouped_dict = {'a': 0.10544, 'b': 0.02625, 'c': 0.03448, ...}
My problems start to occur when I try to map the string value to the float value.
How would I go about either merging a float value to the specified letter key or mapping a dictionary float value to the specified letter key without causing an 'NaN' value error?
ATTEMPT 1:
df = pd.DataFrame([{'L1': 'a'}])
df['L1Val'] = df['L1'].map(grouped_dict)
df:
L1 | L1Val | |
---|---|---|
0 | a | nan |
intended df output:
L1 | L1Val | |
---|---|---|
0 | a | .10544 |
CodePudding user response:
You can merge the both dataframes as follows:
df.merge(grouped_db, left_on="L1", right_on="Letter")
The columns L1 and Letter will be redundant but you can filter one out afterwards.