I would like to ask how to create new columns in an existing data frame, one per entry in a list of words. I am counting verb frequencies in each string of a data frame. The verb list looks like this:
<bound method DataFrame.to_dict of verb
0 agree
1 bear
2 care
3 choose
4 be>
The code below runs, but the output is the total frequency of all the words in a single column, instead of one column per word in the word list.
#ver.1 code
import pandas as pd
verb = pd.read_csv('cog_verb.csv')
df2 = pd.DataFrame(df.answer_id)
for x in verb:
    df2[f'count_{x}'] = lemma.str.count('|'.join(r"\b{}\b".format(x)))
The code was updated reflecting the helpful comment by Drakax, as below:
#updated code
for x in verb:
    df2.to_dict()[f'count_{x}'] = lemma.str.count('|'.join(r"\b{}\b".format(x)))
but both versions produced the same output:
<bound method DataFrame.to_dict of answer_id count_verb
0 312 91
1 1110 123
2 2700 102
3 2764 217
4 2806 182
.. ... ...
321 33417 336
322 36558 517
323 37316 137
324 37526 119
325 45683 1194
[326 rows x 2 columns]>
----- updated info----
As advised by Drakax, I have added the first data frame below.
df.to_dict
<bound method DataFrame.to_dict of answer_id text
0 312 ANON_NAME_0\n Here are a few instructions for ...
1 1110 October16,2006 \nDear Dad,\n\n I am going to g...
2 2700 My Writing Habits\n I do many things before I...
3 2764 My Ideas about Writing\n I have many ideas bef...
4 2806 I've main habits for writing and I sure each o...
.. ... ...
321 33417 ????????????????????????\n???????????????? ?? ...
322 36558 In this world, there are countless numbers of...
323 37316 My Friend's Room\nWhen I was kid I used to go ...
324 37526 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ...
325 45683 Primary and Secondary Education in South Korea...
[326 rows x 2 columns]>
While the above output is correct, I want each word's frequency in its own column. I appreciate any help you can provide. Many thanks in advance!
CodePudding user response:
Well, it still seems to be a bit of a mess, but I think I've understood what you want, and you can adapt your code using mine:
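1. This step is only for me: creating a new DF with randomly generated strings:
import pandas as pd
# rands_array is a private pandas testing helper, so it may not exist in every pandas version
from pandas._testing import rands_array
randstr = rands_array(10, 10)  # 10 random strings, 10 characters each
df = pd.DataFrame(data=randstr, columns=["randstr"])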
df
index | randstr | count |
---|---|---|
0 | 20uDmHdBL5 | 1 |
1 | E62AeycGdy | 1 |
2 | tHz99eI8BC | 1 |
3 | iZLXfs7R4k | 1 |
4 | bURRiuxHvc | 2 |
5 | lBDzVuB3z9 | 1 |
6 | GuIZHOYUr5 | 1 |
7 | k4wVvqeRkD | 1 |
8 | oAIGt8pHbI | 1 |
9 | N3BUMfit7a | 2 |
2. Then, to count the occurrences of each pattern, simply do this:
reg = ['a','e','i','o','u']  # this is where you store your verbs
def count_reg(df):
    # add one column per pattern, holding its count in each string
    for i in reg:
        df[i] = df['randstr'].str.count(i)
    return df
count_reg(df)
index | randstr | a | e | i | o | u |
---|---|---|---|---|---|---|
0 | h2wcd5yULo | 0 | 0 | 0 | 1 | 0 |
1 | uI400TZnJl | 0 | 0 | 0 | 0 | 1 |
2 | qMiI7morYG | 0 | 0 | 1 | 1 | 0 |
3 | f6Aw6AH3TL | 0 | 0 | 0 | 0 | 0 |
4 | nJ0h9IsDn6 | 0 | 0 | 0 | 0 | 0 |
5 | tWyNxnzLwv | 0 | 0 | 0 | 0 | 0 |
6 | V4sTYcPsiB | 0 | 0 | 1 | 0 | 0 |
7 | tSgni67247 | 0 | 0 | 1 | 0 | 0 |
8 | sUZn3L08JN | 0 | 0 | 0 | 0 | 0 |
9 | qDiG3Zynk0 | 0 | 0 | 1 | 0 | 0 |
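3. To adapt this to your verbs, here is a minimal sketch. I don't have your files, so `verb` and `df` below are small hypothetical stand-ins for `cog_verb.csv` and your `answer_id`/`text` frame, and I count on the lowercased raw text instead of your `lemma` Series (an assumption on my side). Two things probably explain your `count_verb` column: iterating over a DataFrame yields its column labels rather than its values, and `'|'.join(...)` applied to a single string puts a `|` between every character instead of between words.

```python
import re
import pandas as pd

# Hypothetical stand-ins for the question's data (not the real files)
verb = pd.DataFrame({"verb": ["agree", "bear", "care", "choose", "be"]})
df = pd.DataFrame({
    "answer_id": [312, 1110],
    "text": [
        "I agree and I care. Be sure to be kind.",
        "They choose to bear it; I agree.",
    ],
})

# Iterating over a DataFrame gives column labels, so the original loop
# ran exactly once with x == "verb" and created only `count_verb`.
assert list(verb) == ["verb"]

df2 = df[["answer_id"]].copy()
text = df["text"].str.lower()  # assumption: case-insensitive counting is wanted

for word in verb["verb"]:  # iterate over the column's values instead
    # \b...\b matches whole words only, so "be" does not match inside "bear"
    pattern = rf"\b{re.escape(word)}\b"
    df2[f"count_{word}"] = text.str.count(pattern)

print(df2)
```

With your real data you would keep `verb = pd.read_csv('cog_verb.csv')` and replace `text` with your `lemma` Series; the loop itself should not need to change.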
Please accept ✅ this answer if it solved your problem :)
Otherwise mention me (using @) in comment while telling me what's wrong ;)