How can I calculate, for each row, how often each word in the item_new column appears in the case_description_new column of the same row? For example:
case_description_new | item_new |
---|---|
This row contains word_13 word_43 word_11 | word_11 word_12 word_13 |
This row contains word_31 word_34 word_22 | word_21 word_22 word_23 |
This row contains word_33 word_33 word_51 | word_31 word_32 word_33 |
Expected output:
case_description_new | item_new | items_frequency |
---|---|---|
This row contains word_13 word_43 word_11 | word_11 word_12 word_13 | word_11: 1, word_12: 0, word_13: 1 |
This row contains word_31 word_34 word_22 | word_21 word_22 word_23 | word_21: 0, word_22: 1, word_23: 0 |
This row contains word_33 word_33 word_51 | word_31 word_32 word_33 | word_31: 0, word_32: 0, word_33: 2 |
Data:
import pandas as pd

df = pd.DataFrame({
    'case_description_new': ['This row contains word_13 word_43 word_11', 'This row contains word_31 word_34 word_22', 'This row contains word_33 word_33 word_51'],
    'item_new': ['word_11 word_12 word_13', 'word_21 word_22 word_23', 'word_31 word_32 word_33']
})
Answer:
A solution using a list comprehension and str.count:
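# for each row, map every word in item_new to the number of times it occurs in case_description_new
df['freq'] = [{z: x.count(z) for z in y.split()} for x, y in zip(df['case_description_new'], df['item_new'])]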
Result:
case_description_new | item_new | freq |
---|---|---|
This row contains word_13 word_43 word_11 | word_11 word_12 word_13 | {'word_11': 1, 'word_12': 0, 'word_13': 1} |
This row contains word_31 word_34 word_22 | word_21 word_22 word_23 | {'word_21': 0, 'word_22': 1, 'word_23': 0} |
This row contains word_33 word_33 word_51 | word_31 word_32 word_33 | {'word_31': 0, 'word_32': 0, 'word_33': 2} |
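Because str.count matches substrings, an item such as word_1 would also be counted inside word_13. Below is a minimal sketch of a whole-word variant that counts whitespace-separated tokens with collections.Counter instead. It assumes words are separated only by whitespace, so punctuation attached to a word (e.g. "word_11,") would prevent a match. The helper name item_frequencies is hypothetical. The sketch also builds the "word: count, ..." string layout shown in the expected output.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'case_description_new': ['This row contains word_13 word_43 word_11',
                             'This row contains word_31 word_34 word_22',
                             'This row contains word_33 word_33 word_51'],
    'item_new': ['word_11 word_12 word_13',
                 'word_21 word_22 word_23',
                 'word_31 word_32 word_33'],
})


def item_frequencies(text, items):
    """Hypothetical helper: whole-word count of each item (whitespace-split) in text."""
    counts = Counter(text.split())  # token -> number of occurrences in the description
    # Counter returns 0 for tokens it has never seen, so absent items get 0
    return {item: counts[item] for item in items.split()}


df['freq'] = [item_frequencies(x, y)
              for x, y in zip(df['case_description_new'], df['item_new'])]

# Optional: render each dict as "word: count, ..." like the expected output
df['items_frequency'] = df['freq'].map(
    lambda d: ', '.join(f'{k}: {v}' for k, v in d.items()))

print(df['items_frequency'].tolist())

# Substring vs. whole-word behaviour on a prefix-style item
print('word_13'.count('word_1'))              # str.count finds the substring
print(item_frequencies('word_13', 'word_1'))  # whole-word count does not
```

On the sample data this gives the same counts as the str.count version. The two approaches only differ when an item is a substring of some other word in the description.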