How to create an algorithm for counting in pandas columns?

import pandas as pd

Create a dataframe for an example problem:

data = [[['A','B','C','D','E'],1000],
[['C','B'],2000],
[['X','Y','Z'],3000]]
df = pd.DataFrame(data=data, columns=['token','view'])

This is what it looks like:

In[0]:df
Out[0]:

              token view
0   [A, B, C, D, E] 1000
1   [C, B]          2000
2   [X, Y, Z]       3000

Now the main problem: I need to sum the views for every unique value across the lists in the column df['token']. For example, if the letter "B" appears in two rows with views 1000 and 2000, its total should be 3000. I wrote code that tries to compute these sums, but it does not work correctly. Below I show my code and the result I would like to get.

First, collect all letters into one list, then convert it to a set and back to a list to keep only the unique values:

lst = []
for item in df['token']:
    for it in item:
        lst.append(it)
set_token = set(lst)
unique_token = list(set_token)

This is what it looks like:

In[1]:unique_token
Out[1]:
['D', 'E', 'Z', 'B', 'A', 'Y', 'C', 'X']

Now create a dictionary keyed by the unique values, with 0 views as the initial value for each:

dict_token = {}
for item in unique_token:
    dict_token[item] = 0

This is what it looks like:

In[2]:dict_token
Out[2]:
{'D': 0, 'E': 0, 'Z': 0, 'B': 0, 'A': 0, 'Y': 0, 'C': 0, 'X': 0}

Next, I wrote code to sum the views for each key in the dictionary whenever that key occurs in a row's list. However, I cannot get the value of df['view'] for the row currently being iterated over. Instead, the whole column is added, so every entry becomes a Series and the totals are wrong. How can I fix this? Below is the code, the output I get, and the output I would like to get.

for item in dict_token.keys():
    for it in df['token']:
        if item in it:
            dict_token[item] += df['view']

And I get:

In[3]:dict_token
Out[3]:
{'D': 0    2000
 1    4000
 2    6000
 Name: view, dtype: int64,
 'E': 0    2000
 1    4000
 2    6000
 Name: view, dtype: int64,
 'Z': 0    2000
 1    4000
 2    6000
 Name: view, dtype: int64,
 'B': 0     4000
 1     8000
 2    12000
 Name: view, dtype: int64,
 'A': 0    2000
 1    4000
 2    6000
 Name: view, dtype: int64,
 'Y': 0    2000
 1    4000
 2    6000
 Name: view, dtype: int64,
 'C': 0     4000
 1     8000
 2    12000
 Name: view, dtype: int64,
 'X': 0    2000
 1    4000
 2    6000
 Name: view, dtype: int64}

But I want:

Out[3]:{'D': 1000, 'E': 1000, 'Z': 3000, 'B': 3000, 'A': 1000, 'Y': 3000, 'C': 3000, 'X': 3000}
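The Series values appear because df['view'] refers to the whole column, not to the view count of the row whose list is being checked. A minimal sketch of one way to keep the loop approach, assuming the same df as above: iterate over both columns together with zip so that each list is paired with the view count from its own row. The names tokens, view and tok are illustrative and not part of the original code.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame(
    {
        "token": [["A", "B", "C", "D", "E"], ["C", "B"], ["X", "Y", "Z"]],
        "view": [1000, 2000, 3000],
    }
)

dict_token = {}
# zip pairs each token list with the view count from the same row
for tokens, view in zip(df["token"], df["view"]):
    # set() mirrors the original `if item in it` check:
    # a token repeated inside one list is counted once for that row
    for tok in set(tokens):
        dict_token[tok] = dict_token.get(tok, 0) + int(view)

print(dict_token)
```

Using dict.get with a default of 0 also makes the separate steps that build unique_token and pre-fill the dictionary unnecessary.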

CodePudding user response:

You can explode your lists, then compute the groupby.sum and convert the result with to_dict:

df.explode('token').groupby('token')['view'].sum().to_dict()

output: {'A': 1000, 'B': 3000, 'C': 3000, 'D': 1000, 'E': 1000, 'X': 3000, 'Y': 3000, 'Z': 3000}
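To make the one-liner easier to follow, here is the same chain split into its three steps, as a sketch on the example data from the question. The intermediate names exploded, totals and result are only illustrative.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame(
    {
        "token": [["A", "B", "C", "D", "E"], ["C", "B"], ["X", "Y", "Z"]],
        "view": [1000, 2000, 3000],
    }
)

# Step 1: explode turns every list element into its own row and
# repeats the 'view' value (and the index label) of the source row
exploded = df.explode("token")

# Step 2: group the rows by token and sum the views in each group
totals = exploded.groupby("token")["view"].sum()

# Step 3: convert the resulting Series to a plain dictionary
result = totals.to_dict()

print(exploded)
print(result)
```

One difference from a membership check such as `if item in it`: explode creates one row per list element, so a token that appears twice inside the same list would have that row's views added twice. If that can happen in the real data, the lists may need de-duplicating first.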