Count element in list if it is present in each row of a column. Add to a new column (pandas)-CodePudding

I have a pandas df like this:

MEMBERSHIP
[2022_K_, EWREW_NK]
[333_NFK_,2022_K_, EWREW_NK, 000]

And I have a list of keys:

list_k = ["_K_","_NK_","_NKF_","_KF_"]

I want to add and create a column that count if any of that element is in the column. The desired output is:

MEMBERSHIP                        | COUNT
[2022_K_, EWREW_NK]               | 2
[333_NFK_,2022_K_, EWREW_NK, 000] | 3

Can you help me?

CodePudding user response：

IIUC, you can use pandas .str acccess methods with regex:

import pandas as pd
df = pd.DataFrame({'MEMBERSHIP':[['2022_K_', 'EWREW_NK'],
                                ['333_NFK_','2022_K_', 'EWREW_NK', '000']]})

list_k = ["_K_","_NK","_NFK_","_KF_"] #I changed this list a little
reg = '|'.join(list_k)
df['count'] = df['MEMBERSHIP'].explode().str.contains(reg).groupby(level=0).sum()
print(df)

Output:

                           MEMBERSHIP  count
0                 [2022_K_, EWREW_NK]      2
1  [333_NFK_, 2022_K_, EWREW_NK, 000]      3

CodePudding user response：

you can use a lambda function:

def check(x):
    total=0
    for i in x:
        if type(i) != str: #if value is not string pass.
            pass
        else:
            for j in list_k:
                if j in i:
                    total =1
    return total
                
df['count']=df['MEMBERSHIP'].apply(lambda x: check(x))

CodePudding user response：

I come up with this dumb code

count_row=0
df['Count']= None
for i in df['MEMBERSHIP_SPLIT']:
  count_element=0

  for sub in i:
    for e in list_k:
      if e in sub:
        count_element =1
        df['Count'][count_row]=count_element
  count_row  = 1