I have a pandas df like this:
MEMBERSHIP
[2022_K_, EWREW_NK]
[333_NFK_,2022_K_, EWREW_NK, 000]
And I have a list of keys:
list_k = ["_K_","_NK_","_NKF_","_KF_"]
I want to add and create a column that count if any of that element is in the column. The desired output is:
MEMBERSHIP | COUNT
[2022_K_, EWREW_NK] | 2
[333_NFK_,2022_K_, EWREW_NK, 000] | 3
Can you help me?
CodePudding user response:
IIUC, you can use pandas .str acccess methods with regex:
import pandas as pd
df = pd.DataFrame({'MEMBERSHIP':[['2022_K_', 'EWREW_NK'],
['333_NFK_','2022_K_', 'EWREW_NK', '000']]})
list_k = ["_K_","_NK","_NFK_","_KF_"] #I changed this list a little
reg = '|'.join(list_k)
df['count'] = df['MEMBERSHIP'].explode().str.contains(reg).groupby(level=0).sum()
print(df)
Output:
MEMBERSHIP count
0 [2022_K_, EWREW_NK] 2
1 [333_NFK_, 2022_K_, EWREW_NK, 000] 3
CodePudding user response:
you can use a lambda function:
def check(x):
total=0
for i in x:
if type(i) != str: #if value is not string pass.
pass
else:
for j in list_k:
if j in i:
total =1
return total
df['count']=df['MEMBERSHIP'].apply(lambda x: check(x))
CodePudding user response:
I come up with this dumb code
count_row=0
df['Count']= None
for i in df['MEMBERSHIP_SPLIT']:
count_element=0
for sub in i:
for e in list_k:
if e in sub:
count_element =1
df['Count'][count_row]=count_element
count_row = 1