It is easier to explain with a simple example DataFrame:
df1:
A B C D
0 a 6 1 b/5/4
1 a 6 1 a/1/6
2 c 9 3 9/c/3
df1 has four columns (A, B, C and D). The task is to count, for each row, how many of the strings in column D appear in columns A, B and C. Here is the expected output with more explanation:
df2 (expected output):
A B C D E (new column)
0 a 6 1 b/5/4 0 <-- none of column D's strings were found in columns A, B and C
1 a 6 1 a/1/6 3 <-- found a, 1 and 6, so it should return 3
2 c 9 3 9/c/3 3 <-- found all strings (3 in total)
Does anyone have a good idea for this? Thanks!
CodePudding user response:
You can use a list comprehension with set operations:
df['E'] = [len(set(l).intersection(s.split('/')))
           for l, s in zip(df.drop(columns='D').astype(str).to_numpy().tolist(),
                           df['D'])]
Output:
A B C D E
0 a 6 1 b/5/4 0
1 a 6 1 a/1/6 3
2 c 9 3 9/c/3 3
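For reference, here is a self-contained sketch of the same set-intersection idea. It assumes the frame from the question is built by hand as shown, and the helper name count_matches is hypothetical (it is not part of the answer above or of pandas).

```python
import pandas as pd

# Example frame from the question (assumed to be constructed like this).
df = pd.DataFrame({
    "A": ["a", "a", "c"],
    "B": [6, 6, 9],
    "C": [1, 1, 3],
    "D": ["b/5/4", "a/1/6", "9/c/3"],
})

def count_matches(row_values, d_string):
    """Count the distinct '/'-separated pieces of d_string found in row_values."""
    return len(set(row_values).intersection(d_string.split("/")))

# Cast A, B and C to strings so that the integer 6 compares equal to "6".
abc_rows = df.drop(columns="D").astype(str).to_numpy().tolist()
df["E"] = [count_matches(row, d) for row, d in zip(abc_rows, df["D"])]
print(df["E"].tolist())
```

Casting to str before the comparison matters here: without it, the integers in B and C would never match the string pieces produced by split('/').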
CodePudding user response:
import pandas as pd

# Rebuild the example frame from the question
dt = {'A': ['a', 'a', 'c'], 'B': [6, 6, 9], 'C': [1, 1, 3], 'D': ['b/5/4', 'a/1/6', '9/c/3']}
nu_data = pd.DataFrame(data=dt)
E = []
for itxid, itx in enumerate(nu_data['D']):
    match = 0
    str_list = itx.split('/')
    # Compare every column except the last one (D) against the pieces of D
    for keyid, keys in enumerate(dt):
        if keyid < len(dt) - 1:
            for seg_str in str_list:
                if str(dt[keys][itxid]) == seg_str:
                    match += 1  # count each match instead of only flagging one
    E.append(match)
nu_data['E'] = E
print(nu_data)
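One caveat that neither answer states, offered here only as an observation: a set-based count and a nested loop that adds one per matching (column value, piece) pair agree on the question's data, but they can differ when a row repeats a value across A, B and C. The sketch below uses two hypothetical helpers, count_distinct and count_pairs, and a made-up row to show the difference.

```python
def count_distinct(row_values, d_string):
    # Set-based: each distinct value is counted at most once.
    return len(set(map(str, row_values)) & set(d_string.split("/")))

def count_pairs(row_values, d_string):
    # Loop-based: every (column value, piece) pair that matches adds one.
    pieces = d_string.split("/")
    return sum(str(v) == p for v in row_values for p in pieces)

# Rows from the question: both approaches agree.
print(count_distinct(["a", 6, 1], "a/1/6"), count_pairs(["a", 6, 1], "a/1/6"))  # 3 3

# Hypothetical row where A and B hold the same value "6".
row = ["6", 6, 1]
print(count_distinct(row, "6/1/x"))  # 2
print(count_pairs(row, "6/1/x"))     # 3
```

Which count is wanted depends on whether repeated values should be counted once or once per column; the question's example does not settle this.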