Filtering a DataFrame on one column using another column of a different type

I have the following DataFrame:

import pandas as pd

data = {
    "ID": [420, 380, 390, 540, 520, 50, 22],
    "duration": [50, 40, 45, 33, 19, 1, 3],
    "next": ["390;50", "880;222", "520;50", "380;111", "810;111", "22;888", "11"],
}

# Load the data into a DataFrame
df = pd.DataFrame(data)

print(df)

This prints:

    ID  duration     next
0  420        50   390;50
1  380        40  880;222
2  390        45   520;50
3  540        33  380;111
4  520        19  810;111
5   50         1   22;888
6   22         3       11

Things to notice:

ID is of type int
next is a string of numbers, separated by ; when there is more than one number

I would like to keep only the rows where none of the values in next appear in the ID column.

For example, in this case:

420 has 390 and 50 as next, and both are in ID, so not this one
380 has 880 and 222 as next, neither of which is in ID, so this one is kept
540 has 380 and 111 as next; 111 is not in ID but 380 is, so not this one
the same goes for 50, because 22 is in ID

In the end I want to get

1  380        40  880;222
4  520        19  810;111
6   22         3       11

When next held only a single value I used print(df[~df.next.astype(int).isin(df.ID)]), but with several values per cell isin cannot be applied directly.

How can I do this?

CodePudding user response：

Split next on ;, explode the result so each value gets its own row, check each value with isin, and then group back by the original index:

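# One integer per row; the original row label is repeated in the index
s = df['next'].str.split(';').explode().astype(int)
# Keep rows where none of the exploded values is an existing ID
out = df[~s.isin(df['ID']).groupby(level=0).any()]
print(out)

Output:
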
    ID  duration     next
1  380        40  880;222
4  520        19  810;111
6   22         3       11
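
To see why this works, it can help to look at each intermediate step on its own. The following is only a sketch that rebuilds the frame from the question; the names exploded, hit, and any_hit are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [420, 380, 390, 540, 520, 50, 22],
    "duration": [50, 40, 45, 33, 19, 1, 3],
    "next": ["390;50", "880;222", "520;50", "380;111", "810;111", "22;888", "11"],
})

# One row per "next" value; the original row label is repeated in the index.
exploded = df["next"].str.split(";").explode().astype(int)

# True where that single value is an existing ID.
hit = exploded.isin(df["ID"])

# Collapse back to one boolean per original row: True if any value matched.
any_hit = hit.groupby(level=0).any()

# Keep only the rows where no value matched.
out = df[~any_hit]
print(out)
```

Because explode repeats the original index label for every split value, groupby(level=0) is enough to map the per-value results back to the original rows.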

CodePudding user response：

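Use a regex with word boundaries, which avoids splitting the strings. The \b anchors keep an ID such as 22 from matching inside a longer number such as 222: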

pattern = '|'.join(df['ID'].astype(str))

out = df[~df['next'].str.contains(fr'\b(?:{pattern})\b')]

Output:

    ID  duration     next
1  380        40  880;222
4  520        19  810;111
6   22         3       11
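
The role of the word boundaries can be checked by comparing the match with and without them. This is a sketch under the assumption that the IDs are plain integers; the re.escape call and the names with_boundaries and without_boundaries are additions for illustration, not part of the answer above.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "ID": [420, 380, 390, 540, 520, 50, 22],
    "duration": [50, 40, 45, 33, 19, 1, 3],
    "next": ["390;50", "880;222", "520;50", "380;111", "810;111", "22;888", "11"],
})

# Escape each ID in case it ever contains regex metacharacters
# (a no-op for plain integers, as assumed here).
pattern = "|".join(re.escape(str(i)) for i in df["ID"])

# With \b, an ID only matches a whole number between separators.
with_boundaries = df["next"].str.contains(rf"\b(?:{pattern})\b")

# Without \b, the ID 22 also matches inside 222 in row 1.
without_boundaries = df["next"].str.contains(rf"(?:{pattern})")

out = df[~with_boundaries]
print(out)
```

Without the boundaries, row 1 (880;222) would be wrongly treated as having a match and dropped from the result.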