How to find the id which contains the specific name includes in it using python-CodePudding

I have a excel file containing three columns as shown below,

project_id	Name	Date
755	Waber	2019-04-08
755	Wabern	2020-06-16
755	Wabern (FTTH geplan)	2020-07-24
755	Wabern FTTH DTAG	2020-08-15
755	Wabern (FTTH DTAG gg)	2021-03-05
755	Wabern	2021-09-13
134	Lerbeck	2019-04-18
134	BAd oyehausen (FttH svd)	2020-06-26
134	Werre Park	2020-07-14
134	Werre Park (FTTH ssd)	2020-08-25
134	Werre Park (FTTH)	2021-03-15
134	Bad oyehausen	2021-09-23
584	kitern	2019-04-08
584	Lausen (ftth los)	2020-06-16
584	kitener (FTTH geplan)	2020-07-24
584	Lausern	2020-08-15
584	Lausern (FTTH DTAG gg)	2021-03-05

Is it possible to filter out the project_id for which the name does not include any form of (FTTH) in the first row of the same id and at the last row of the same id. For example the first project_id 755 in the first row doesn't have any form of (FTTH) included in it and also the last row of the same project_id doesn't have the text (fttH), but for the project_id 584 it is a different scenario. Is it possible to filter out the project_id which doesn't include FTTH in any form in the first and last row of the same project_id ?

Expecting the result to be like,

project_id
755
134

CodePudding user response：

By looping over the unique id, and each time get the all name by specific id, after keep only first and last name, convert it to str and check there is any word 'FTTH'

Code:

import numpy as np
[i for i in set(df.id.values) if 'FTTH' not in str(np.array(df[df['id']==i]['NM'])[[0,-1]])]

CodePudding user response：

Using pandas:

def custom_function(series, pattern='FTTH'):
    """Identify if the first and last items do not have a pattern"""
    first = pattern not in series.iat[0].upper()
    last = pattern not in series.iat[-1].upper()
    return first and last

df.groupby('project_id').Name.apply(custom_function)

Output:

project_id
134     True
584    False
755     True
Name: Name, dtype: bool