I have a excel file containing three columns as shown below,
project_id | Name | Date |
---|---|---|
755 | Waber | 2019-04-08 |
755 | Wabern | 2020-06-16 |
755 | Wabern (FTTH geplan) | 2020-07-24 |
755 | Wabern FTTH DTAG | 2020-08-15 |
755 | Wabern (FTTH DTAG gg) | 2021-03-05 |
755 | Wabern | 2021-09-13 |
134 | Lerbeck | 2019-04-18 |
134 | BAd oyehausen (FttH svd) | 2020-06-26 |
134 | Werre Park | 2020-07-14 |
134 | Werre Park (FTTH ssd) | 2020-08-25 |
134 | Werre Park (FTTH) | 2021-03-15 |
134 | Bad oyehausen | 2021-09-23 |
584 | kitern | 2019-04-08 |
584 | Lausen (ftth los) | 2020-06-16 |
584 | kitener (FTTH geplan) | 2020-07-24 |
584 | Lausern | 2020-08-15 |
584 | Lausern (FTTH DTAG gg) | 2021-03-05 |
Is it possible to filter out the project_id for which the name does not include any form of (FTTH) in the first row of the same id and at the last row of the same id. For example the first project_id 755 in the first row doesn't have any form of (FTTH) included in it and also the last row of the same project_id doesn't have the text (fttH), but for the project_id 584 it is a different scenario. Is it possible to filter out the project_id which doesn't include FTTH in any form in the first and last row of the same project_id ?
Expecting the result to be like,
project_id |
---|
755 |
134 |
CodePudding user response:
By looping over the unique id, and each time get the all name by specific id, after keep only first and last name, convert it to str and check there is any word 'FTTH'
Code:
import numpy as np
[i for i in set(df.id.values) if 'FTTH' not in str(np.array(df[df['id']==i]['NM'])[[0,-1]])]
CodePudding user response:
Using pandas:
def custom_function(series, pattern='FTTH'):
"""Identify if the first and last items do not have a pattern"""
first = pattern not in series.iat[0].upper()
last = pattern not in series.iat[-1].upper()
return first and last
df.groupby('project_id').Name.apply(custom_function)
Output:
project_id
134 True
584 False
755 True
Name: Name, dtype: bool