I have a excel file containing three columns as shown below,
ID | Name | Date |
---|---|---|
1136 | GG Daubringen | 2019-04-08 |
1136 | Manheinm | 2020-06-16 |
1136 | Manheinm Streitkof | 2020-07-24 |
1136 | Staufenberg | 2020-08-15 |
1136 | Stürcken | 2021-03-05 |
1136 | Stürcken (FTTH) | 2021-09-13 |
1234 | Lerbeck | 2019-04-18 |
1234 | BAd oyehausen | 2020-06-26 |
1234 | Werre Park | 2020-07-14 |
1234 | Werre Park (FTTH) | 2020-08-25 |
1234 | Werre Park (FTTH) | 2021-03-15 |
1234 | Bad oyehausen | 2021-09-23 |
Is it possible to filter out the ID for which the name starts without (FTTH) and end the name without (FTHH), For example the first 1st ID 1136 has the name doesn't start with (FTTH) but ends with (FTTH), but I want to filter out the ID which doesn't start with (FTTH) and also doesn't end with (FTTH) as in the ID 1234 , is it possible to filter it using python ??
Expecting the result to be like,
ID |
---|
1234 |
CodePudding user response:
You can aggregate GroupBy.first
GroupBy.last
first and then filter indices for Series.str.startswith
and Series.str.endswith
:
df1 = df.groupby('ID')['Name'].agg(['first','last'])
i = df1.index[~df1['first'].str.startswith('(FTTH)') & ~df1['last'].str.endswith('(FTTH)')]
print (i)
#Int64Index([1234], dtype='int64', name='ID')
If need values in list:
L = i.tolist()
[1234]
If need DataFrame:
out = i.to_frame(index=False)
print (out)
ID
0 1234
If need first
without (FTTH)
and last
with (FTTH)
use:
i = df1.index[~df1['first'].str.startswith('(FTTH)') & df1['last'].str.endswith('(FTTH)')]
CodePudding user response:
s = ~df['Name'].str.endswith('(FTTH)')
s.groupby(df['ID']).agg(['first', 'last']).all(axis=1)[lambda x:x].reset_index()[['ID']]
output:
ID
0 1234