How do I select rows based on comparisons of different column values across rows after applying grou-CodePudding

If I have the following data

Name	Start	End
A	3/4/12	7/9/14
B	5/2/17	6/3/18
C	4/10/13	5/12/14
A	4/6/13	7/12/15
B	4/12/19	12/3/21
c	12/6/13	11/3/14

For each unique name (A, B, C) I want to select the rows whose end dates fall later than the start date for that same name for every row other than the one whose end date is being considered . So, A and C in this case. Basically, first use groupby (['Name'] and then pick rows where the end date is later than the start date when comparison is made across rows for the same name.

CodePudding user response：

Convert both columns to datetimes, so possible compare for less by Series.lt and then use GroupBy.all for check if all Trues per Name and last filter indices:

df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%y')
df['End'] = pd.to_datetime(df['End'], format='%m/%d/%y')

s = df['Start'].lt(df['End']).groupby(df['Name']).all()

out = s.index[s].tolist()
print (out)

Or change mask for Series.gt and get differencies by numpy.setdiff1d:

df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%y')
df['End'] = pd.to_datetime(df['End'], format='%m/%d/%y')

out = np.setdiff1d(df['Name'], df.loc[df['Start'].gt(df['End']), 'Name']).tolist()