If I have the following data
Name | Start | End |
---|---|---|
A | 3/4/12 | 7/9/14 |
B | 5/2/17 | 6/3/18 |
C | 4/10/13 | 5/12/14 |
A | 4/6/13 | 7/12/15 |
B | 4/12/19 | 12/3/21 |
c | 12/6/13 | 11/3/14 |
For each unique name (A, B, C) I want to select the rows whose end dates fall later than the start date for that same name for every row other than the one whose end date is being considered . So, A and C in this case. Basically, first use groupby (['Name'] and then pick rows where the end date is later than the start date when comparison is made across rows for the same name.
CodePudding user response:
Convert both columns to datetimes, so possible compare for less by Series.lt
and then use GroupBy.all
for check if all True
s per Name
and last filter indices:
df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%y')
df['End'] = pd.to_datetime(df['End'], format='%m/%d/%y')
s = df['Start'].lt(df['End']).groupby(df['Name']).all()
out = s.index[s].tolist()
print (out)
Or change mask for Series.gt
and get differencies by numpy.setdiff1d
:
df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%y')
df['End'] = pd.to_datetime(df['End'], format='%m/%d/%y')
out = np.setdiff1d(df['Name'], df.loc[df['Start'].gt(df['End']), 'Name']).tolist()