Home > Software design >  How to keep observations for individuals who showed up for the first time in week t in the data
How to keep observations for individuals who showed up for the first time in week t in the data

Time:04-14

I have the following data-frame:

   ID       date  X
0   A 2021-12-15  7
1   A 2022-01-30  6
2   A 2022-02-15  2
3   B 2022-01-30  2
4   B 2022-02-15  2
5   B 2022-02-18  7
6   C 2021-12-01  7
7   C 2021-12-15  4
8   C 2022-01-30  2
9   C 2022-02-15  7
10  D 2021-12-16  5
11  D 2022-01-30  4
12  D 2022-03-15  9

I want to keep the observations for those IDs who first showed up in week, say, 51 of the year (I would like to change this parameter in the future).

For example, IDs A and D showed up first in week 51 in the data, B didn't, C showed up in week 51, but not for the first time. So I want to keep in this example only the data pertaining to A and D.

CodePudding user response:

Filter if week match variable week and it is first time by ID in DataFrame by Series.duplicated, then get ID values:

week = 50

df['date'] = pd.to_datetime(df['date'])

s = df.loc[df['date'].dt.isocalendar().week.eq(week) & ~df['ID'].duplicated(), 'ID']

Or:

df1 = df.drop_duplicates(['ID'])
s = df1.loc[df1['date'].dt.isocalendar().week.eq(week) ,'ID']

print (s)
0     A
10    D
Name: ID, dtype: object

Last filter by ID with Series.isin and boolean indexing:

df = df[df['ID'].isin(s)]
print (df)
   ID       date  X
0   A 2021-12-15  7
1   A 2022-01-30  6
2   A 2022-02-15  2
10  D 2021-12-16  5
11  D 2022-01-30  4
12  D 2022-03-15  9
  • Related