I have the following data-frame:
ID date X
0 A 2021-12-15 7
1 A 2022-01-30 6
2 A 2022-02-15 2
3 B 2022-01-30 2
4 B 2022-02-15 2
5 B 2022-02-18 7
6 C 2021-12-01 7
7 C 2021-12-15 4
8 C 2022-01-30 2
9 C 2022-02-15 7
10 D 2021-12-16 5
11 D 2022-01-30 4
12 D 2022-03-15 9
I want to keep the observations for those IDs who first showed up in week, say, 51 of the year (I would like to change this parameter in the future).
For example, IDs A
and D
showed up first in week 51 in the data, B
didn't, C
showed up in week 51, but not for the first time.
So I want to keep in this example only the data pertaining to A
and D
.
CodePudding user response:
Filter if week
match variable week
and it is first time by ID
in DataFrame by Series.duplicated
, then get ID
values:
week = 50
df['date'] = pd.to_datetime(df['date'])
s = df.loc[df['date'].dt.isocalendar().week.eq(week) & ~df['ID'].duplicated(), 'ID']
Or:
df1 = df.drop_duplicates(['ID'])
s = df1.loc[df1['date'].dt.isocalendar().week.eq(week) ,'ID']
print (s)
0 A
10 D
Name: ID, dtype: object
Last filter by ID
with Series.isin
and boolean indexing
:
df = df[df['ID'].isin(s)]
print (df)
ID date X
0 A 2021-12-15 7
1 A 2022-01-30 6
2 A 2022-02-15 2
10 D 2021-12-16 5
11 D 2022-01-30 4
12 D 2022-03-15 9