I have table:
user_id | date | days_since_install |
---|---|---|
001 | 01-01-2021 | 0 |
001 | 02-01-2021 | 1 |
001 | 02-01-2021 | 2 |
It is necessary to check if there is "1" in the column "days_since_install" in grouping by use_id and if so, fill in True in the column "retention_1d" otherwise False.
The resulting table should look like this:
user_id | retention_1d |
---|---|
001 | True |
CodePudding user response:
You can use groupby.first
to get the first install per group, then map
to map it per user_id:
# get first install value (if you have duplicates you would need to get the min)
d = df[df['event_type'].eq('install')].groupby(df['user_id'])['date'].first()
# map the values per user_id
df['install'] = df['user_id'].map(d)
output:
user_id event_type date install
0 1 install 01-01-2021 01-01-2021
1 1 login 02-01-2021 01-01-2021
2 1 login 04-01-2021 01-01-2021
As a one liner:
df['install'] = df['user_id'].map(df[df['event_type'].eq('install')]
.groupby(df['user_id'])['date'].first())
CodePudding user response:
Use Series.map
by Series
with filtered install
without duplicates by user_id
:
df['install'] = (df['user_id'].map(df[df['event_type'].eq('install')]
.drop_duplicates('user_id')
.set_index('user_id')['date']))
print (df)
user_id event_type date install
0 1 install 01-01-2021 01-01-2021
1 1 login 02-01-2021 01-01-2021
2 1 login 04-01-2021 01-01-2021
CodePudding user response:
Is there case one id installs multiple times?
then use groupby
ffill
(df
.assign(install=df['date'].where(df['event_type'] == 'install'))
.assign(install=lambda x: x.groupby('user_id')['install'].ffill())
output:
user_id event_type date install
0 1 install 01-01-2021 01-01-2021
1 1 login 02-01-2021 01-01-2021
2 1 login 04-01-2021 01-01-2021