Fill date into columns-CodePudding

I have table:

user_id	date	days_since_install
001	01-01-2021	0
001	02-01-2021	1
001	02-01-2021	2

It is necessary to check if there is "1" in the column "days_since_install" in grouping by use_id and if so, fill in True in the column "retention_1d" otherwise False.

The resulting table should look like this:

user_id	retention_1d
001	True

CodePudding user response：

You can use groupby.first to get the first install per group, then map to map it per user_id:

# get first install value (if you have duplicates you would need to get the min)
d = df[df['event_type'].eq('install')].groupby(df['user_id'])['date'].first()

# map the values per user_id
df['install'] = df['user_id'].map(d)

output:

   user_id event_type        date     install
0        1    install  01-01-2021  01-01-2021
1        1      login  02-01-2021  01-01-2021
2        1      login  04-01-2021  01-01-2021

As a one liner:

df['install'] = df['user_id'].map(df[df['event_type'].eq('install')]
                                    .groupby(df['user_id'])['date'].first())

CodePudding user response：

Use Series.map by Series with filtered install without duplicates by user_id:

df['install'] = (df['user_id'].map(df[df['event_type'].eq('install')]
                                        .drop_duplicates('user_id')
                                        .set_index('user_id')['date']))
print (df)

   user_id event_type        date     install
0        1    install  01-01-2021  01-01-2021
1        1      login  02-01-2021  01-01-2021
2        1      login  04-01-2021  01-01-2021

CodePudding user response：

Is there case one id installs multiple times? then use groupby ffill

(df
.assign(install=df['date'].where(df['event_type'] == 'install'))
.assign(install=lambda x: x.groupby('user_id')['install'].ffill())

output:

user_id event_type  date    install
0   1   install 01-01-2021  01-01-2021
1   1   login   02-01-2021  01-01-2021
2   1   login   04-01-2021  01-01-2021