I have two dataframes
df_visits
shows the number of times the customer visited the store
customer_id | date | len_of_visit | location
4893 |2021-02-12 | 20 min | 1232
4893 |2021-03-14 | 78 min | 1232
4893 |2021-06-02 | 23 min | 1235
4823 |2021-01-12 | 23 min | 1232
5531 |2021-02-12 | 20 min | 1237
df_sales
shows the first sales transactions made by each customer
customer_id | date | amount | method
4893 |2021-03-14 | 143.00 | credit
4823 |2021-01-12 | 86.70 | cash
5531 |2021-02-24 | 255.00 | credit
I want to find how many visits a customer made before completing a sale.
I want to merge the dataframes to limit the visits before each sale and count the number of visits.
The resulting dataframe should look like:
customer_id | date | amount | method |number_of_visits
4893 |2021-03-14 | 143.00 | credit | 2
4823 |2021-01-12 | 86.70 | cash | 1
5531 |2021-02-24 | 255.00 | credit | 1
How would I be able to do this?
CodePudding user response:
Try:
#convert to datetime if needed
df_visits["date"] = pd.to_datetime(df_visits["date"])
df_sales["date"] = pd.to_datetime(df_sales["date"])
#check if the visit was before the sale happened
df_visits["pre_sale"] = df_visits["date"].le(df_visits["customer_id"].map(dict(zip(df_sales["customer_id"],df_sales["date"]))))
#map the customer_id to the total number of visits pre sale
df_sales["number_of_visits"] = df_sales["customer_id"].map(df_visits.groupby("customer_id")["pre_sale"].sum())
>>> df_sales
customer_id date amount method number_of_visits
0 4893 2021-03-14 143.0 credit 2
1 4823 2021-01-12 86.7 cash 1
2 5531 2021-02-24 255.0 credit 1
CodePudding user response:
You can try merge the two dataframes on date
column, then count the rows that df1['date']
less than df2['date']
df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])
out = df2.merge(df1[['customer_id', 'date']], on='customer_id', how='left')
d = out.groupby('customer_id').apply(lambda df: df['date_x'].le(df['date_y']).sum()).to_dict()
df2['number_of_visits'] = df2['customer_id'].map(d)
print(df2)
customer_id date amount method number_of_visits
0 4893 2021-03-14 143.0 credit 2
1 4823 2021-01-12 86.7 cash 1
2 5531 2021-02-24 255.0 credit 0
CodePudding user response:
This will allow you to get the visit count as well as well as make it expandable if needed to find the first visit number
df_visit['Visit_Count'] = df_visit.sort_values(['date'], ascending=True).groupby(['customer_id']).cumcount() 1
df_merge = pd.merge(df_visit, df_sales)