Home > Software engineering >  Find number of visits before a sale across 2 dataframes
Find number of visits before a sale across 2 dataframes

Time:05-10

I have two dataframes

df_visits shows the number of times the customer visited the store

  customer_id | date      | len_of_visit | location
    4893      |2021-02-12 |     20 min   |  1232
    4893      |2021-03-14 |     78 min   |  1232
    4893      |2021-06-02 |     23 min   |  1235
    4823      |2021-01-12 |     23 min   |  1232
    5531      |2021-02-12 |     20 min   |  1237

df_sales shows the first sales transactions made by each customer

  customer_id | date      | amount | method
    4893      |2021-03-14 | 143.00 |  credit
    4823      |2021-01-12 | 86.70  |  cash
    5531      |2021-02-24 | 255.00 |  credit

I want to find how many visits a customer made before completing a sale.

I want to merge the dataframes to limit the visits before each sale and count the number of visits.

The resulting dataframe should look like:

  customer_id | date      | amount | method  |number_of_visits
    4893      |2021-03-14 | 143.00 |  credit | 2
    4823      |2021-01-12 | 86.70  |  cash   | 1
    5531      |2021-02-24 | 255.00 |  credit | 1

How would I be able to do this?

CodePudding user response:

Try:

#convert to datetime if needed
df_visits["date"] = pd.to_datetime(df_visits["date"])
df_sales["date"] = pd.to_datetime(df_sales["date"])

#check if the visit was before the sale happened
df_visits["pre_sale"] = df_visits["date"].le(df_visits["customer_id"].map(dict(zip(df_sales["customer_id"],df_sales["date"]))))

#map the customer_id to the total number of visits pre sale
df_sales["number_of_visits"] = df_sales["customer_id"].map(df_visits.groupby("customer_id")["pre_sale"].sum())

>>> df_sales
   customer_id       date  amount  method  number_of_visits
0         4893 2021-03-14   143.0  credit                 2
1         4823 2021-01-12    86.7    cash                 1
2         5531 2021-02-24   255.0  credit                 1

CodePudding user response:

You can try merge the two dataframes on date column, then count the rows that df1['date'] less than df2['date']

df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])


out = df2.merge(df1[['customer_id', 'date']], on='customer_id', how='left')
d = out.groupby('customer_id').apply(lambda df: df['date_x'].le(df['date_y']).sum()).to_dict()
df2['number_of_visits'] = df2['customer_id'].map(d)
print(df2)

   customer_id       date  amount  method  number_of_visits
0         4893 2021-03-14   143.0  credit                 2
1         4823 2021-01-12    86.7    cash                 1
2         5531 2021-02-24   255.0  credit                 0

CodePudding user response:

This will allow you to get the visit count as well as well as make it expandable if needed to find the first visit number

df_visit['Visit_Count'] = df_visit.sort_values(['date'], ascending=True).groupby(['customer_id']).cumcount()   1
df_merge = pd.merge(df_visit, df_sales)
  • Related