Home > Enterprise >  How to extract data based on user_id and purchase month?
How to extract data based on user_id and purchase month?

Time:12-02

I have a dataset from which I need to extract user and user data who purchased in two months Jan and Feb. Could you please help me with the code?

enter image description here

The desired output should be like this below

enter image description here

CodePudding user response:

Filter:

df[(df["Purchase month"]=="Jan") | (df["Purchase month"]=="Feb")]

Sort:

df.sort_values(by=["user_id"])

CodePudding user response:

First, we can use a groupby to get the customer buying in at least 2 different months :

>>> df_grouped = df.groupby(['user_id'], as_index=False)['purchase_month'].nunique()
>>> valid_users = df_grouped[df_grouped['purchase_month']>=2]['user_id'].tolist()
>>> valid_users
[1, 3]

Then we can filter the first DataFrame with those users to get the expected result :

>>> df[df["user_id"].isin(valid_users)].sort_values(by=["user_id"])
    user_id purchase_month  product
0   1       jan             football
3   1       feb             bed
2   3       feb             toaster
4   3       jan             printer

CodePudding user response:

Try:

  1. groupby and create a list of months for each User id
  2. Only keep rows where months include both Jan and Feb i.e. the set {"Jan", "Feb"} is a subset of months
months = df.groupby("User id")["Purchase month"].agg(list)
output = df[df["User id"].isin(months[months.map({"Jan", "Feb"}.issubset)].index)]

>>> output
   User id Purchase month   Product
0        1            Jan  Football
2        3            Feb   Toaster
3        1            Feb       Bed
4        3            Jan   Printer
  • Related