Home > Mobile >  Dataframe query rows where date column are the same as name column
Dataframe query rows where date column are the same as name column

Time:11-04

I have a dataframe looking like this:

user user
2022-08-01 Andy
2022-08-04 Mary
2022-08-05 Marc
2022-08-01 Frank

Now I want to pick up that rows where Andy an Frank are Users at the same date. The date is the unknown part. It can be many more rows and many more dates where that two users are both in the result with different dates. How can I do that with dataframe query?

I am new at python. Can anybody help me out?

CodePudding user response:

df = pd.DataFrame({'date':['2022-08-01','2022-08-04','2022-08-05','2022-08-01'],
                   'user':['Andy','Mary','Marc','Frank']})

You can do a groupby date and join the user names as a string (sorted by their names, thus there won't be any error.)

df_grouped = df.groupby('date').user.apply(lambda x: ','.join(sorted(list(x)))).reset_index()

Then you can get the dates;

dates = df_grouped[df_grouped.user==','.join(sorted(['Andy','Frank']))]['date'].values

Then you can fitler the results accordinly;

df_selected = df[df['date'].isin(dates)]

CodePudding user response:

I would treat this as a set operation: filter the dates for Andy, filter the dates for Frank, and compute the intersection:

dates = set(df.date[df.user == 'Andy']).intersection(df.date[df.user == 'Frank'])

If you are looking for a more pandas/database style solution, this is a simple kind of join, which can be done in pandas with merge:

dates = df[df.user == 'Andy'].merge(df[df.user == 'Frank'], on='date')['date']

However, note that if there are date duplicates in both of the filtered tables, then the number of those dates will multiply in the result of this method, which is one reason why I would prefer the set approach.

  • Related