How to get every second or third id by grouped by users Pandas Python?

Time:10-09

I have a DataFrame with users such as usera and userb. Each user has its own set of ids, and each id can appear on several rows. For each user, I need every second (or third) distinct id, counted by id and not by row, together with all the rows that belong to those ids. I managed to get every second id across the whole frame, but that is not what I want because there are multiple users. Here is my code, with the input and the expected output:

import pandas as pd
import numpy as np


id = ['11', '11', '11', '15', '15', '15', '23', '23', '25','25','26','26','27','27','27','28','28']
username = ['usera','usera','usera','usera','usera','usera','usera','usera','usera','usera','userb','userb','userb','userb','userb','userb','userb']
date = ['2021-05-04','2021-05-05','2021-05-05','2021-05-06','2021-06-07','2021-06-08','2021-07-09','2021-03-09','2021-04-10','2021-04-10','2021-04-10','2021-04-10','2021-04-10','2021-04-10','2021-04-10','2021-04-10','2021-04-10']

df = pd.DataFrame({'id': id, 'username': username, 'date': date})


df = df.sort_values(by=['id'], ignore_index=True)  # Sort, because the DataFrame is not guaranteed to be sorted.

# kick out non-unique IDs
unique_ids = np.unique(df['id'])



unique_ids = df.groupby('username')['id'].agg(['unique'])
print("g")
print(unique_ids)
print("gend")

print("g2")

otherframe = pd.DataFrame(unique_ids)
print(otherframe['unique'])



# every 2nd
print(unique_ids[::2])
print("\n\n head")
every_2nd = df[df['id'].isin(unique_ids[::2])]

#every_2nd get new dataframe with every second id grouped by users

#username        unique           
#usera     [11, 15, 23, 25] usera id-s
#userb         [26, 27, 28] userb id-s

#usera every second id= [11,  23 ]
#userb    every second id=     [26,  28] userb id-s


#expected output
#every_second_id_by_user = ['11', '11', '11',  '23', '23', '26','26','28','28']
#and every second date=

CodePudding user response:

Edit: @Akshay Sehgal's solution is better.


If I understand the question correctly, I believe what you want can be achieved as:

df.groupby(['username', 'id'])['id'].unique()[::2]

# username  id
# usera     11    [11]
#           23    [23]
# userb     26    [26]
#           28    [28]
# Name: id, dtype: object

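The key is to group by both username and id before taking the unique values. Note that [::2] is applied to the combined result, not to each user separately, so it matches the per-user selection here only because usera has an even number of distinct ids.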
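
For cases where a user has an odd number of distinct ids, a per-user count is needed. The following is a minimal sketch, not part of the original answer, that numbers each distinct id within its own user using drop_duplicates and cumcount. The sample frame and the names pairs, pos and every_2nd are hypothetical. In this sample usera has three distinct ids, so a slice over all (username, id) groups would pick 27 for userb, while the per-user count picks 26 and 28.

```python
import pandas as pd

# Hypothetical data: usera has three distinct ids, so a slice over all
# (username, id) groups would drift out of step when it reaches userb.
df = pd.DataFrame({
    "id": ["11", "11", "15", "23", "26", "27", "28"],
    "username": ["usera", "usera", "usera", "usera", "userb", "userb", "userb"],
})

pairs = (
    df[["username", "id"]]
    .drop_duplicates()  # one row per (username, id), in order of first appearance
    .assign(pos=lambda p: p.groupby("username").cumcount())  # 0, 1, 2, ... per user
)

# Keep positions 0, 2, 4, ... so the count restarts for every user.
every_2nd = pairs.loc[pairs["pos"] % 2 == 0, ["username", "id"]]
print(every_2nd)
```

Replacing `% 2` with `% 3` would give every third id per user in the same way.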

CodePudding user response:

Try this -

df.groupby('username')['id'].unique().str[::2]
username
usera    [11, 23]
userb    [26, 28]
Name: id, dtype: object

To filter the original DataFrame down to the rows that have these ids, use this -

idx = df.groupby('username')['id'].unique().str[::2].explode()
df[df['id'].isin(idx)]
    id username        date
0   11    usera  2021-05-04
1   11    usera  2021-05-05
2   11    usera  2021-05-05
6   23    usera  2021-07-09
7   23    usera  2021-03-09
10  26    userb  2021-04-10
11  26    userb  2021-04-10
15  28    userb  2021-04-10
16  28    userb  2021-04-10
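
Since the question also asks about every third id, the same idea can be sketched as a small helper with a step n. This is an illustration, not part of the answer above: the function name rows_for_every_nth_id is hypothetical, and it ranks ids per user with groupby(...).transform and pd.factorize instead of unique().str[::2]. Because the mask is computed row by row inside each user, it does not rely on ids being distinct across users, which isin on the id column alone would assume.

```python
import pandas as pd


def rows_for_every_nth_id(df, n):
    """Keep rows whose id is the 1st, (n+1)th, (2n+1)th, ... distinct id of its user."""
    # Number each distinct id within its user by order of first appearance.
    pos = df.groupby("username")["id"].transform(lambda s: pd.factorize(s)[0])
    return df[pos % n == 0]


# The ids and usernames from the question (dates omitted for brevity).
ids = ["11"] * 3 + ["15"] * 3 + ["23"] * 2 + ["25"] * 2 + ["26"] * 2 + ["27"] * 3 + ["28"] * 2
users = ["usera"] * 10 + ["userb"] * 7
df = pd.DataFrame({"id": ids, "username": users})

every_2nd = rows_for_every_nth_id(df, 2)
every_3rd = rows_for_every_nth_id(df, 3)
print(every_2nd)
print(every_3rd)
```

With n=2 this keeps the same row labels as the output shown above; with n=3 it keeps the rows for ids 11 and 25 of usera and id 26 of userb.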

CodePudding user response:

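Using pd.factorize and np.mod, after building an indexer over the (id, username) pairs. Note that the codes are numbered across the whole frame, so the even/odd pattern does not restart for each user: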

df[np.mod(pd.factorize(df[['id','username']].to_records(index=False))[0],2)==0]

    id username        date
0   11    usera  2021-05-04
1   11    usera  2021-05-05
2   11    usera  2021-05-05
6   23    usera  2021-07-09
7   23    usera  2021-03-09
10  26    userb  2021-04-10
11  26    userb  2021-04-10
15  28    userb  2021-04-10
16  28    userb  2021-04-10
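
If the alternation should restart for every user, one possible variant of the indexer idea is sketched below. It is an assumption-laden illustration rather than the answer's method: it uses groupby(...).ngroup() in place of pd.factorize on records, and subtracts each user's smallest group number so that the count begins at 0 per user. The sample frame and the names codes, within_user and every_2nd_rows are hypothetical. Ids are strings here, so they are ordered lexicographically.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which usera has an odd number of distinct ids.
df = pd.DataFrame({
    "id": ["11", "11", "15", "23", "26", "27", "28"],
    "username": ["usera", "usera", "usera", "usera", "userb", "userb", "userb"],
})

# Group number of each (username, id) pair, in sorted key order across the frame.
codes = df.groupby(["username", "id"]).ngroup()

# Shift the numbering so that it restarts at 0 for every user.
within_user = codes - codes.groupby(df["username"]).transform("min")

# Keep rows whose id sits at an even position within its user.
every_2nd_rows = df[np.mod(within_user, 2) == 0]
print(every_2nd_rows)
```

On this sample, taking the parity of codes directly would keep id 27 for userb, whereas the shifted numbering keeps 26 and 28.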