I have a data frame with id, username, and date columns, and I sorted it by id. How can I make new data frames that contain only every second or every third id, keeping all rows that belong to each selected id? The usernames and the number of rows per id are not relevant here, only the ids. If an id appears multiple times, it should count as one. My example below shows the result I expect.
Here is my code, where I create the data frame and sort it by id:
import pandas as pd
id = ['11', '11', '11', '15', '15', '15', '23', '23', '25']
username = ['usera','userb','userc','userd','usere','userf','userd','usere','userf']
date = ['2021-05-04','2021-05-05','2021-05-05','2021-05-06','2021-06-07','2021-06-08','2021-07-09','2021-03-09','2021-04-10']
df = pd.DataFrame({'id': id, 'username': username, 'date': date})
dx = df.sort_values(by=['id'], ignore_index=True)  # sort, because the data frame is not sorted by default
print(dx)
Here is the expected output:
# Get every second id:
   id username        date
0  11    usera  2021-05-04
1  11    userb  2021-05-05
2  11    userc  2021-05-05
6  23    userd  2021-07-09
7  23    usere  2021-03-09
....
# Get every third id:
   id username        date
0  11    usera  2021-05-04
1  11    userb  2021-05-05
2  11    userc  2021-05-05
8  25    userf  2021-04-10
.....
CodePudding user response:
Try:
- Use np.unique to get the sorted unique ids
- Slice the array to keep every 2nd/3rd id
- Use isin to filter the dataframe
Code:
import numpy as np

unique_ids = np.unique(df['id'])  # sorted unique ids
# every 2nd id
every_2nd = df[df['id'].isin(unique_ids[::2])]
# every 3rd id
every_3rd = df[df['id'].isin(unique_ids[::3])]
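The steps above can be put together into a self-contained script using the sample data from the question. This is only a sketch: `every_nth_id` is a hypothetical helper name, not part of pandas or NumPy. Note that the ids in the question are strings, so np.unique sorts them lexicographically; that matches numeric order for this sample, but ids with different digit counts would likely need converting to integers first.

```python
import numpy as np
import pandas as pd

# Sample data from the question (ids are strings here)
df = pd.DataFrame({
    "id": ["11", "11", "11", "15", "15", "15", "23", "23", "25"],
    "username": ["usera", "userb", "userc", "userd", "usere",
                 "userf", "userd", "usere", "userf"],
    "date": ["2021-05-04", "2021-05-05", "2021-05-05", "2021-05-06",
             "2021-06-07", "2021-06-08", "2021-07-09", "2021-03-09",
             "2021-04-10"],
})


def every_nth_id(frame, n):
    """Hypothetical helper: keep all rows whose id is every n-th unique id."""
    unique_ids = np.unique(frame["id"])   # sorted, de-duplicated ids
    selected = unique_ids[::n]            # 1st, (n+1)-th, (2n+1)-th, ...
    return frame[frame["id"].isin(selected)]


every_2nd = every_nth_id(df, 2)
every_3rd = every_nth_id(df, 3)
print(every_2nd)
print(every_3rd)
```

Because the filter is a boolean mask, the original row labels are preserved, which is why the expected output in the question shows index values such as 6, 7, and 8.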
CodePudding user response:
One possible solution:
# deduplicate the IDs and sort them
ids = sorted(set(df.id))
# get every 2nd ID (use ids[::3] for every 3rd)
ids_new = ids[::2]
# filter the dataframe accordingly
df_new = df[df.id.isin(ids_new)]
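As a possible alternative that neither answer uses, the same selection can be sketched without building a separate list of ids: `groupby(...).ngroup()` numbers each row by the position of its id among the sorted unique ids, and a modulo test then picks every n-th group. The variable name `group_no` is only illustrative.

```python
import pandas as pd

# Sample data from the question (ids are strings here)
df = pd.DataFrame({
    "id": ["11", "11", "11", "15", "15", "15", "23", "23", "25"],
    "username": ["usera", "userb", "userc", "userd", "usere",
                 "userf", "userd", "usere", "userf"],
    "date": ["2021-05-04", "2021-05-05", "2021-05-05", "2021-05-06",
             "2021-06-07", "2021-06-08", "2021-07-09", "2021-03-09",
             "2021-04-10"],
})

# Number each row by its id's group: 0 for the first sorted id, 1 for the next, ...
group_no = df.groupby("id").ngroup()

# Keep the rows whose group number is a multiple of 2 (or 3)
every_2nd = df[group_no % 2 == 0]
every_3rd = df[group_no % 3 == 0]
print(every_2nd)
print(every_3rd)
```

With the default `sort=True`, the group numbers follow the sorted order of the ids, so this should select the same rows as slicing the sorted unique ids with `[::2]` or `[::3]`.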