How to iterate a dataframe in every second or third id with Pandas?

Time:10-06

I have a DataFrame with the columns id, username and date, and I sorted it by id. How can I create new DataFrames that contain only every second or every third id? For this task the usernames and the other columns don't matter, only the ids. I need to take every second or third id from the same DataFrame, and if an id appears multiple times it should count as one. I believe my example output below is correct, but please check it.

Here is my code, where I create the DataFrame and sort it by id:

import pandas as pd

id = ['11', '11', '11', '15', '15', '15', '23', '23', '25']
username = ['usera','userb','userc','userd','usere','userf','userd','usere','userf']
date = ['2021-05-04','2021-05-05','2021-05-05','2021-05-06','2021-06-07','2021-06-08','2021-07-09','2021-03-09','2021-04-10']

df = pd.DataFrame({'id': id, 'username': username, 'date': date})


dx = df.sort_values(by=['id'], ignore_index=True)  # Sort by id, since the input is not guaranteed to be sorted
print(dx) 

Here is some expected output:

# Every second id:
   id username        date
0  11    usera  2021-05-04
1  11    userb  2021-05-05
2  11    userc  2021-05-05

6  23    userd  2021-07-09
7  23    usere  2021-03-09
....



# Every third id:

   id username        date
0  11    usera  2021-05-04
1  11    userb  2021-05-05
2  11    userc  2021-05-05
8  25    userf  2021-04-10
.....
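To check the expected output above, one can list the distinct ids in sorted order (so a repeated id counts once) and take every second or third one by position. A minimal sketch that rebuilds only the id column of the sample data; the names `distinct_ids`, `ids_2nd`, `rows_2nd` etc. are illustrative:

```python
import pandas as pd

# Sample ids from the question (kept as strings, like the original)
df = pd.DataFrame({'id': ['11', '11', '11', '15', '15', '15', '23', '23', '25']})

# Each id counts once: take the distinct ids in sorted order
distinct_ids = sorted(df['id'].unique())
print(distinct_ids)   # ['11', '15', '23', '25']

# Every 2nd / 3rd distinct id, by position (0, 2, ... or 0, 3, ...)
ids_2nd = distinct_ids[::2]
ids_3rd = distinct_ids[::3]

# Row labels that should appear in each expected output
rows_2nd = df[df['id'].isin(ids_2nd)].index.tolist()
rows_3rd = df[df['id'].isin(ids_3rd)].index.tolist()
print(ids_2nd, rows_2nd)   # ['11', '23'] [0, 1, 2, 6, 7]
print(ids_3rd, rows_3rd)   # ['11', '25'] [0, 1, 2, 8]
```

The row labels match the expected output: ids 11 and 23 for every second id, ids 11 and 25 for every third id.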

Answer:

Try the following:

  • Use np.unique to get the sorted unique ids (each id counted once)
  • Slice that array to take every 2nd/3rd id
  • Use isin to select the matching rows of the DataFrame

Code:

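import numpy as np

# Sorted array of the distinct ids (repeated ids appear once)
unique_ids = np.unique(df['id'])

# every 2nd id: keep all rows whose id is at position 0, 2, 4, ...
every_2nd = df[df['id'].isin(unique_ids[::2])]

# every 3rd id: keep all rows whose id is at position 0, 3, 6, ...
every_3rd = df[df['id'].isin(unique_ids[::3])]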
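The same approach can be wrapped in a small helper that takes the step as a parameter. A sketch only: `every_nth_id` and its `offset` and `col` arguments are hypothetical names, not part of pandas or NumPy:

```python
import numpy as np
import pandas as pd


def every_nth_id(df: pd.DataFrame, n: int, offset: int = 0, col: str = 'id') -> pd.DataFrame:
    """Return all rows whose id is every n-th distinct (sorted) id, starting at `offset`."""
    # np.unique returns the sorted distinct values, so repeated ids count once
    unique_ids = np.unique(df[col])
    # Slice the distinct ids by position, then keep every row with a selected id
    return df[df[col].isin(unique_ids[offset::n])]


df = pd.DataFrame({'id': ['11', '11', '11', '15', '15', '15', '23', '23', '25']})
print(every_nth_id(df, 2).index.tolist())            # [0, 1, 2, 6, 7]
print(every_nth_id(df, 3).index.tolist())            # [0, 1, 2, 8]
print(every_nth_id(df, 2, offset=1).index.tolist())  # [3, 4, 5, 8]
```

The `offset` argument is an assumed extension for starting from the second id instead of the first; with the default of 0 the helper behaves like the slices in the answer.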

Answer:

One possible solution:

# deduplicate the IDs and sort them
ids = sorted(set(df.id))

# get every 2nd ID (use ids[::3] for every 3rd)
ids_new = ids[::2]

# keep all rows whose ID was selected
df_new = df[df.id.isin(ids_new)]
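A different technique from the two answers, offered as a sketch rather than a tested recommendation: `GroupBy.ngroup()` numbers each distinct id 0, 1, 2, ... in sorted key order (with the default `sort=True`), so a modulo on that number selects every n-th id without building the id list explicitly. It uses only the question's sample ids:

```python
import pandas as pd

df = pd.DataFrame({'id': ['11', '11', '11', '15', '15', '15', '23', '23', '25']})

# ngroup() labels each row with its id's position among the sorted distinct ids:
# '11' -> 0, '15' -> 1, '23' -> 2, '25' -> 3
group_no = df.groupby('id').ngroup()

every_2nd = df[group_no % 2 == 0]   # ids '11' and '23'
every_3rd = df[group_no % 3 == 0]   # ids '11' and '25'
print(every_2nd.index.tolist())     # [0, 1, 2, 6, 7]
print(every_3rd.index.tolist())     # [0, 1, 2, 8]
```

This keeps everything inside pandas and preserves the original row order; the result is the same as slicing the sorted unique ids.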