Delete the first n rows of each id in a DataFrame

Time:03-10

I have a DataFrame with two columns. I want to delete the first 3 rows of each id. If an id has three rows or fewer, all of its rows should be deleted as well. In the example below, ids 3 and 1 have 3 and 2 rows, so they should be removed entirely. For ids 4 and 2, only their 4th and 5th rows are preserved.

import pandas as pd
df = pd.DataFrame()
df['id'] = [4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1]
df['value'] = [2, 1, 1, 2, 3, 4, 6, -1, -2, 2, -3, 5, 7, -2, 5]

Here is the DataFrame I want:

   id  value
3   4      2
4   4      3
8   2     -2
9   2      2

CodePudding user response:

Number the rows within each "id" using groupby cumcount (which counts from 0), and keep the rows where that number is greater than 2:

out = df[df.groupby('id').cumcount() > 2]

Output:

   id  value
3   4      2
4   4      3
8   2     -2
9   2      2
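
To see why the cumcount filter works, here is a minimal sketch that prints the per-id counter next to the data and wraps the filter in a small helper. The helper name drop_first_n and its parameters key and n are hypothetical, introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1],
    'value': [2, 1, 1, 2, 3, 4, 6, -1, -2, 2, -3, 5, 7, -2, 5],
})

def drop_first_n(frame, key, n):
    """Hypothetical helper: drop the first n rows of every group in `key`."""
    # cumcount numbers the rows inside each group, starting at 0
    position = frame.groupby(key).cumcount()
    # keep only rows at position n or later; groups with <= n rows vanish
    return frame[position >= n]

position = df.groupby('id').cumcount()
out = drop_first_n(df, 'id', 3)

print(df.assign(position=position))  # data with the per-id counter alongside
print(out)
```

Because the counter restarts at 0 for every id, ids with three rows or fewer never reach position 3 and are dropped without any extra condition.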

CodePudding user response:

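Series.value_counts and Series.map give a boolean mask that marks the ids with more than three rows. That mask alone keeps every row of those ids, so combine it with a groupby cumcount condition to also drop their first three rows:

new_df = df[df['id'].map(df['id'].value_counts().gt(3)) & df.groupby('id').cumcount().gt(2)]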

   id  value
3   4      2
4   4      3
8   2     -2
9   2      2
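
As a rough sketch of what each piece contributes, the snippet below builds the id-size mask and the row-position mask separately. The variable names sizes, size_mask and position_mask are illustrative and not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1],
    'value': [2, 1, 1, 2, 3, 4, 6, -1, -2, 2, -3, 5, 7, -2, 5],
})

# Number of rows per id: 4 -> 5, 2 -> 5, 3 -> 3, 1 -> 2
sizes = df['id'].value_counts()

# True for every row whose id has more than 3 rows (ids 4 and 2 here)
size_mask = df['id'].map(sizes.gt(3))

# True from the fourth row of each id onward
position_mask = df.groupby('id').cumcount().gt(2)

only_large_ids = df[size_mask]             # every row of ids 4 and 2
new_df = df[size_mask & position_mask]     # their first three rows removed

print(only_large_ids)
print(new_df)
```

The size mask only decides which ids survive; the position mask is what removes the leading rows, and on this data it already excludes the small ids by itself.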

CodePudding user response:

cumcount is the way to go, but drop works as well. Note that reset_index(drop=True) renumbers the result from 0:

out = df.groupby('id', sort=False).apply(lambda x: x.drop(x.index[:3])).reset_index(drop=True)

Output:
   id  value
0   4      2
1   4      3
2   2     -2
3   2      2
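
A related variant, swapped in here as an alternative to apply rather than taken from the answer above, is to collect the first three rows of each id with groupby head and drop them by index label. This sketch assumes the DataFrame has a unique index, since drop removes rows by label:

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1],
    'value': [2, 1, 1, 2, 3, 4, 6, -1, -2, 2, -3, 5, 7, -2, 5],
})

# head(3) returns up to the first three rows of each id,
# keeping their original index labels
first_rows = df.groupby('id').head(3)

# Dropping those labels leaves only the rows after the third one per id
out = df.drop(first_rows.index)

# Optional: renumber from 0 to match the apply-based output
out_reset = out.reset_index(drop=True)

print(out)
print(out_reset)
```

Unlike the apply version, out keeps the original index labels unless reset_index is called.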