Using Pandas, I'm trying to keep only 100 rows of each value of my DataFrame column "

Time:11-22

I have a very large dataset that I'm trying to shrink. My idea is to keep 100 rows per neighborhood.

Here's an overview of my data:

index  name     neighborhood
0      name 1   neighborhood A
1      name 2   neighborhood A
2      name 3   neighborhood B
3      name 4   neighborhood B
4      name 5   neighborhood C
5      name 6   neighborhood C
6      name 7   neighborhood D
7      name 8   neighborhood D
8      name 9   neighborhood E
9      name 10  neighborhood E

What is the most efficient way to do this?

Thanks in advance

I'm expecting a result like this (here keeping one row per neighborhood, to fit the small sample):

index  name    neighborhood
0      name 1  neighborhood A
1      name 3  neighborhood B
2      name 5  neighborhood C
3      name 7  neighborhood D
4      name 9  neighborhood E

Answer:

It depends on how you want to select the rows.

To keep the first n rows of each neighborhood, use groupby.head (neighborhoods with fewer than n rows are kept whole):

n = 100
out = df.groupby('neighborhood').head(n)
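As a quick check, the sketch below rebuilds the sample frame from the question and applies groupby.head with n = 1, since each sample neighborhood has only two rows. The reset_index(drop=True) call is an addition that renumbers the rows 0 to 4 as in the expected table; it assumes the original index labels are not needed.

```python
import pandas as pd

# Rebuild the sample data from the question: two rows per neighborhood, A to E.
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 11)],
    "neighborhood": [f"neighborhood {c}" for c in "AABBCCDDEE"],
})

# The expected output keeps one row per neighborhood; use n = 100 on the real data.
n = 1

# head(n) keeps the first n rows of each group, in their original order and with
# their original index labels; reset_index(drop=True) renumbers them 0..k-1.
out = df.groupby("neighborhood").head(n).reset_index(drop=True)
print(out)
```

Without reset_index, the kept rows retain their original labels (0, 2, 4, 6, 8 here).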

To keep n random rows of each neighborhood, use groupby.sample:

n = 100
out = df.groupby('neighborhood').sample(n=n)
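One caveat with groupby.sample: it samples without replacement by default, so it raises a ValueError as soon as one neighborhood has fewer than n rows. Below is a sketch of a common workaround on hypothetical uneven data: shuffle the whole frame once with DataFrame.sample(frac=1), then take groupby.head(n), which keeps small groups whole. The random_state value is arbitrary and only makes the shuffle repeatable.

```python
import pandas as pd

# Hypothetical uneven data: neighborhood "B" has fewer rows than n.
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 8)],
    "neighborhood": ["A", "A", "A", "A", "A", "B", "B"],
})
n = 3

# groupby(...).sample(n=n) fails here: group "B" has only 2 rows and
# sampling is without replacement by default.
try:
    df.groupby("neighborhood").sample(n=n)
except ValueError as exc:
    print("sample failed:", exc)

# Workaround: shuffle all rows once, then keep the first n rows of each group.
# Groups smaller than n are kept whole instead of raising an error.
out = df.sample(frac=1, random_state=0).groupby("neighborhood").head(n)
print(out["neighborhood"].value_counts().sort_index())
```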

Answer:

I think you can also use groupby with nth, passing a slice (this indexing form needs pandas 1.4 or later):

dfx = df.groupby('neighborhood').nth[:100]
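A small sketch comparing the nth slice with groupby.head on the sample data from the question, assuming pandas 1.4 or later (earlier versions do not accept square-bracket slicing on nth). Only the selected names are compared, since the index of the nth result can differ between pandas versions.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 11)],
    "neighborhood": [f"neighborhood {c}" for c in "AABBCCDDEE"],
})
n = 1

# nth[:n] selects positions 0..n-1 within each group,
# which are the same rows groupby.head(n) keeps.
by_nth = df.groupby("neighborhood").nth[:n]
by_head = df.groupby("neighborhood").head(n)

# Compare the selected names only; in pandas < 2.0, nth may use the group keys
# as the index instead of the original row labels.
print(by_nth["name"].tolist() == by_head["name"].tolist())
```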