Hello, I have a pandas DataFrame with an ID column and a value column.
I'd like to use vectorized code to "group" by ID and then run a function on the value in each row of the group. If every value for a given ID returns False from the function, I want to delete that ID and all of its rows from the original DataFrame.
def funct1(num):
    # True if num is a perfect square (its square root has no fractional part)
    num = num**.5
    if num - int(num) == 0:
        return True
    else:
        return False
import pandas as pd
df = {
'ID':['1','1','1','1','1','2','2','2','2','2','3','3','3','3','3'],
'Percentage':[7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]}
df = pd.DataFrame(df)
The end result should be:
df = {
'ID':['1','1','1','1','1','2','2','2','2','2'],
'Percentage':[7,8,9,10,11,12,13,14,15,16]}
df = pd.DataFrame(df)
Here every row for ID 3 is deleted because no value in that group (17 through 21) is a perfect square, whereas ID 1 contains 9 and ID 2 contains 16.
I also want to maintain the original index, because in the real data the values are not in order as in this example.
The answer should return a DataFrame, not a groupby object.
Thank you for the help!
Best
CodePudding user response:
You don't need to use groupby:
# Vectorized version of funct1: True where Percentage has an integer square root
funct1 = lambda pct: pct.pow(0.5) - pct.pow(0.5).astype(int) == 0
# Keep only the rows whose ID appears in at least one True row
out = df[df['ID'].isin(df.loc[funct1(df['Percentage']), 'ID'])]
>>> out
ID Percentage
0 1 7
1 1 8
2 1 9
3 1 10
4 1 11
5 2 12
6 2 13
7 2 14
8 2 15
9 2 16
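For reference, the same idea can be written as a boolean mask combined with groupby(...).transform('any'), which also preserves the original index. This is just a sketch using the column names from the question, not part of the answer above:
mask = df['Percentage'] ** 0.5 % 1 == 0            # True where Percentage is a perfect square
out = df[mask.groupby(df['ID']).transform('any')]  # keep rows whose ID group has at least one True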
Performance
# @BENY
%timeit df.groupby('ID').filter(lambda x : any((x['Percentage']**0.5).map(float.is_integer)))
1.08 ms ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# @Corralien
%timeit df[df['ID'].isin(df.loc[funct1(df['Percentage']), 'ID'])]
651 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
CodePudding user response:
In your case, use groupby with filter:
# Keep the groups where at least one Percentage value has an integer square root
out = df.groupby('ID').filter(lambda x: any((x['Percentage']**0.5).map(float.is_integer)))
Out[317]:
ID Percentage
0 1 7
1 1 8
2 1 9
3 1 10
4 1 11
5 2 12
6 2 13
7 2 14
8 2 15
9 2 16
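If you want to reuse funct1 from the question unchanged, the same filter can also be written by applying it element-wise. This is a sketch; it calls Python code per value, so it will be slower than the vectorized mask in the first answer:
out = df.groupby('ID').filter(lambda g: g['Percentage'].apply(funct1).any())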