Hello, I have a pandas DataFrame with an ID column and a value column.
I'd like to use vectorized code to "group" by ID and then run a function on the value in each row of the group. If every value for a given ID returns False from the function, I want to delete that ID and all of its rows from the original DataFrame.
def funct1(num):
    # True if num is a perfect square (its square root has no fractional part)
    num = num**.5
    if num - int(num) == 0:
        return True
    else:
        return False
import pandas as pd
df = {
'ID':['1','1','1','1','1','2','2','2','2','2','3','3','3','3','3'],
'Percentage':[7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]}
df = pd.DataFrame(df)
The end result should be:
df = {
'ID':['1','1','1','1','1','2','2','2','2','2'],
'Percentage':[7,8,9,10,11,12,13,14,15,16]}
df = pd.DataFrame(df)
Here every row for ID 3 is deleted because no value in that group (17 through 21) is a perfect square, whereas ID 1 contains 9 and ID 2 contains 16.
I also want to maintain the original index, because in the real data the values are not in order as in this example.
The answer should return a DataFrame, not a groupby object.
Thank you for the help!
Best
CodePudding user response:
You don't need to use groupby:
# Vectorized version of funct1: True where Percentage has an integer square root
funct1 = lambda pct: pct.pow(0.5) - pct.pow(0.5).astype(int) == 0
# Keep only the rows whose ID appears in at least one True row
out = df[df['ID'].isin(df.loc[funct1(df['Percentage']), 'ID'])]
>>> out
ID Percentage
0 1 7
1 1 8
2 1 9
3 1 10
4 1 11
5 2 12
6 2 13
7 2 14
8 2 15
9 2 16
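For reference, the same idea can be written as a boolean mask combined with groupby(...).transform('any'), which also preserves the original index. This is just a sketch using the column names from the question, not part of the answer above:
mask = df['Percentage'] ** 0.5 % 1 == 0            # True where Percentage is a perfect square
out = df[mask.groupby(df['ID']).transform('any')]  # keep rows whose ID group has at least one True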
Performance
# @BENY
%timeit df.groupby('ID').filter(lambda x : any((x['Percentage']**0.5).map(float.is_integer)))
1.08 ms ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# @Corralien
%timeit df[df['ID'].isin(df.loc[funct1(df['Percentage']), 'ID'])]
651 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
CodePudding user response:
In your case, use groupby with filter:
# Keep the groups where at least one Percentage value has an integer square root
out = df.groupby('ID').filter(lambda x: any((x['Percentage']**0.5).map(float.is_integer)))
Out[317]:
ID Percentage
0 1 7
1 1 8
2 1 9
3 1 10
4 1 11
5 2 12
6 2 13
7 2 14
8 2 15
9 2 16
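If you want to reuse funct1 from the question unchanged, the same filter can also be written by applying it element-wise. This is a sketch; it calls Python code per value, so it will be slower than the vectorized mask in the first answer:
out = df.groupby('ID').filter(lambda g: g['Percentage'].apply(funct1).any())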