I have a function that receives a whole entry of a MultiIndex DataFrame and returns True or False for the entire entry. I feed several columns of the entry to it as key-value pairs, e.g.:
temp = cells.loc[0]
x = temp.set_index(['eta','phi'])['e'].to_dict()
filter_frame(x,20000) # drop event if this function returns false
So far I have only found examples where people want to remove single rows, but I am talking about an entire entry with several hundred subentries, since all subentries are used to compute the boolean. How can I drop entries that don't fulfill this condition?
The filter_frame() function just produces True or False for entry 0, which contains 780 rows. The function itself works fine; I just don't know how to apply it without slow for loops. What I am looking for is something like this:
cells = cells[apply the filter function somehow for all entries]
and to end up with a significantly smaller DataFrame.
Edit 2: print(mask) of jezrael's solution:
CodePudding user response:
First call the function per first level of the MultiIndex in GroupBy.apply to get one boolean per group. To filter the original DataFrame, remove the second level with MultiIndex.droplevel and map the per-group result onto the remaining level with Index.map; the resulting mask can then be used for filtering with boolean indexing:
def f(temp):
    x = temp.set_index(['eta','phi'])['e'].to_dict()
    return filter_frame(x,20000)
mask = cells.index.droplevel(1).map(cells.groupby(level=0).apply(f))
out = cells[mask]
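To try this without the real data, here is a minimal sketch on a toy frame. The filter_frame below is only a hypothetical stand-in (it keeps an entry when the summed energy exceeds the threshold), and the index level names and values are invented for illustration. It also shows GroupBy.filter, which is a different call from the mask approach above but should keep the same rows when the function returns a single boolean per group.

```python
import pandas as pd

# Hypothetical stand-in for the asker's filter_frame (an assumption):
# keep an entry only if the summed energy of its cells exceeds the threshold.
def filter_frame(cell_energies, threshold):
    return sum(cell_energies.values()) > threshold

# Toy frame (invented values): first index level is the entry,
# second level is the subentry.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], [0, 1]], names=["entry", "subentry"]
)
cells = pd.DataFrame(
    {
        "eta": [0.1, 0.2, 0.1, 0.2, 0.1, 0.2],
        "phi": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
        "e": [15000.0, 9000.0, 500.0, 700.0, 12000.0, 8500.0],
    },
    index=index,
)

def f(temp):
    # Build the {(eta, phi): e} dict for one entry and evaluate it once.
    x = temp.set_index(["eta", "phi"])["e"].to_dict()
    return filter_frame(x, 20000)

# One boolean per entry, indexed by the first level.
keep = cells.groupby(level=0).apply(f)

# Broadcast the per-entry boolean back to every row, then filter.
mask = cells.index.droplevel(1).map(keep)
out = cells[mask]

# Alternative: GroupBy.filter keeps all rows of groups where f is True.
out_filter = cells.groupby(level=0).filter(f)

print(out)
```

With these toy numbers, entries 0 and 2 pass the threshold and entry 1 is dropped, so out keeps four of the six rows.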