I've built a mask to point at certain locations in my dataframe. I want to calculate the mean of all values at the locations in the mask, and then the mean of all values outside the mask. Here is my current code:
import numpy as np
import pandas as pd

mask_number = 5
no_overload_cycles = 2
hyst = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]})
list_test = []
# Every mask_number rows, flag the first no_overload_cycles rows as overload.
for i in range(0, len(hyst) - 1, mask_number):
    for x in range(no_overload_cycles):
        list_test.append(i + x)
mask = np.array(list_test)
print(mask)
[0 1 5 6 10 11 15 16]
overload_mean = (hyst.loc[mask,'test']).mean()
baseline_mean = (hyst.loc[~mask,'test']).mean()
Basically, I want overload_mean to be the mean of all values in test that are located at the mask, and baseline_mean to be the mean of all values in test that are not referenced by the mask. My current code gives me this error:
baseline_mean = (hyst.loc[~mask,'test']).mean()
KeyError: "None of [Int64Index([ -1, -2, -6, -7, -11, -12, -16, dtype='int64', length=11858)] are in the [index]"
I thought I could use the tilde to reference every result not in the mask? Any help here would be greatly appreciated!
CodePudding user response:
mask is an array of integers, so ~mask is a bitwise NOT, which gives -mask - 1 rather than a logical inversion. .loc[~mask] then fails because these negative numbers are not in the index. You want:
overload = hyst.iloc[mask]
overload_mean = overload['test'].mean()
baseline_mean = hyst.drop(overload.index)['test'].mean()
print(overload_mean, baseline_mean)
Output:
4.5 4.818181818181818
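To see why the tilde behaves differently on the two kinds of array, here is a minimal sketch. The names int_mask and bool_mask and the length of 8 are made up for illustration only; they are not from the question.

```python
import numpy as np

# Hypothetical small integer mask, shortened for illustration.
int_mask = np.array([0, 1, 5, 6])

# On integers, ~ is a bitwise NOT: ~x == -x - 1.
print(~int_mask)  # [-1 -2 -6 -7]

# On booleans, ~ flips True and False, which is what "not in the mask" needs.
bool_mask = np.isin(np.arange(8), int_mask)
print(~bool_mask)
```

Only the boolean form can be passed to .loc to select the complement of the mask.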
Note: if your dataframe has a default range index (0, 1, 2, ...), so that labels and positions coincide, you can do:
overload_mean = hyst.loc[mask, 'test'].mean()
baseline_mean = hyst.drop(mask)['test'].mean()
Note 2: Alternatively, you can convert mask into a boolean array, and your original code will work:
mask = np.isin(hyst.index, list_test)
overload_mean = (hyst.loc[mask,'test']).mean()
baseline_mean = (hyst.loc[~mask,'test']).mean()
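As a possible variation, the boolean mask can also be built without the loop. This is a sketch under an assumption: it treats the pattern as "the first no_overload_cycles rows of every block of mask_number rows". That matches the loop for this data, but the two can differ at the tail for other lengths, because the loop stops at len(hyst) - 1. The names positions and bool_mask are introduced here for illustration.

```python
import numpy as np
import pandas as pd

mask_number = 5
no_overload_cycles = 2
hyst = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]})

# Row positions 0..len-1; a row is "overload" if it falls in the first
# no_overload_cycles rows of its block of mask_number rows (assumed pattern).
positions = np.arange(len(hyst))
bool_mask = (positions % mask_number) < no_overload_cycles

# A boolean mask can be inverted with ~ to select the remaining rows.
overload_mean = hyst.loc[bool_mask, "test"].mean()
baseline_mean = hyst.loc[~bool_mask, "test"].mean()
print(overload_mean, baseline_mean)  # 4.5 4.818181818181818
```

Because the mask is positional and passed as a plain boolean array, this also works when the index is not a range index.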