Calculating mean values based on numpy array mask, getting the mean of all values at the location in

Time:07-11

I've built a mask that points at certain locations in my DataFrame. I want to calculate the mean of all values at the locations in the mask, and then the mean of all values outside the mask. Here is my current code:

import numpy as np
import pandas as pd

mask_number = 5
no_overload_cycles = 2
hyst = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]})

list_test = []
for i in range(0,len(hyst)-1,mask_number):
    for x in range(no_overload_cycles):
        list_test.append(i + x)
    
mask = np.array(list_test)

print(mask)

[0 1 5 6 10 11 15 16]

overload_mean = (hyst.loc[mask,'test']).mean() 
baseline_mean = (hyst.loc[~mask,'test']).mean() 

Basically, I want overload_mean to be the mean of all values in test that are located at the mask, and baseline_mean to be the mean of all values that are in test that are not referenced by the mask. My current code gives me this error:

baseline_mean = (hyst.loc[~mask,'test']).mean() 

KeyError: "None of [Int64Index([    -1,     -2,     -6,     -7,    -11,    -12,    -16,           dtype='int64', length=11858)] are in the [index]"

I thought I could use the tilde to reference every result not in the mask? Any help here would be greatly appreciated!

CodePudding user response:

mask is an array of integers, so ~mask is the bitwise NOT of each element, which equals -mask - 1 (0 becomes -1, 1 becomes -2, 5 becomes -6, and so on). Those negative numbers are not in the index, so .loc[~mask] raises a KeyError. You want:

overload = hyst.iloc[mask]

overload_mean = overload['test'].mean()
baseline_mean = hyst.drop(overload.index)['test'].mean()

print(overload_mean, baseline_mean)

Output:

4.5 4.818181818181818
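
To see why the tilde failed in the question, here is a minimal sketch of how ~ behaves on integer and boolean NumPy arrays. The names int_mask and bool_mask are illustrative only and do not come from the original code.

```python
import numpy as np

# Integer positions, as in the question's mask
int_mask = np.array([0, 1, 5, 6])

# On integers, ~ is bitwise NOT: ~x == -x - 1
inverted = ~int_mask
print(inverted.tolist())  # [-1, -2, -6, -7]

# On booleans, ~ is logical negation, which is what "not in the mask" needs
bool_mask = np.array([True, False, True])
negated = ~bool_mask
print(negated.tolist())  # [False, True, False]
```

Only the boolean form can be passed to .loc to mean "every row not selected".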

Note: if your DataFrame has the default RangeIndex (so labels equal positions), you can use the mask as labels directly:

overload_mean = hyst.loc[mask, 'test'].mean()
baseline_mean = hyst.drop(mask)['test'].mean()
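
The difference between labels and positions only shows up when the index is not the default one. The sketch below uses a small hypothetical frame with string labels (df and positions are made-up names, not part of the question) to show the iloc plus drop pattern on such an index.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels are strings, not 0..n-1
df = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3]}, index=list("abcdef"))
positions = np.array([0, 1, 5])

# iloc selects by position, so it works regardless of the labels
overload = df.iloc[positions]
overload_mean = overload["test"].mean()  # mean of 12, 4, 3

# drop() works on labels, so pass the labels of the selected rows
baseline_mean = df.drop(overload.index)["test"].mean()  # mean of 5, 4, 1

print(overload_mean, baseline_mean)
```

Here df.loc[positions] would fail, because 0, 1 and 5 are not labels of this index.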

Note 2: Or, you can convert mask into a boolean array, and your code would work:

mask = np.isin(hyst.index, list_test)

overload_mean = (hyst.loc[mask,'test']).mean() 
baseline_mean = (hyst.loc[~mask,'test']).mean() 
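
As a further variant, the boolean mask can be built from positions rather than from index labels, which avoids relying on the index at all. This is a sketch under that assumption, not the answer's method; is_overload and positions are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

hyst = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]})
positions = np.array([0, 1, 5, 6, 10, 11, 15, 16])

# Start with all False, then flag the overload positions
is_overload = np.zeros(len(hyst), dtype=bool)
is_overload[positions] = True

# Work on the underlying array so the index plays no role
values = hyst["test"].to_numpy()
overload_mean = values[is_overload].mean()
baseline_mean = values[~is_overload].mean()

print(overload_mean, baseline_mean)
```

With the sample data this gives the same two means as above: 4.5 for the overload rows and 53/11 (about 4.82) for the rest.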