In a Jupyter notebook, I load one variable from a file:
with fits.open('mind_dataset/matrix_CEREBELLUM_large.fits') as data:
    matrix_cerebellum = pd.DataFrame(data[0].data.byteswap().newbyteorder())
In the cells below, I have two methods:
neuronal_web_pixel = 0.32 # microns per pixel (1 micron = 1e-6 meters)
def pixels_to_scale(df, mind=False, cosmos=False):
    one_pixel_equals_micron = neuronal_web_pixel
    brain_mask = (df != 0.0)
    df[brain_mask] *= one_pixel_equals_micron
    return df
and
def binarize_matrix(df, mind=False, cosmos=False):
    brain_Llink = 16.0 # microns
    zero_mask = (df != 0)
    low_mask = (df <= brain_Llink)
    df[low_mask & zero_mask] = 1.0
    higher_mask = (df >= brain_Llink)
    df[higher_mask] = 0.0
    return df
Then I pass my variable to these methods to obtain scaled and binary dataframes:
matrix_cerebellum_scaled = pixels_to_scale(matrix_cerebellum, mind=True)
And:
matrix_cerebellum_binary = binarize_matrix(matrix_cerebellum_scaled, mind=True)
However, if I then inspect 'matrix_cerebellum_scaled', it holds the same values as 'matrix_cerebellum_binary', and I have lost the scaled dataframe.
Why? What am I missing?
CodePudding user response:
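A naming note first: those aren't methods, they're functions. As for the problem: if you modify a DataFrame inside a function, those changes also happen to the DataFrame you passed in, because the function receives a reference to the same object rather than a copy. Since both functions modify df in place and return it, every name ends up pointing at one and the same DataFrame. If you want a new DataFrame, create it as a copy of the one being passed in.
At the very least, at the top of binarize_matrix(), do: new_df = df.copy()
and then modify and return new_df instead of df. There is more detail about why that's necessary in this SO answer and its comments: https://stackoverflow.com/a/39628860/42346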
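To make the fix concrete, here is a minimal sketch of both functions rewritten to work on copies. It assumes a small made-up 2x2 DataFrame in place of the FITS data, drops the unused mind/cosmos parameters, and moves the two constants to module-level names (NEURONAL_WEB_PIXEL and BRAIN_LLINK are my own naming, not from the question). It is one possible way to write it, not the only one.

```python
import pandas as pd

NEURONAL_WEB_PIXEL = 0.32  # microns per pixel (value from the question)
BRAIN_LLINK = 16.0         # linking length in microns (value from the question)


def pixels_to_scale(df):
    """Return a scaled copy; the caller's DataFrame is left untouched."""
    scaled = df.copy()
    mask = scaled != 0.0
    scaled[mask] *= NEURONAL_WEB_PIXEL
    return scaled


def binarize_matrix(df):
    """Return a binarized copy; the caller's DataFrame is left untouched."""
    binary = df.copy()
    # Build the masks from the untouched input so the order of the
    # assignments below cannot change the result.
    binary[(df != 0) & (df <= BRAIN_LLINK)] = 1.0
    binary[df > BRAIN_LLINK] = 0.0
    return binary


# Made-up stand-in for the FITS matrix, in pixel units.
matrix = pd.DataFrame([[0.0, 10.0], [40.0, 100.0]])

scaled = pixels_to_scale(matrix)
binary = binarize_matrix(scaled)

# Three distinct objects now exist; none was overwritten by a later step.
print(matrix)
print(scaled)
print(binary)
```

With the original in-place versions, `scaled is binary` (and `matrix is scaled`) would be True, which is exactly the symptom described in the question. With the copies, each step keeps its own result.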