I do not know why my code fails the validation for each variable.
marvel_df = rate_df.loc[rate_df['Company'] == "Marvel"]
mean_marvel = marvel_df[['Rate']].mean()
std_marvel = marvel_df[['Rate']].std()
n_marvel = marvel_df[['Rate']].count()
dc_df = rate_df.loc[rate_df['Company'] == "DC"]
mean_dc = dc_df[['Rate']].mean()
std_dc = dc_df[['Rate']].std()
n_dc = dc_df[['Rate']].count()
# Validation
assert n_marvel == 23
assert n_dc == 16
assert np.ndim(mean_marvel) == np.ndim(mean_dc) == np.ndim(std_marvel) == np.ndim(std_dc) == np.ndim(n_marvel) == np.ndim(n_dc)
CodePudding user response:
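Try it with this cleaned-up code, which selects the column with single brackets so that mean(), std() and count() return scalars instead of one-element Series: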
marvel_df = rate_df[rate_df['Company'] == "Marvel"]
mean_marvel = marvel_df['Rate'].mean()
std_marvel = marvel_df['Rate'].std()
n_marvel = marvel_df['Rate'].count()
dc_df = rate_df[rate_df['Company'] == "DC"]
mean_dc = dc_df['Rate'].mean()
std_dc = dc_df['Rate'].std()
n_dc = dc_df['Rate'].count()
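To check the cleaned-up version end to end, here is a minimal sketch that runs it on a small made-up rate_df. The data is an assumption (the real rate_df is not shown in the question), so the expected counts are 3 and 2 here rather than 23 and 16.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for rate_df; the real data is not shown in the question.
rate_df = pd.DataFrame({
    "Company": ["Marvel", "Marvel", "Marvel", "DC", "DC"],
    "Rate": [7.5, 8.0, 6.9, 7.1, 6.4],
})

marvel_df = rate_df[rate_df["Company"] == "Marvel"]
dc_df = rate_df[rate_df["Company"] == "DC"]

# Single brackets select a Series, so each reduction returns a scalar.
mean_marvel = marvel_df["Rate"].mean()
std_marvel = marvel_df["Rate"].std()
n_marvel = marvel_df["Rate"].count()
mean_dc = dc_df["Rate"].mean()
std_dc = dc_df["Rate"].std()
n_dc = dc_df["Rate"].count()

# Validation, with the counts adjusted to the toy data above.
assert n_marvel == 3
assert n_dc == 2
assert (np.ndim(mean_marvel) == np.ndim(mean_dc) == np.ndim(std_marvel)
        == np.ndim(std_dc) == np.ndim(n_marvel) == np.ndim(n_dc))
```

With scalars on both sides, each comparison yields a single boolean, so the assert statements can evaluate it directly.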
CodePudding user response:
marvel_df[['Rate']].count() gives you a pd.Series with one element, namely the count of the non-NaN values in the Rate column of marvel_df. n_marvel == 23 then returns a pd.Series of boolean values, again with only one entry, but the truth value of a Series is ambiguous, so the assert raises a ValueError instead of passing or failing.
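The difference can be reproduced with a small sketch. The marvel_df below is a made-up stand-in (an assumption, since the original data is not shown), and the variable names n_series, n_scalar and outcome are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for marvel_df, with one missing value.
marvel_df = pd.DataFrame({"Rate": [7.5, 8.0, None, 6.9]})

# Double brackets select a DataFrame, so count() returns a one-element Series.
n_series = marvel_df[["Rate"]].count()
# Single brackets select a Series, so count() returns a scalar.
n_scalar = marvel_df["Rate"].count()

try:
    # n_series == 3 is a boolean Series; assert asks for its truth value.
    assert n_series == 3
    outcome = "passed"
except ValueError as exc:
    outcome = "ValueError"
    print(exc)

# The scalar compares cleanly, and the Series can be unpacked if needed.
assert n_scalar == 3
assert n_series["Rate"] == 3
assert n_series.item() == 3
```

If the double-bracket form has to stay, indexing the result by column name or calling .item() on it are two ways to get a plain number back before comparing.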