I have a dataset filled with Medicare beneficiaries. The question is: 'What proportion of patients have at least one of the chronic conditions described in the independent variables alzheimers, arthritis, cancer, copd, depression, diabetes, heart.failure, ihd, kidney, osteoporosis, and stroke?'
I tried creating a subset and using isnull() & any(), but i can´t get a proper solution.. also tried df.loc but it only lets me name one column..
i am attaching the dataset for better understanding!
https://drive.google.com/file/d/1R--YEsBCDHMXjqNzAumT2zzUAYvM1bWA/view?usp=sharing
Thanks!!
My Try´s:
claimss.loc[:, ["alzheimers","diabetes","arthritis"] == 1]
(Wanted to try it first with 3 columns, doesn´t work in the first place..)
Try with Subset:
filtered_df = df.loc[raw_df] == 1]
(Created a Subset where only index and independet variables(diseases) appear and tried to look for null-rows)
CodePudding user response:
If need filter only some columns names use subset for filter columns names, compare by 1
by DataFrame.eq
and last test at least one True
by DataFrame.any
:
claimss[claimss[["alzheimers","diabetes","arthritis"]].eq(1).any(axis=1)]
If need percentage use mean
with boolean mask:
out = claimss[["alzheimers","diabetes","arthritis"]].eq(1).any(axis=1).mean()