I am trying to find the median of the weight column in this sample CSV file using R, but the code returns nothing. Where is the problem?
diabets <- read.csv("https://hbiostat.org/data/repo/diabetes.csv")
median(diabets$weight)
And then after finding the median, I need to print the females whose weights are lower than this median. How can I do that?
Please NO extra libraries.
CodePudding user response:
Passing the na.rm = TRUE argument makes median() ignore NAs. Without it, a single NA in the column makes the result NA.
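To make the effect of na.rm concrete, here is a minimal sketch on a made-up vector. The name w and its values are only for illustration and are not taken from diabetes.csv.

```r
# Toy vector (hypothetical values, not from diabetes.csv)
w <- c(150, 180, NA, 165)

median(w)                # NA: one missing value makes the whole result NA
median(w, na.rm = TRUE)  # 165: the median of 150, 165, 180
```

The same pattern applies to other summaries such as mean() and sum(), which also accept na.rm.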
There is one NA in the weight column (the data frame is called diabetes here; the question names it diabets):
sum(is.na(diabetes$weight))
[1] 1
And median(diabetes$weight, na.rm = TRUE) returns 172.5, so
diabetes[diabetes$gender == "female" & diabetes$weight < 172.5, ]
will print females whose weights are lower than this median.
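One caveat with logical indexing: if the row with the missing weight happens to be female, the condition evaluates to NA for that row and the result contains an extra all-NA row. Whether that happens in this dataset is not checked here. The sketch below uses a made-up data frame (toy, with hypothetical values) to show the effect and two base R ways to avoid it.

```r
# Toy data (hypothetical values, not from diabetes.csv)
toy <- data.frame(
  gender = c("female", "female", "male", "female"),
  weight = c(120, NA, 150, 200)
)

med <- median(toy$weight, na.rm = TRUE)  # median of 120, 150, 200

# Logical condition: TRUE, NA, FALSE, FALSE
keep <- toy$gender == "female" & toy$weight < med

with_na_row <- toy[keep, ]         # NA in the index yields an all-NA row
clean_which <- toy[which(keep), ]  # which() keeps only the TRUE positions
clean_subset <- subset(toy, gender == "female" & weight < med)  # subset() drops NA conditions

print(with_na_row)
print(clean_which)
```

Both which() and subset() are part of base R, so no extra libraries are needed.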
Alternatively, store the median in a variable instead of hard-coding 172.5:
med <- median(diabetes$weight, na.rm = TRUE)
diabetes[diabetes$gender == "female" & diabetes$weight < med, ]
or, in a single expression:
diabetes[diabetes$gender == "female" & diabetes$weight < median(diabetes$weight, na.rm = TRUE), ]
CodePudding user response:
library(dplyr)
diabets %>%
  filter(gender == "female") %>%
  filter(weight < median(weight, na.rm = TRUE))
# A tibble: 123 x 19
      id  chol stab.glu   hdl ratio glyhb location    age gender height weight frame bp.1s bp.1d
   <int> <int>    <int> <int> <dbl> <dbl> <chr>     <int> <chr>   <int>  <int> <chr> <int> <int>
 1  1000   203       82    56  3.60  4.31 Buckingh~    46 female     62    121 medi~   118    59
 2  1024   242       82    54  4.5   4.77 Louisa       60 female     65    156 medi~   130    90
 3  1030   238       75    36  6.60  4.47 Louisa       27 female     60    170 medi~   130    80
 4  1031   183       79    46  4     4.59 Louisa       40 female     59    165 medi~    NA    NA
 5  1036   213       83    47  4.5   3.41 Louisa       33 female     65    157 medi~   130    90
 6  1271   228       66    45  5.10  4.61 Buckingh~    24 female     61    113 medi~   100    70
 7  1277   179       80    92  1.90  4.18 Buckingh~    41 female     72    118 small   144   112
 8  1282   254       84    52  4.90  4.52 Buckingh~    43 female     62    145 medi~   125    70
 9  1317   136       81    51  2.70  4.58 Buckingh~    22 female     66    160 large   105    85
10  1321   218       68    46  4.70  3.89 Buckingh~    52 female     62    170 medi~   142    79
# ... with 113 more rows, and 5 more variables: bp.2s <int>, bp.2d <int>, waist <int>,
# hip <int>, time.ppn <int>
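Note that in the pipeline above the second filter() runs on the rows that are already restricted to females, so median(weight, na.rm = TRUE) there is the median of the female weights, not of the whole column. If the intended threshold is the overall median, it has to be computed before subsetting. Here is a minimal base R sketch of the difference on a made-up data frame (toy and all its values are hypothetical):

```r
# Toy data (hypothetical values, not from diabetes.csv)
toy <- data.frame(
  gender = c("female", "female", "female", "male", "male"),
  weight = c(110, 130, 150, 190, 210)
)

overall_med <- median(toy$weight, na.rm = TRUE)      # median over all rows

females <- toy[toy$gender == "female", ]
female_med <- median(females$weight, na.rm = TRUE)   # median over females only

# Females below the overall median (what the question asks for)
below_overall <- females[which(females$weight < overall_med), ]

# Females below the female-only median (what filtering first computes)
below_female <- females[which(females$weight < female_med), ]

print(below_overall)
print(below_female)
```

On this toy data the two thresholds differ, so the two results contain a different number of rows.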