I am trying to find the median of the weight column in this sample CSV file using R, but the code returns nothing. Where is the problem?

diabets <- read.csv("https://hbiostat.org/data/repo/diabetes.csv")
median(diabets$weight)

And then after finding the median, I need to print the females whose weights are lower than this median. How can I do that?

Please NO extra libraries.

CodePudding user response：

median() returns NA when the column contains any missing value. There is one NA in weight, so pass na.rm = TRUE to ignore it when computing the median.

sum(is.na(diabets$weight))
[1] 1
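
To see the effect in isolation, here is a minimal sketch on a made-up vector (the values in `w` are hypothetical, not taken from the diabetes data). It only illustrates how a single NA changes what median() returns:

```r
# Toy vector (hypothetical values) with one missing entry
w <- c(150, 180, NA, 165)

m_default <- median(w)                # NA: the missing value propagates
m_clean   <- median(w, na.rm = TRUE)  # 165: NA is dropped first

print(m_default)
print(m_clean)
```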

median(diabets$weight, na.rm = TRUE) returns 172.5, so

diabets[which(diabets$gender == "female" & diabets$weight < 172.5), ]

will print the females whose weights are lower than this median. Wrapping the condition in which() keeps a missing weight from showing up as a row of NAs.
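
A caveat with logical subsetting when the column has missing values: a comparison against NA yields NA, and indexing a data frame with NA produces a row filled with NAs. The sketch below uses a small made-up data frame (`df` and its values are hypothetical) to show the difference. Whether this affects the real data depends on whether the row with the missing weight is female, which is not stated here.

```r
# Hypothetical toy data: one female has a missing weight
df <- data.frame(
  gender = c("female", "female", "male", "female"),
  weight = c(120, NA, 150, 200)
)

med  <- median(df$weight, na.rm = TRUE)           # median of 120, 150, 200
keep <- df$gender == "female" & df$weight < med   # TRUE, NA, FALSE, FALSE

with_na_row <- df[keep, ]         # the NA index adds an all-NA row
clean       <- df[which(keep), ]  # which() keeps only the TRUE positions

print(with_na_row)
print(clean)
```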

Addendum: rather than hard-coding 172.5, store the median in a variable:

med <- median(diabets$weight, na.rm = TRUE)
diabets[which(diabets$gender == "female" & diabets$weight < med), ]

or compute it inline:

diabets[which(diabets$gender == "female" & diabets$weight < median(diabets$weight, na.rm = TRUE)), ]
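
Another base R option, sketched here on a hypothetical toy data frame, is subset(). It evaluates the condition using the column names directly and treats an NA condition as FALSE, so rows with a missing weight are dropped. The median inside the call is taken over the whole column, not only the female rows.

```r
# Hypothetical toy data with one missing weight
df <- data.frame(
  gender = c("female", "male", "female", "female", "male"),
  weight = c(120, 150, NA, 200, 180)
)

# Overall median of 120, 150, 200, 180 is 165
res <- subset(df, gender == "female" & weight < median(weight, na.rm = TRUE))
print(res)
```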

CodePudding user response：

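library(dplyr)  # an add-on package, not base R

diabets %>% 
  # both conditions go in one filter() so median() is taken over all rows, not only females
  filter(gender == "female", weight < median(weight, na.rm = TRUE))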

# A tibble: 123 x 19
      id  chol stab.glu   hdl ratio glyhb location    age gender height weight frame bp.1s bp.1d
   <int> <int>    <int> <int> <dbl> <dbl> <chr>     <int> <chr>   <int>  <int> <chr> <int> <int>
 1  1000   203       82    56  3.60  4.31 Buckingh~    46 female     62    121 medi~   118    59
 2  1024   242       82    54  4.5   4.77 Louisa       60 female     65    156 medi~   130    90
 3  1030   238       75    36  6.60  4.47 Louisa       27 female     60    170 medi~   130    80
 4  1031   183       79    46  4     4.59 Louisa       40 female     59    165 medi~    NA    NA
 5  1036   213       83    47  4.5   3.41 Louisa       33 female     65    157 medi~   130    90
 6  1271   228       66    45  5.10  4.61 Buckingh~    24 female     61    113 medi~   100    70
 7  1277   179       80    92  1.90  4.18 Buckingh~    41 female     72    118 small   144   112
 8  1282   254       84    52  4.90  4.52 Buckingh~    43 female     62    145 medi~   125    70
 9  1317   136       81    51  2.70  4.58 Buckingh~    22 female     66    160 large   105    85
10  1321   218       68    46  4.70  3.89 Buckingh~    52 female     62    170 medi~   142    79
# ... with 113 more rows, and 5 more variables: bp.2s <int>, bp.2d <int>, waist <int>,
#   hip <int>, time.ppn <int>
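
Whichever approach is used, the order of operations matters: taking the median after restricting the data to females gives the female-only median, which is generally different from the median of the whole column. A minimal base R sketch on made-up values (the data frame `df` is hypothetical) shows the two results diverging:

```r
# Hypothetical toy data
df <- data.frame(
  gender = c("female", "female", "female", "male", "male"),
  weight = c(100, 120, 140, 200, 220)
)

females <- df[df$gender == "female", ]

overall_med <- median(df$weight, na.rm = TRUE)       # median over everyone
female_med  <- median(females$weight, na.rm = TRUE)  # median over females only

below_overall <- females[females$weight < overall_med, ]
below_female  <- females[females$weight < female_med, ]

print(below_overall)
print(below_female)
```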