Summary statistics needed when NAs are present in a data frame's common rows

Time:06-16

I have a data frame that may or may not contain NAs. If NAs are present, they occur in the same rows of every numeric column. The code below computes the summary statistics as desired when no NAs are present, but when NAs are present the summary statistics themselves come out as NA.

I have a feeling the solution linked below will somehow figure into mine, though my attempts so far have not been successful. One of my columns is of character type ('a' in this example) and has to be skipped with is.numeric.

R: summarise a dataframe with NAN in columns summarise(across(.fns = na.omit))


library(dplyr)   # summarise(), across(), where(), %>%
library(tibble)  # tibble(), add_row()

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA)
)

fimber

# A tibble: 5 × 5
#   a         b     c     d     e
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1         8    10    50    80
# 2 2         9    15    60    90
# 3 3        10    20    70   100
# 4 4        NA    NA    NA    NA
# 5 5        NA    NA    NA    NA



# Works fine with no NAs

fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), min  )), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), max  )), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), mean  )), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), median  )), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), sd  )), 2) )


# My attempt with na.omit (does not work)

fimber %>% add_row(a = "Median", round( summarise( across(.fns = na.omit, where(is.numeric)), median  ), 2) )




CodePudding user response:

By using the ~ (formula) shorthand in the .fns argument, you can write a custom function inline, for example one that passes na.rm = TRUE:

fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), ~min(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), ~max(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), ~mean(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), ~median(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), ~sd(.x, na.rm = TRUE)  )), 2) )

These commands produce the following output, one tibble per call:

# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Min       8    10    50    80

# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Max      10    20    70   100

# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Mean      9    15    60    90

# A tibble: 6 x 5
  a          b     c     d     e
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          8    10    50    80
2 2          9    15    60    90
3 3         10    20    70   100
4 4         NA    NA    NA    NA
5 5         NA    NA    NA    NA
6 Median     9    15    60    90

# A tibble: 6 x 5
  a            b     c     d     e
  <chr>    <dbl> <dbl> <dbl> <dbl>
1 1            8    10    50    80
2 2            9    15    60    90
3 3           10    20    70   100
4 4           NA    NA    NA    NA
5 5           NA    NA    NA    NA
6 St. Dev.     1     5    10    10
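
The five calls above each return a separate tibble. If a single table with all five summary rows is wanted, one possible sketch is shown below. It assumes dplyr 1.0 or later; the names stat_fns, summary_rows and result are made up for this illustration and are not part of the question.

```r
library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA))

# Summary functions keyed by the label that goes into column 'a'
# (stat_fns is a hypothetical name used only in this sketch)
stat_fns <- list(Min = min, Max = max, Mean = mean,
                 Median = median, `St. Dev.` = sd)

# Build one summary row per function, dropping NAs in every numeric column
summary_rows <- lapply(names(stat_fns), function(label) {
  f <- stat_fns[[label]]
  fimber %>%
    summarise(across(where(is.numeric),
                     function(x) round(f(x, na.rm = TRUE), 2))) %>%
    mutate(a = label)
})

# Append all summary rows to the original data (columns are matched by name)
result <- bind_rows(fimber, bind_rows(summary_rows))
result
```

Here result has the five original rows followed by the Min, Max, Mean, Median and St. Dev. rows, in that order.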

CodePudding user response:

Try passing na.rm = TRUE as an extra argument to across(), which forwards it to the summary function:

fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), min  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), max  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), mean  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), median  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), sd  , na.rm = TRUE)), 2) )
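
As far as I know, passing extra arguments through the ... of across() is deprecated as of dplyr 1.1.0, so the calls above may emit a deprecation warning on newer installations. A minimal sketch of the same "Min" row with an explicit anonymous function is given below; min_row and with_min are names invented for this example, and the other statistics follow the same pattern.

```r
library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA))

# The anonymous function carries na.rm = TRUE itself,
# so nothing has to be passed through across()'s dots
min_row <- fimber %>%
  summarise(across(where(is.numeric), function(x) min(x, na.rm = TRUE)))

# Append the rounded summary as a sixth row labelled "Min"
with_min <- fimber %>% add_row(a = "Min", round(min_row, 2))
with_min
```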
