I have a data frame that may or may not contain NAs. If NAs are present, they occur in the same rows across all numeric columns. My code below computes the summary statistics as desired when no NAs are present, but when NAs are present, every summary statistic comes out as NA.
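For background on why the NAs show up: base R's min(), max(), mean(), median() and sd() all return NA when their input contains NA, unless na.rm = TRUE is passed. A minimal illustration on the values of column b (the variable name b here is just for the example):

```r
# Values of column b from the example data
b <- c(8, 9, 10, NA, NA)

mean_default <- mean(b)               # NA: missing values propagate by default
mean_dropped <- mean(b, na.rm = TRUE) # 9: NAs are removed before averaging
sd_dropped   <- sd(b, na.rm = TRUE)   # 1
```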
I have a feeling that the solution to a related question, "R: summarise a dataframe with NAN in columns" (which uses summarise(across(.fns = na.omit))), will somehow figure into my solution, though my attempts so far have not been successful. One of my columns is of character type ('a' in this example), so it has to be excluded with where(is.numeric).
library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA)
)
fimber
# A tibble: 5 × 5
#   a         b     c     d     e
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1         8    10    50    80
# 2 2         9    15    60    90
# 3 3        10    20    70   100
# 4 4        NA    NA    NA    NA
# 5 5        NA    NA    NA    NA
# Works fine with no NAs
fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), min )), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), max )), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), mean )), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), median )), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), sd )), 2) )
# My attempt with na.omit (does not work):
fimber %>% add_row(a = "Median", round( summarise( across(.fns = na.omit, where(is.numeric)), median ), 2) )
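For comparison, a minimal sketch of how the na.omit idea could be made to work: call summarise() on the data (the `.`), and apply na.omit() to each column inside the summary function so that median() only sees non-missing values. The object names median_row and fimber_median are hypothetical, chosen for illustration.

```r
library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA))

# Drop NAs per column inside the lambda, then take the median
median_row <- fimber %>%
  summarise(across(where(is.numeric), ~ median(na.omit(.x))))

# Append the rounded medians as a labelled row, as in the question
fimber_median <- fimber %>% add_row(a = "Median", round(median_row, 2))
```

In practice this is equivalent to median(.x, na.rm = TRUE); na.omit() just makes the NA removal explicit.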
Answer:
By using the formula shorthand ~ in the .fns argument, you can customize the function that is applied to each column, for example to pass na.rm = TRUE:
fimber %>% add_row(a = "Min", round(summarise(., across(where(is.numeric), ~ min(.x, na.rm = TRUE))), 2))
fimber %>% add_row(a = "Max", round(summarise(., across(where(is.numeric), ~ max(.x, na.rm = TRUE))), 2))
fimber %>% add_row(a = "Mean", round(summarise(., across(where(is.numeric), ~ mean(.x, na.rm = TRUE))), 2))
fimber %>% add_row(a = "Median", round(summarise(., across(where(is.numeric), ~ median(.x, na.rm = TRUE))), 2))
fimber %>% add_row(a = "St. Dev.", round(summarise(., across(where(is.numeric), ~ sd(.x, na.rm = TRUE))), 2))
These commands give the following output (one table per command):
# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Min       8    10    50    80
# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Max      10    20    70   100
# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Mean      9    15    60    90
# A tibble: 6 x 5
  a          b     c     d     e
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          8    10    50    80
2 2          9    15    60    90
3 3         10    20    70   100
4 4         NA    NA    NA    NA
5 5         NA    NA    NA    NA
6 Median     9    15    60    90
# A tibble: 6 x 5
  a            b     c     d     e
  <chr>    <dbl> <dbl> <dbl> <dbl>
1 1            8    10    50    80
2 2            9    15    60    90
3 3           10    20    70   100
4 4           NA    NA    NA    NA
5 5           NA    NA    NA    NA
6 St. Dev.     1     5    10    10
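Each command above returns a separate copy of fimber with one extra row. If the goal is a single table with all five summary rows appended, one possible sketch (an assumption about the desired layout, not part of the answer above) builds the rows from a named list of functions and binds them once. The names stat_fns, summary_rows and fimber_with_stats are hypothetical.

```r
library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA))

# Named list of summary functions; the names become the labels in column a
stat_fns <- list(Min = min, Max = max, Mean = mean,
                 Median = median, `St. Dev.` = sd)

# One summary row per statistic, each computed with na.rm = TRUE and rounded
summary_rows <- bind_rows(lapply(names(stat_fns), function(label) {
  f <- stat_fns[[label]]
  fimber %>%
    summarise(across(where(is.numeric), ~ round(f(.x, na.rm = TRUE), 2))) %>%
    mutate(a = label, .before = 1)
}))

# Original rows followed by the five summary rows
fimber_with_stats <- bind_rows(fimber, summary_rows)
```

Keeping the functions in a named list means adding another statistic only requires one more list entry.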
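Answer:
Try this: pass na.rm = TRUE to across() as an extra argument, which forwards it to each function. (In dplyr 1.1.0 and later, passing extra arguments through the ... of across() is deprecated.)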
fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), min , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), max , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), mean , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), median , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), sd , na.rm = TRUE)), 2) )
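Because extra arguments in across()'s ... are deprecated in newer dplyr, a sketch of the same Min row with na.rm = TRUE supplied inside an anonymous function instead (the \(x) syntax assumes R 4.1 or later; min_row and fimber_min are hypothetical names):

```r
library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA))

# na.rm = TRUE goes inside the anonymous function, not through across(...)
min_row <- fimber %>%
  summarise(across(where(is.numeric), \(x) min(x, na.rm = TRUE)))

fimber_min <- fimber %>% add_row(a = "Min", round(min_row, 2))
```

One caveat: if a numeric column is entirely NA, min(x, na.rm = TRUE) returns Inf with a warning and mean(x, na.rm = TRUE) returns NaN.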