Can't calculate average properly in R

Time:06-03

I have this table and I want to calculate the average of all the mean values in CHOR1, CHOR2 and KONTROLA:

      grupa           plcie    dane  wiek     hsCRP     ERY    PLT      HGB      HCT     MCHC       MON     LEU
1     CHOR1   liczba kobiet     min 17.00  0.487607  3.5300 128.00  9.50490 0.280000 32.55560 0.4800000  6.7900
2     CHOR1              14 mediana 29.00  3.966460  4.2000 217.00 12.40470 0.363000 35.04710 0.7600000 11.6600
3     CHOR1 liczba mezczyzn    mean 29.56  6.103022  5.3628 225.28 12.40173 0.363560 35.12882 0.8578811 12.0220
4     CHOR1              11     max 43.00 42.649900 33.0000 336.00 14.49900 0.405000 36.87420 1.5200000 16.8100
5     CHOR2   liczba kobiet     min 21.00  0.335089  3.2500  91.00  9.82710 0.042300 32.88780 0.1400000  7.9500
6     CHOR2              12 mediana 30.00  3.445460  4.2700 195.00 12.56580 0.360000 35.54540 0.6600000 12.0000
7     CHOR2 liczba mezczyzn    mean 30.04  5.536029  4.1976 209.12 12.80616 0.345972 35.55204 0.9528000 12.0376
8     CHOR2              13     max 42.00 19.212400  5.0400 456.00 22.23180 0.412000 38.86740 7.0000000 16.5900
9  KONTROLA   liczba kobiet     min 23.00  0.758440  3.0900 147.00  9.50490 0.279000 32.05730 0.3500000  4.8300
10 KONTROLA              14 mediana 32.00  4.220370  3.9800 214.00 11.43810 0.339000 34.54880 0.7600000 10.6800
11 KONTROLA liczba mezczyzn    mean 32.32  5.295149  4.0132 225.88 11.29980 0.337560 34.40263 0.7604000 11.3604
12 KONTROLA              11     max 48.00 14.395100  5.0500 434.00 13.21020 0.389000 36.04370 1.2500000 17.4600

I am using this loop to calculate the average of the min, median, max, and mean rows (charakterystyka_numeric is the same table, but with only the numeric columns):

for(j in 1:nr_kolumn){
  
    for(p in 1:liczba_grup){
      minimum_srednia <- minimum_srednia + strtoi(charakterystyka_numeric[(p + ((p*3)-3)), j])
      mediana_srednia <- mediana_srednia + strtoi(charakterystyka_numeric[((p*2) + ((p*2)-2)), j])
      mean_srednia <- mean_srednia + strtoi(charakterystyka_numeric[((p*3) + (p-1)), j])
      print(charakterystyka_numeric[((p*3) + (p-1)), j])
      print(mean_srednia)
      max_srednia <- max_srednia + strtoi(charakterystyka_numeric[(p*4), j])
      print(mean_srednia)
    }
    minimum_srednia <- minimum_srednia/liczba_grup
    mediana_srednia <- mediana_srednia/liczba_grup
    mean_srednia <- mean_srednia/liczba_grup
    max_srednia <- max_srednia/liczba_grup
}

Everything works properly except mean_srednia, which returns NA as the result.

CodePudding user response:

As mentioned in my comment, something like

library(tidyverse)
d %>% 
  group_by(grupa) %>% 
  summarise(
    across(
      where(is.numeric), 
      list("mean"=mean, "min"=min, "max"=max, "median"=median), 
      na.rm=TRUE
    )
  )
# A tibble: 3 × 41
  grupa    m_mean m_min m_max m_median wiek_mean wiek_min wiek_max wiek_median hsCRP_mean hsCRP_min hsCRP_max hsCRP_median ERY_mean ERY_min ERY_max ERY_median PLT_mean PLT_min
  <chr>     <dbl> <int> <int>    <dbl>     <dbl>    <dbl>    <dbl>       <dbl>      <dbl>     <dbl>     <dbl>        <dbl>    <dbl>   <dbl>   <dbl>      <dbl>    <dbl>   <dbl>
1 CHOR1       2.5     1     4      2.5      29.6       17       43        29.3      13.3      0.488      42.6         5.03    11.5     3.53   33          4.78     227.     128
2 CHOR2       6.5     5     8      6.5      30.8       21       42        30.0       7.13     0.335      19.2         4.49     4.19    3.25    5.04       4.23     238.      91
3 KONTROLA   10.5     9    12     10.5      33.8       23       48        32.2       6.17     0.758      14.4         4.76     4.03    3.09    5.05       4.00     255.     147
# … with 22 more variables: PLT_max <dbl>, PLT_median <dbl>, HGB_mean <dbl>, HGB_min <dbl>, HGB_max <dbl>, HGB_median <dbl>, HCT_mean <dbl>, HCT_min <dbl>, HCT_max <dbl>,
#   HCT_median <dbl>, MCHC_mean <dbl>, MCHC_min <dbl>, MCHC_max <dbl>, MCHC_median <dbl>, MON_mean <dbl>, MON_min <dbl>, MON_max <dbl>, MON_median <dbl>, LEU_mean <dbl>,
#   LEU_min <dbl>, LEU_max <dbl>, LEU_median <dbl>

Should give you close to what you want.

I suspect your problem with mean_srednia is due to missing values, but since you haven't shown us that column, I can't be sure. If I am correct, the na.rm=TRUE in my solution should solve the problem.
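
Another possible cause, going only by the code shown in the question: strtoi() parses integers, so any value with a decimal part comes back as NA. That would match the symptom at least for a column like wiek, where the min, median and max rows are whole numbers and only the mean row has decimals. The sketch below uses the wiek values copied from the question's table; the variable names (wiek_min, wiek_mean, avg_bad, avg_good) are made up for illustration.

```r
# Values copied from the "wiek" column of the question's table
wiek_min  <- c(17, 21, 23)           # min rows: whole numbers
wiek_mean <- c(29.56, 30.04, 32.32)  # mean rows: decimals

# strtoi() converts its argument to character and parses it as an integer;
# input that is not a whole number is returned as NA
strtoi(wiek_min)
strtoi(wiek_mean)

# Summing NA values gives NA, which is what the loop ends up with
avg_bad <- sum(strtoi(wiek_mean)) / length(wiek_mean)

# as.numeric() keeps the decimal part, so the average comes out as expected
avg_good <- sum(as.numeric(wiek_mean)) / length(wiek_mean)
avg_good
```

If this is what is happening, replacing strtoi() with as.numeric() in the loop (or dropping the conversion if the columns are already numeric) should be enough.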

CodePudding user response:

For a short base R approach, you might try aggregate() to calculate the means:

# data 
df <- data.frame(grupa = as.factor(c(rep("C", 5), rep("K", 5))),
                 V1 = rnorm(10),
                 V2 = rnorm(10))
# approach
aggregate(.~grupa, data = df, mean)
#>   grupa         V1         V2
#> 1     C -0.1779009 -0.1893905
#> 2     K  0.3261067  0.2520248

Replace mean with min, max, or median as needed. Note that the toy data I am using has only two levels; the approach works just as well with more levels of the variable grupa.
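
If the goal is the average of the per-group means, as in the question, one option is to pass the aggregate() result to colMeans(). This is only a sketch on the same kind of toy data; the seed and the names per_group and avg_of_means are assumptions added for illustration.

```r
# Toy data with two groups of equal size (set.seed only for reproducibility)
set.seed(1)
df <- data.frame(grupa = factor(rep(c("C", "K"), each = 5)),
                 V1 = rnorm(10),
                 V2 = rnorm(10))

# One row per group, one column of means per numeric variable
per_group <- aggregate(. ~ grupa, data = df, FUN = mean)

# Drop the grouping column and average the group means column by column
avg_of_means <- colMeans(per_group[, -1])
avg_of_means
```

With groups of equal size the result equals the overall column mean; with unequal group sizes the two will generally differ, so it is worth deciding which one is actually wanted.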

Created on 2022-06-03 by the reprex package (v2.0.1)
