Home > Software design >  How can I summarise rows and mean for a specific column
How can I summarise rows and mean for a specific column

Time:11-23

How can I sum the rows of my dataframe being grouped by a specific column, I already tried with group_by()rowSums() but it gives me an error in wanting to group this column in the specific one that I want. The purpose of this is that I want to show the percentage by Localidad of all other columns. Using dput(mydataframe) it gives me this:


 No.    Mes             Delegacion              Localidad Esquema Número
1    1 oct-10     09CIUDAD DE MÉXICO    09016MIGUEL HIDALGO       U    629
2    2 oct-10     09CIUDAD DE MÉXICO 09005GUSTAVO A. MADERO       U   1402
3    3 oct-10     09CIUDAD DE MÉXICO 09005GUSTAVO A. MADERO       U    699
4    4 oct-10     09CIUDAD DE MÉXICO          09003COYOACÁN       U     48
5    5 oct-10     09CIUDAD DE MÉXICO    09010ÁLVARO OBREGÓN       U    539
6    6 oct-10     09CIUDAD DE MÉXICO           09011TLÁHUAC       U     55
7    7 oct-10     09CIUDAD DE MÉXICO           09011TLÁHUAC       U    700
8    8 oct-10     09CIUDAD DE MÉXICO        09007IZTAPALAPA       Y     23
9    9 oct-10       01AGUASCALIENTES    01001AGUASCALIENTES       U     10
10  10 oct-10       01AGUASCALIENTES    01001AGUASCALIENTES       U   1270
11  11 oct-10 02BAJA CALIFORNIA               02001ENSENADA       Y     41

                                                Nombre               X2.1 X2.2 X2.3 X2.4 X2.5 X2.6 X2.7 X2.8 X2.9
1                                              MUNDO DE LOS PEQUES    1    1    1    1    1    1    1    1    1
2                                   GUARDERIA EL ARBOL DE LA NIÑEZ    1    1    1    1    1    1    1    1    1
3                                           LOS PEQUEÑOS GENIOS II    1    1    1    1    1    0    1    1    1
4                                                   MI MUNDO FELIZ    1    1    1    1    1    1    1    1    1
5                       CENTRO ECOLÓGICO DE DESARROLLO INFANTIL II    1    1    1    1    1    1    1    1    1
6                                 ESTANCIA INFANTIL TERCER MILENIO    1    1    1    1    1    0    1    1    1
7                                                   CEDEI COKELUSH    1    1    1    1    1    0    1    1    1
8                    INSTITUTO PEDAGÓGICO DE EDUCACIÓN INCIAL S.A.    1    1    1    1    1    1    1    1    1
9               CTO. DE DESARROLLO INFANTIL GARABATILLOS, SA DE CV    1    1    1    1    1    1    1    1    1
10 CENTRO DE DESARROLLO INFANTIL EL MUNDO DE LOS NIÑOS Y LAS NIÑAS    1    1    1    1    1    1    1    1    1
11                                              CARITAS SONRIENTES    1    1    1    1    1    0    1    1    1


I want to take the average of all the rows grouping by Localidad

CodePudding user response:

If you want to get average of every each variables, by using dplyr,

library(dplyr)
df %>%
  group_by(Localidad) %>%
  summarise(across(where(is.numeric), ~mean(.x)))

will helps.

Add

result <- dummy %>%
  group_by(Localidad) %>%
  select(Localidad, starts_with("X")) %>%
  summarise(across(where(is.numeric), ~ scales::percent(mean(.x))))

  Localidad   X2.1  X2.2  X2.3  X2.4  X2.5  X2.6  X2.7  X2.8  X2.9  X2.1.1 X2.11 X2.12 X3.1  X3.2  X5.1  X5.2  X5.3  X5.4  X5.5  X5.6  X5.7  X5.8  X6.1 
   <chr>       <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 "01001AGUA~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 2 "02001ENSE~ 100%  100%  100%  100%  100%  0%    100%  100%  100%  100%   100%  100%  100%  100%  0%    0%    100%  100%  100%  100%  100%  100%  100% 
 3 "02002MEXI~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 4 "02004TIJU~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  75%   100%  100%  100%  75%   100%  100%  100% 
 5 "03003LA P~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 6 "04003CIUD~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  0%    100%  100%  100%  100%  100%  100%  100%  100% 
 7 "09003COYO~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 8 "09005GUST~ 100%  100%  100%  100%  100%  50%   100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 9 "09007IZTA~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
10 "09010ALVA~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
11 "09011TLAH~ 100%  100%  100%  100%  100%  0%    100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
12 "09016MIGU~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
13 "26055SAN ~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
# ... with 49 more variables: X6.2 <chr>, X6.3 <chr>, X6.4 <chr>, X6.5 <chr>, X7.1 <chr>, X7.2 <chr>, X7.3 <chr>, X7.4 <chr>, X8.1 <chr>, X8.2 <chr>,
#   X9.1 <chr>, X9.2 <chr>, X9.3 <chr>, X9.4 <chr>, X10.1 <chr>, X10.2 <chr>, X10.3 <chr>, X10.4 <chr>, X10.5 <chr>, X10.6 <chr>, X10.7 <chr>,
#   X10.8 <chr>, X10.9 <chr>, X11.1 <chr>, X11.2 <chr>, X11.3 <chr>, X11.4 <chr>, X11.5 <chr>, X11.6 <chr>, X11.7 <chr>, X11.8 <chr>, X11.9 <chr>,
#   X11.1.1 <chr>, X11.11 <chr>, X11.12 <chr>, X11.13 <chr>, X11.14 <chr>, X11.15 <chr>, X11.16 <chr>, X12.1 <chr>, X12.2 <chr>, X12.3 <chr>,
#   X12.4 <chr>, X12.5 <chr>, X12.6 <chr>, X12.8 <chr>, X14.1 <chr>, X14.2 <chr>, X. <chr>

CodePudding user response:

We can use rowwise and calculate the mean with the columns that match the pattern starting with and X followed by a number.

library(tidyverse)


# Data --------------------------------------------------------------------


df <- 
read_table("
                                                           Nombre               X2.1 X2.2 X2.3 X2.4 X2.5 X2.6 X2.7 X2.8 X2.9
                                              MUNDO-DE-LOS-PEQUES    1    1    1    1    1    1    1    1    1
                                   GUARDERIA-EL-ARBOL-DE-LA-NIÑEZ    1    1    1    1    1    1    1    1    1
                                           LOS-PEQUEÑOS-GENIOS-II    1    1    1    1    1    0    1    1    1
                                                   MI-MUNDO-FELIZ    1    1    1    1    1    1    1    1    1
                       CENTRO-ECOLÓGICO-DE-DESARROLLO-INFANTIL-II    1    1    1    1    1    1    1    1    1
                                 ESTANCIA-INFANTIL-TERCER-MILENIO    1    1    1    1    1    0    1    1    1
                                                   CEDEI-COKELUSH    1    1    1    1    1    0    1    1    1
                    INSTITUTO-PEDAGÓGICO-DE-EDUCACIÓN-INCIAL-S.A.    1    1    1    1    1    1    1    1    1
               CTO.-DE-DESARROLLO-INFANTIL-GARABATILLOS,-SA-DE-CV    1    1    1    1    1    1    1    1    1
 CENTRO-DE-DESARROLLO-INFANTIL-EL-MUNDO-DE-LOS-NIÑOS-Y-AS-NIÑAS    1    1    1    1    1    1    1    1    1
                                              CARITAS-SONRIENTES    1    1    1    1    1    0    1    1    1")

df <- 
df %>% 
  mutate(Nombre = str_replace_all(Nombre, '-', ' '))


# Code --------------------------------------------------------------------


df %>% 
  rowwise() %>% 
  mutate(Pecentage = mean(c_across(matches('^X\\d')))) 
#> # A tibble: 11 × 11
#> # Rowwise: 
#>    Nombre         X2.1  X2.2  X2.3  X2.4  X2.5  X2.6  X2.7  X2.8  X2.9 Pecentage
#>    <chr>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
#>  1 MUNDO DE LOS…     1     1     1     1     1     1     1     1     1     1    
#>  2 GUARDERIA EL…     1     1     1     1     1     1     1     1     1     1    
#>  3 LOS PEQUEÑOS…     1     1     1     1     1     0     1     1     1     0.889
#>  4 MI MUNDO FEL…     1     1     1     1     1     1     1     1     1     1    
#>  5 CENTRO ECOLÓ…     1     1     1     1     1     1     1     1     1     1    
#>  6 ESTANCIA INF…     1     1     1     1     1     0     1     1     1     0.889
#>  7 CEDEI COKELU…     1     1     1     1     1     0     1     1     1     0.889
#>  8 INSTITUTO PE…     1     1     1     1     1     1     1     1     1     1    
#>  9 CTO. DE DESA…     1     1     1     1     1     1     1     1     1     1    
#> 10 CENTRO DE DE…     1     1     1     1     1     1     1     1     1     1    
#> 11 CARITAS SONR…     1     1     1     1     1     0     1     1     1     0.889

Created on 2021-11-22 by the reprex package (v2.0.1)

  • Related