Problem finding number of elements in a dataframe in R

Time:10-15

I have downloaded the data file casos_hosp_uci_def_sexo_edad_provres_60_mas.csv, which reports the number of people infected with Covid-19 in Spain, broken down by province, age, gender, etc., from this webpage. I read it into a data frame with:

db<-read.csv(file = 'casos_hosp_uci_def_sexo_edad_provres.csv')

The first five rows are shown below:

  provincia_iso sexo grupo_edad      fecha num_casos num_hosp num_uci num_def
1             A    H        0-9 2020-01-01         0        0       0       0
2             A    H      10-19 2020-01-01         0        0       0       0
3             A    H      20-29 2020-01-01         0        0       0       0
4             A    H      30-39 2020-01-01         0        0       0       0
5             A    H      40-49 2020-01-01         0        0       0       0

The first four columns of the data frame give the province, gender, age group and date. The last four columns give the number of people who fell ill, were hospitalized, were admitted to the ICU, or died.

I want to use R to find the day with the highest number of contagions. To do that, I have to sum the values of the fifth column, num_casos, for each distinct value of the column fecha.

I have already been able to calculate the number of sick males with hombresEnfermos=sum(db[which(db$sexo=="H"), 5]). However, I think there must be a better way to find the days with the most contagions than counting manually, but I cannot work out how.

Can someone please help me?

CodePudding user response:

Using dplyr to get the total by day:

library(dplyr)

db %>%
  group_by(fecha) %>%                      # one group per date
  summarise(total = sum(num_casos)) %>%    # total cases on each date
  arrange(desc(total))                     # highest totals first

Alternatively, in base R:

# split() orders its groups by sorted fecha, so they line up with sort(unique(db$fecha))
data.frame(fecha = sort(unique(db$fecha)),
           total = sapply(split(db, f = db$fecha), function(x) sum(x[["num_casos"]])))
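
As a further base R option, here is a minimal sketch using aggregate() that also pulls out the single peak day. It assumes the column names from the question (fecha, num_casos). The toy data frame and the names totals, peak_day and peak_total are made up for illustration and are not part of the real dataset.

```r
# Toy data with the same column names as in the question (values are invented).
db <- data.frame(
  fecha     = c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02", "2020-03-03"),
  sexo      = c("H", "M", "H", "M", "H"),
  num_casos = c(3, 4, 10, 5, 2)
)

# Sum num_casos within each distinct value of fecha.
totals <- aggregate(num_casos ~ fecha, data = db, FUN = sum)

# Sort so that the days with the most cases come first.
totals <- totals[order(-totals$num_casos), ]

# Day with the highest total (the first one listed if there is a tie).
peak_day   <- as.character(totals$fecha[1])
peak_total <- totals$num_casos[1]

print(totals)
```

On the real data, only the aggregate() and order() lines should be needed, applied to the db read with read.csv().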