I'm working with a Covid19 dataset in which each row contains Covid data (tests, positives, negatives, deaths, etc...) for a particular day in a particular country. This means there are multiple rows for each day since every country's data gets its own row. I'm trying to generalize the data to only include one row per day per continent. Is there a simple way to sum all columns where the date is the same?
For example, I'd want to go from a table like this...
Date | Continent | Country | Positives |
---|---|---|---|
2020-02-05 | Europe | United Kingdom | 10 |
2020-02-05 | Europe | Poland | 5 |
2020-02-05 | Europe | Sweden | 0 |
2020-02-06 | Europe | United Kingdom | 12 |
2020-02-05 | Europe | Poland | 7 |
2020-02-05 | Europe | Sweden | 1 |
to one like this...
Date | Continent | Positives |
---|---|---|
2020-02-05 | Europe | 15 |
2020-02-06 | Europe | 20 |
The closest I've gotten is
covid19EU <- covid19 %>% filter(Continent_Name == "Europe") %>% group_by(Date) %>% summarise_all(max)
but this returns the highest value instead of summing the value over all observations for the same date, ie
Date | Continent | Country | Positives |
---|---|---|---|
2020-02-05 | Europe | United Kingdom | 10 |
2020-02-06 | Europe | United Kingdom | 12 |
CodePudding user response:
covid19 %>%
group_by(Date, Continent) %>%
summarize(across(where(is.numeric), sum))
Result
Date Continent Positives Something_else
<chr> <chr> <int> <int>
1 2020-02-05 Europe 15 6
2 2020-02-06 Europe 20 15
If there's only one column of data to sum, you can use the count(wt = ...)
shortcut:
covid19 %>%
count(Date, Continent, wt = Positives, name = "Positives")
# Date Continent Positives
#1 2020-02-05 Europe 15
#2 2020-02-06 Europe 20
Example data with another column of numeric data to sum (possibly implied by OP question "Is there a simple way to sum all columns where the date is the same?")
covid19 <- data.frame(
stringsAsFactors = FALSE,
Date = c("2020-02-05","2020-02-05", "2020-02-05",
"2020-02-06","2020-02-06","2020-02-06"),
# NOTE UPDATED DATES VS OP
Continent = c("Europe","Europe","Europe",
"Europe","Europe","Europe"),
Country = c("United Kingdom","Poland",
"Sweden","United Kingdom","Poland","Sweden"),
Positives = c(10L, 5L, 0L, 12L, 7L, 1L),
Something_else = 1:6
)
CodePudding user response:
in base R:
aggregate(Positives~Date Continent,covid19, sum)
Date Continent Positives
1 2020-02-05 Europe 15
2 2020-02-06 Europe 20