I have a dataset that looks like this :
var | date | value |
---|---|---|
A | 2022-01-01 | 1 |
A | 2022-01-02 | 2 |
A | 2022-01-03 | 3 |
A | 2022-01-04 | 4 |
A | 2022-01-05 | 5 |
A | 2022-01-06 | 6 |
A | 2022-01-07 | 7 |
B | 2022-02-02 | 10 |
B | 2022-01-03 | 20 |
B | 2022-01-07 | 30 |
C | 2022-01-01 | 100 |
C | 2022-01-04 | 200 |
C | 2022-01-05 | 300 |
C | 2022-06-06 | 400 |
My variable of interest is the A from column var.Specifically the dates of A that match the values of the other factors inn var variable.I want to pivot wider them into :
date | A | B | C |
---|---|---|---|
2022-01-01 | 1 | NA | 100 |
2022-01-02 | 2 | NA | NA |
2022-01-03 | 3 | 20 | NA |
2022-01-04 | 4 | NA | 200 |
2022-01-05 | 5 | NA | 300 |
2022-01-06 | 6 | NA | NA |
2022-01-07 | 7 | 30 | NA |
And at the end to each column to summarize the sum (or even the correlation of A with B and the correlation A with C):
var | Sum |
---|---|
A | 28 |
B | 50 |
C | 600 |
How can I do it in R using dplyr package ?
library(tidyverse)
date = c(seq(as.Date("2022/1/1"), by = "day", length.out = 7),
as.Date("2022/2/2"),as.Date("2022/1/3"),as.Date("2022/1/7"),
as.Date("2022/1/1"),as.Date("2022/1/4"),as.Date("2022/1/5"),as.Date("2022/6/6"))
var = c(rep("A",7),rep("B",3),rep("C",4))
value = c(seq(1,7,1),10,20,30,100,200,300,400)
data = tibble(var,date,value);data
CodePudding user response:
We may convert to 'wide' format with pivot_wider
, filter
out the NA element rows from A
column,
summarise
acrossthe columns to get the
sumand reshape to 'long' with
pivot_longer`
library(dplyr)
library(tidyr)
pivot_wider(data, names_from = var, values_from = value) %>%
filter(!is.na(A)) %>%
summarise(across(A:C, sum, na.rm = TRUE)) %>%
pivot_longer(cols = everything(), names_to = 'var', values_to = 'Sum')
# A tibble: 3 × 2
var Sum
<chr> <dbl>
1 A 28
2 B 50
3 C 600