How can I match the values of multiple variables according to the dates of a variable of interest an-CodePudding

I have a dataset that looks like this :

var	date	value
A	2022-01-01	1
A	2022-01-02	2
A	2022-01-03	3
A	2022-01-04	4
A	2022-01-05	5
A	2022-01-06	6
A	2022-01-07	7
B	2022-02-02	10
B	2022-01-03	20
B	2022-01-07	30
C	2022-01-01	100
C	2022-01-04	200
C	2022-01-05	300
C	2022-06-06	400

My variable of interest is the A from column var.Specifically the dates of A that match the values of the other factors inn var variable.I want to pivot wider them into :

date	A	B	C
2022-01-01	1	NA	100
2022-01-02	2	NA	NA
2022-01-03	3	20	NA
2022-01-04	4	NA	200
2022-01-05	5	NA	300
2022-01-06	6	NA	NA
2022-01-07	7	30	NA

And at the end to each column to summarize the sum (or even the correlation of A with B and the correlation A with C):

var	Sum
A	28
B	50
C	600

How can I do it in R using dplyr package ?

library(tidyverse)
date = c(seq(as.Date("2022/1/1"), by = "day", length.out = 7),
      as.Date("2022/2/2"),as.Date("2022/1/3"),as.Date("2022/1/7"),
      as.Date("2022/1/1"),as.Date("2022/1/4"),as.Date("2022/1/5"),as.Date("2022/6/6"))
var = c(rep("A",7),rep("B",3),rep("C",4))
value = c(seq(1,7,1),10,20,30,100,200,300,400)
data = tibble(var,date,value);data

CodePudding user response：

We may convert to 'wide' format with pivot_wider, filter out the NA element rows from A column, summarise acrossthe columns to get thesumand reshape to 'long' withpivot_longer`

library(dplyr)
library(tidyr)
pivot_wider(data, names_from = var, values_from = value) %>% 
   filter(!is.na(A)) %>% 
   summarise(across(A:C, sum, na.rm = TRUE)) %>% 
  pivot_longer(cols = everything(), names_to = 'var', values_to = 'Sum')
# A tibble: 3 × 2
  var     Sum
  <chr> <dbl>
1 A        28
2 B        50
3 C       600