Home > Mobile >  How can I match the values of multiple variables according to the dates of a variable of interest an
How can I match the values of multiple variables according to the dates of a variable of interest an

Time:09-13

I have a dataset that looks like this :

var date value
A 2022-01-01 1
A 2022-01-02 2
A 2022-01-03 3
A 2022-01-04 4
A 2022-01-05 5
A 2022-01-06 6
A 2022-01-07 7
B 2022-02-02 10
B 2022-01-03 20
B 2022-01-07 30
C 2022-01-01 100
C 2022-01-04 200
C 2022-01-05 300
C 2022-06-06 400

My variable of interest is the A from column var.Specifically the dates of A that match the values of the other factors inn var variable.I want to pivot wider them into :

date A B C
2022-01-01 1 NA 100
2022-01-02 2 NA NA
2022-01-03 3 20 NA
2022-01-04 4 NA 200
2022-01-05 5 NA 300
2022-01-06 6 NA NA
2022-01-07 7 30 NA

And at the end to each column to summarize the sum (or even the correlation of A with B and the correlation A with C):

var Sum
A 28
B 50
C 600

How can I do it in R using dplyr package ?

library(tidyverse)
date = c(seq(as.Date("2022/1/1"), by = "day", length.out = 7),
      as.Date("2022/2/2"),as.Date("2022/1/3"),as.Date("2022/1/7"),
      as.Date("2022/1/1"),as.Date("2022/1/4"),as.Date("2022/1/5"),as.Date("2022/6/6"))
var = c(rep("A",7),rep("B",3),rep("C",4))
value = c(seq(1,7,1),10,20,30,100,200,300,400)
data = tibble(var,date,value);data

CodePudding user response:

We may convert to 'wide' format with pivot_wider, filter out the NA element rows from A column, summarise acrossthe columns to get thesumand reshape to 'long' withpivot_longer`

library(dplyr)
library(tidyr)
pivot_wider(data, names_from = var, values_from = value) %>% 
   filter(!is.na(A)) %>% 
   summarise(across(A:C, sum, na.rm = TRUE)) %>% 
  pivot_longer(cols = everything(), names_to = 'var', values_to = 'Sum')
# A tibble: 3 × 2
  var     Sum
  <chr> <dbl>
1 A        28
2 B        50
3 C       600
  • Related