Home > Mobile >  Summarize temporal information into a dynamic matrix
Summarize temporal information into a dynamic matrix

Time:05-04

I have a data frame like df:

id <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "E")
year <- c("2005", "2006", "2007", "2008", "2005", "2006", "2007", "2005", "2007", "2006", "2007", "2008")
value <- 1:12
df <- data.frame(id, year, value)

I want to convert df into a matrix id_observed where columns count how many id's are observed for the first time, by year. Rows count how many ids "survive" to the consecutive year:

id_observed <- matrix(c(3,2,3,1,0,1,1,0,0,0,0,0,0,0,0,1), nrow = 4, ncol = 4)
#First observed id's (by columns), consecutive id's observations (by rows)
colnames(id_observed) <- c("2005", "2006", "2007", "2008")
rownames(id_observed) <- c("2005", "2006", "2007", "2008")
id_observed

The same idea applies to generate matrix value_observed taking the information from value. Where columns count the aggregated value of id's that are observed for the first time, by year. Rows count the aggregated value of the ids that "survived" to the consecutive year:

value_observed <- matrix(c(14,8,19,4,0,10,11,0,0,0,0,0,0,0,0,12), nrow = 4, ncol = 4)
#First observed value (by columns), consecutive value's observations (by rows)
colnames(value_observed) <- c("2005", "2006", "2007", "2008")
rownames(value_observed) <- c("2005", "2006", "2007", "2008")
value_observed

Any clue on how to build matrices id_observed, and value_observed in an automatic way?

CodePudding user response:

You can create this function, get_matrix(), which leverages tidyverse approach to loop over unique years, creating the data for each year, binding the rows, and then pivoting wider

library(tidyverse)

get_matrix <- function(df, type=c("value","id")) {
  res = lapply(unique(df$year), function(y) {
    d = df %>% group_by(id) %>% filter(min(year)==y) %>% group_by(year)
    if(type == "value") d = summarize(d,n=sum(value))
    else d = summarize(d,n=n())
    d = mutate(d,y=y)
    if(nrow(d)==0) return(tibble(year=y, n=0, y=y)) else return(d)
  })
  bind_rows(res) %>% 
    pivot_wider(id_cols = year,names_from = y,values_from = n,values_fill = 0)    
}

Usage

get_matrix(df, type="value")

Output

  year  `2005` `2006` `2007` `2008`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1 2005      14      0      0      0
2 2006       8     10      0      0
3 2007      19     11      0      0
4 2008       4      0      0     12

Usage

get_matrix(df, type="id")

Output

  year  `2005` `2006` `2007` `2008`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1 2005       3      0      0      0
2 2006       2      1      0      0
3 2007       3      1      0      0
4 2008       1      0      0      1
  • Related