I have a data frame like df
:
id <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "E")
year <- c("2005", "2006", "2007", "2008", "2005", "2006", "2007", "2005", "2007", "2006", "2007", "2008")
value <- 1:12
df <- data.frame(id, year, value)
I want to convert df
into a matrix id_observed
where columns count how many id's are observed for the first time, by year. Rows count how many ids "survive" to the consecutive year:
id_observed <- matrix(c(3,2,3,1,0,1,1,0,0,0,0,0,0,0,0,1), nrow = 4, ncol = 4)
#First observed id's (by columns), consecutive id's observations (by rows)
colnames(id_observed) <- c("2005", "2006", "2007", "2008")
rownames(id_observed) <- c("2005", "2006", "2007", "2008")
id_observed
The same idea applies to generate matrix value_observed
taking the information from value
. Where columns count the aggregated value of id's that are observed for the first time, by year. Rows count the aggregated value of the ids that "survived" to the consecutive year:
value_observed <- matrix(c(14,8,19,4,0,10,11,0,0,0,0,0,0,0,0,12), nrow = 4, ncol = 4)
#First observed value (by columns), consecutive value's observations (by rows)
colnames(value_observed) <- c("2005", "2006", "2007", "2008")
rownames(value_observed) <- c("2005", "2006", "2007", "2008")
value_observed
Any clue on how to build matrices id_observed
, and value_observed
in an automatic way?
CodePudding user response:
You can create this function, get_matrix()
, which leverages tidyverse approach to loop over unique years, creating the data for each year, binding the rows, and then pivoting wider
library(tidyverse)
get_matrix <- function(df, type=c("value","id")) {
res = lapply(unique(df$year), function(y) {
d = df %>% group_by(id) %>% filter(min(year)==y) %>% group_by(year)
if(type == "value") d = summarize(d,n=sum(value))
else d = summarize(d,n=n())
d = mutate(d,y=y)
if(nrow(d)==0) return(tibble(year=y, n=0, y=y)) else return(d)
})
bind_rows(res) %>%
pivot_wider(id_cols = year,names_from = y,values_from = n,values_fill = 0)
}
Usage
get_matrix(df, type="value")
Output
year `2005` `2006` `2007` `2008`
<chr> <dbl> <dbl> <dbl> <dbl>
1 2005 14 0 0 0
2 2006 8 10 0 0
3 2007 19 11 0 0
4 2008 4 0 0 12
Usage
get_matrix(df, type="id")
Output
year `2005` `2006` `2007` `2008`
<chr> <dbl> <dbl> <dbl> <dbl>
1 2005 3 0 0 0
2 2006 2 1 0 0
3 2007 3 1 0 0
4 2008 1 0 0 1