I have a list of named vectors. I am trying to sum their values. But some of the names within a vector have reversed equivalents. For example, if I have some data that looks like this:
myList <- list(`1` = c('x1:x2' = 2, 'x2:x1' = 1, 'x3:x4' = 1),
`2` = c('x1:x2' = 3, 'x6:x1' = 2, 'x1:x1' = 1, 'x4:x3' = 1),
`3` = c('x3:x4' = 2, 'x1:x2' = 1, 'x4:x3' = 4),
`4` = c('x5:x2' = 1, 'x2:x5' = 1)
)
> myList
$`1`
x1:x2 x2:x1 x3:x4
2 1 1
$`2`
x1:x2 x6:x1 x1:x1 x4:x3
3 2 1 1
$`3`
x3:x4 x1:x2 x4:x3
2 1 4
$`4`
x5:x2 x2:x5
1 1
Here, we can see that in myList[[1]]
we have x1:x2 = 2
and x2:x1 = 1
. As these are the reverse of each other, they are equivalent, so, essentially, x1:x2 = 3
.
I am trying to sum the values for each named element (including the reverse) over each list element.
My desired output would look something like this:
var count listNo
1 x1:x2 3 1
2 x3:x4 1 1
3 x1:x2 3 2
4 x6:x1 2 2
5 x1:x1 1 2
6 x4:x3 1 2
7 x3:x4 6 3
8 x1:x2 1 3
9 x5:x2 2 4
CodePudding user response:
This is tricky. I'd be interested to see a more elegant solution
`row.names<-`(do.call(rbind, Map(function(vec, name) {
x <- names(vec)
l <- sapply(strsplit(x, ":"), function(y) {
paste0("x", sort(as.numeric(sub("\\D", "", y))), collapse = ":")
})
df <- setNames(as.data.frame(table(rep(l, vec))), c("var", "count"))
df$listNo <- name
df
}, vec = myList, name = names(myList))), NULL)
#> var count listNo
#> 1 x1:x2 3 1
#> 2 x3:x4 1 1
#> 3 x1:x1 1 2
#> 4 x1:x2 3 2
#> 5 x1:x6 2 2
#> 6 x3:x4 1 2
#> 7 x1:x2 1 3
#> 8 x3:x4 6 3
#> 9 x2:x5 2 4
Created on 2022-03-06 by the reprex package (v2.0.1)
CodePudding user response:
Another tidyverse
option could be:
map_dfr(myList, enframe, .id = "listNo") %>%
mutate(var = map_chr(str_split(name, ":"), ~ str_c(sort(.), collapse = ":"))) %>%
group_by(listNo, var) %>%
summarise(count = sum(value))
listNo var count
<chr> <chr> <dbl>
1 1 x1:x2 3
2 1 x3:x4 1
3 2 x1:x1 1
4 2 x1:x2 3
5 2 x1:x6 2
6 2 x3:x4 1
7 3 x1:x2 1
8 3 x3:x4 6
9 4 x2:x5 2
CodePudding user response:
Here is a base R approach:
UPDATE:
My original answer uses a for loop to wrap the whole thing, now I've replaced it with lapply
, it's much faster.
tmp <- lapply(1:length(myList), function(i) {
tapply(setNames(myList[[i]],
sapply(strsplit(names(myList[[i]]), ":"),
function(x) paste0(sort(x), collapse = ":"))),
sapply(strsplit(names(myList[[i]]), ":"),
function(x) paste0(sort(x), collapse = ":")), sum)
})
bind_rows(tmp, .id = "listNo") |>
pivot_longer(!listNo, names_to = "var", values_to = "count", values_drop_na = T)
# A tibble: 9 x 3
listNo var count
<chr> <chr> <dbl>
1 1 x1:x2 3
2 1 x3:x4 1
3 2 x1:x2 3
4 2 x3:x4 1
5 2 x1:x1 1
6 2 x1:x6 2
7 3 x1:x2 1
8 3 x3:x4 6
9 4 x2:x5 2
Out of curiosity, I've run microbenchmark
on the five existing answers (mine, @AllanCameron, @PaulS, @zephryl and @tmfmnk), seems like @AllanCameron's solution is the best so far:
Unit: milliseconds
expr min lq mean median uq max neval cld
benson23 3.385100 3.598351 3.844276 3.723601 3.908551 12.3250 100 a
Allan 2.040701 2.199551 2.347864 2.258351 2.329651 9.6858 100 a
PaulS 10.719501 11.388101 12.108290 11.910501 12.417801 19.7472 100 c
zephryl 10.080301 10.713800 12.694816 11.056551 11.489301 130.4519 100 c
tmfmnk 5.331602 5.633501 6.112947 5.965001 6.234252 13.5028 100 b
CodePudding user response:
A {tidyverse} solution:
library(tidyverse)
tibble(count = myList, listNo = names(myList)) %>%
unnest_longer(count, indices_to = "var") %>%
mutate(
var = str_extract_all(var, "\\d "),
var = map_chr(var, ~ str_glue("x{sort(.x)[[1]]}:x{sort(.x)[[2]]}"))
) %>%
group_by(listNo, var) %>%
summarize(count = sum(count), .groups = "drop")
# # A tibble: 9 x 3
# listNo var count
# <chr> <chr> <dbl>
# 1 1 x1:x2 3
# 2 1 x3:x4 1
# 3 2 x1:x1 1
# 4 2 x1:x2 3
# 5 2 x1:x6 2
# 6 2 x3:x4 1
# 7 3 x1:x2 1
# 8 3 x3:x4 6
# 9 4 x2:x5 2
CodePudding user response:
Another possible solution, tidyverse
-based:
library(tidyverse)
map_dfr(myList, identity, .id = "listNo") %>%
pivot_longer(cols = -listNo, values_drop_na = T) %>%
rowwise %>%
mutate(name = str_split(name, ":", simplify = T) %>% sort %>%
str_c(collapse = ":")) %>%
group_by(name, listNo) %>%
summarise(count = sum(value), .groups = "drop")
#> # A tibble: 9 × 3
#> name listNo count
#> <chr> <chr> <dbl>
#> 1 x1:x1 2 1
#> 2 x1:x2 1 3
#> 3 x1:x2 2 3
#> 4 x1:x2 3 1
#> 5 x1:x6 2 2
#> 6 x2:x5 4 2
#> 7 x3:x4 1 1
#> 8 x3:x4 2 1
#> 9 x3:x4 3 6