I have 40 cognitive maps and I want to use the list of variables from each map to create an accumulation curve, where I plot the map number on the x-axis and the number of "new" variables identified on the y-axis. That is, for the first map all variables would be "new"; for the second map, only the ones not identified on map 1 would be "new"; for the third map, only those variables not identified on either of the first two maps would be "new"; and so on, cumulatively, for each of the 40 maps.

My dataframe is in wide format, with map number as rownames (1-40) and variable name as column names (F1-F144), and then a value of 1 if the variable is present in that map and a 0 if absent.

Any ideas would be helpful.

CodePudding user response：

Here is a way.
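which.max returns the index of the first maximum of a numeric vector. Since in each column all values before the first map where that variable occurs are 0, the first 1 is the first maximum. If a variable never occurs in any map, its column is all zeros and which.max returns 1, so the code checks the value at that index and returns NA instead. Then coerce the vector of first-occurrence indices to a factor with complete levels from 1 to the number of maps (rows) and table this factor. The table gives the count of new variables per map, including maps that contribute none.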

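# For each variable (column), find the first map (row) in which it occurs
new_var <- apply(df1, 2, \(x) {
  i <- which.max(x)
  # which.max() returns 1 for an all-zero column, so check for a real 1
  if(x[i] == 1) i else NA_integer_
})
# Keep every map as a level so that maps with no new variables count as 0
new_var <- factor(new_var, labels = row.names(df1), levels = seq_len(nrow(df1)))
table(new_var)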
#> new_var
#> map01 map02 map03 map04 map05 map06 map07 map08 map09 map10 
#>    10     2     4     0     1     0     1     0     0     0

Created on 2022-08-30 by the reprex package (v2.0.1)
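The per-map counts can be turned into the accumulation curve the question asks for by taking a running total. The following is a minimal sketch, not part of the original answer: the toy data frame `df_toy` and the names `first_seen`, `new_per_map` and `cum_vars` are made up for illustration, and the same steps should apply to a 40 x 144 data frame laid out as described in the question.

```r
# Hypothetical toy data: 4 maps (rows) x 5 variables (columns), 1 = present
df_toy <- data.frame(
  F1 = c(1, 0, 1, 0),
  F2 = c(0, 1, 1, 0),
  F3 = c(1, 1, 0, 0),
  F4 = c(0, 0, 0, 1),
  F5 = c(0, 0, 0, 0),   # never occurs in any map
  row.names = sprintf("map%02d", 1:4)
)

# Map index at which each variable first appears (NA if it never appears)
first_seen <- vapply(df_toy, function(x) {
  i <- which.max(x)
  if (x[i] == 1) i else NA_integer_
}, integer(1))

# New variables per map, keeping maps that contribute nothing as 0
new_per_map <- table(factor(first_seen,
                            levels = seq_len(nrow(df_toy)),
                            labels = row.names(df_toy)))

# Running total of distinct variables seen so far
cum_vars <- cumsum(as.vector(new_per_map))

# Accumulation curve: map number against cumulative number of variables
plot(seq_along(cum_vars), cum_vars, type = "b",
     xlab = "Map number", ylab = "Cumulative number of variables")
```

Plotting `as.vector(new_per_map)` instead of `cum_vars` would show the number of new variables contributed by each map rather than the running total.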

Test data

set.seed(2022)
m <- 10L
n <- 20L
probs <- seq(0.1, 0.9, length.out = n)
df1 <- matrix(nrow = m, ncol = n)
for(i in 1:n) {
  df1[, i] <- rbinom(m, 1, prob = probs[i])
}
df1 <- as.data.frame(df1)
row.names(df1) <- sprintf("map%02d", as.integer(row.names(df1)))
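As a cross-check on the first-occurrence approach, the same counts can be obtained with cummax, which is a swapped-in alternative rather than the method used in the answer. This sketch rebuilds the test data as a plain matrix; `mat`, `seen`, `cum_vars` and `new_per_map` are illustrative names, not objects from the answer.

```r
# Rebuild the test data as a matrix (10 maps x 20 variables)
set.seed(2022)
m <- 10L
n <- 20L
probs <- seq(0.1, 0.9, length.out = n)
mat <- sapply(probs, function(p) rbinom(m, 1, prob = p))

# cummax() down each column switches to 1 at the first occurrence and stays 1
seen <- apply(mat, 2, cummax)

# Distinct variables seen up to and including each map
cum_vars <- rowSums(seen)

# New variables per map are the increments of the running total
new_per_map <- diff(c(0, cum_vars))
```

Here `cum_vars` is the accumulation curve itself and `new_per_map` holds the per-map increments, so no factor or table step is needed. Variables that never occur simply contribute nothing to either vector.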
