I have categorized and subcategorized labels in Excel, but I want to make this reproducible, so I want to convert it into R code.
I have a df containing 631 rows, of which the first 15 look like this:
   IV_label               Subcategory            Category
   <chr>                  <chr>                  <chr>
 1 light conditions       time of day            exogenous
 2 vital status           victim characteristics human involvement
 3 road type              road type              exogenous
 4 reserve density        workload               police discretion
 5 road type              road type              exogenous
 6 surface type           road type              exogenous
 7 surface characteristic road type              exogenous
 8 light conditions       time of day            exogenous
 9 light conditions       time of day            exogenous
10 weather                weather type           exogenous
11 weather                weather type           exogenous
12 weather                weather type           exogenous
13 day of the week        day of the week        exogenous
14 amount of lanes        road type              exogenous
15 amount of lanes        road type              exogenous
I want to be able to add the following to my R code without having to construct the lists myself:
time of day <- list(light conditions, ...)
victim characteristics <- list(vital status, ...)
road type <- list(road type, surface type, surface characteristic, amount of lanes, ...) (# notice road type is included only once!)
workload <- list(reserve density, ...)
weather type <- list(weather, ...)
day of the week <- list(day of the week, ...)
exogenous <- list(time of day, road type, weather type, day of the week)
human involvement <- list(victim characteristics)
police discretion <- list(workload)
I understand that I will need to write this boilerplate myself:
time of day <- list(
victim characteristics <- list(
road type <- list(
workload <- list(
weather type <- list(
day of the week <- list(
exogenous <- list(
human involvement <- list(
police discretion <- list(
But I hope to be able to copy the unique values from the console and just paste them into the boilerplate above.
CodePudding user response:
Here I treat any pair of terms that appear in the same row, in two consecutive columns, as an edge. The adjacency matrix adj keeps track of the edges, and the graph is then reconstructed from it as a named list:
library(purrr)
df <- data.frame(
  IV_label = c(
    "light conditions", "vital status", "road type",
    "reserve density", "road type", "surface type",
    "surface characteristic", "light conditions", "light conditions",
    "weather", "weather", "weather",
    "day of the week", "amount of lanes", "amount of lanes"),
  Subcategory = c(
    "time of day", "victim characteristics", "road type",
    "workload", "road type", "road type",
    "road type", "time of day", "time of day",
    "weather type", "weather type", "weather type",
    "day of the week", "road type", "road type"),
  Category = c(
    "exogenous", "human involvement", "exogenous",
    "police discretion", "exogenous", "exogenous",
    "exogenous", "exogenous", "exogenous",
    "exogenous", "exogenous", "exogenous",
    "exogenous", "exogenous", "exogenous"))
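## all distinct terms across the three columns
## (df[[.x]] is used instead of pull(), which would need dplyr loaded)
names <- c("IV_label", "Subcategory", "Category") |>
  purrr::map(~df[[.x]]) |>
  purrr::reduce(union)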
## adjacency matrix
adj <- matrix(0,
              nrow = length(names),
              ncol = length(names),
              dimnames = list(names, names))
adj[cbind(df[,2], df[,1])] <- 1
adj[cbind(df[,3], df[,2])] <- 1
setNames(asplit(adj, 1), names) |>
  purrr::map(~names[which(.x == 1)]) |>
  purrr::keep(~length(.x) > 0)
Output:
$`road type`
[1] "road type" "surface type" "surface characteristic"
[4] "amount of lanes"
$`day of the week`
[1] "day of the week"
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type" "day of the week" "time of day" "weather type"
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
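The two adj[cbind(...)] <- 1 lines rely on R's matrix indexing by a two-column character matrix, where each row of the index is a (row name, column name) pair matched against the dimnames. A minimal sketch of that mechanism on a toy matrix; the labels and the names labs, m, parent and child are made up for illustration and are not part of the code above:

```r
# Toy 3x3 matrix with dimnames; the labels are hypothetical.
labs <- c("a", "b", "c")
m <- matrix(0, nrow = 3, ncol = 3, dimnames = list(labs, labs))

parent <- c("a", "a", "c")
child  <- c("b", "c", "c")

# Each row of the two-column character matrix addresses one cell by name,
# so this sets m["a", "b"], m["a", "c"] and m["c", "c"] in one step.
m[cbind(parent, child)] <- 1

m["a", "b"]  # 1
m["b", "a"]  # 0: the edge is directed, parent -> child
sum(m)       # 3
```

In the answer's code the parent is the broader term (Subcategory or Category) and the child is the narrower one, which is why each row of adj lists the members of that term.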
You may want to zero the diagonal of adj to avoid self-referencing edges. Note that this drops the day of the week entry from the result, since its only member was itself:
adj[row(adj) == col(adj)] <- 0
setNames(asplit(adj, 1), names) |>
  purrr::map(~names[which(.x == 1)]) |>
  purrr::keep(~length(.x) > 0)
Output:
$`road type`
[1] "surface type" "surface characteristic" "amount of lanes"
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type" "day of the week" "time of day" "weather type"
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
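Since the goal in the question was to paste the result back into a script, one option is base R's dput(), which prints an R expression that recreates an object. The following is only a sketch of that idea, not the approach used above: it groups with split() and unique() instead of an adjacency matrix, uses a shortened copy of the example data so that it runs on its own, and the names sub_members and cat_members are introduced here for illustration.

```r
# A few rows from the example data, so the block is self-contained.
df <- data.frame(
  IV_label    = c("light conditions", "road type", "surface type",
                  "road type", "weather"),
  Subcategory = c("time of day", "road type", "road type",
                  "road type", "weather type"),
  Category    = rep("exogenous", 5),
  stringsAsFactors = FALSE
)

# Unique labels per subcategory, and unique subcategories per category.
# Self-references (road type under road type) are kept in this version.
sub_members <- lapply(split(df$IV_label, df$Subcategory), unique)
cat_members <- lapply(split(df$Subcategory, df$Category), unique)

# dput() writes R code for the object to the console, ready to copy.
dput(sub_members)
dput(cat_members)
```

The same dput() call should also work on the named list produced by the adjacency-matrix code, if that result is first assigned to a variable.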