I have a dataset of friendships and characteristics of each individual, I'm trying to create variables that are if they match on the binary measures and what their absolute difference is for the continuous measures.
I can do this easily, but I was wondering if there is a different way to do it that is more streamlined than my method given that I have ~60 variables to do this with.
Sample Data:
dat <- read.table(text = "id.x id.y male.x smoke.x drink.x everfight.x grades.x male.y smoke.y drink.y everfight.y grades.y
1 6 0 2 4 1 3 0 2 1 0 2
2 7 0 2 4 0 5 0 2 3 1 4
3 8 1 4 4 1 2 0 4 2 1 1
4 9 0 2 3 1 2 0 3 2 0 1
5 10 1 2 4 0 4 1 4 1 0 4", header = TRUE)
Here's what I've done:
dat <- dat %>%
mutate(sex_match = case_when(male.x == male.y ~ 1,
TRUE ~ 0),
fight_match = case_when(everfight.x == everfight.y ~ 1,
TRUE ~ 0),
smoke_diff = abs(smoke.x - smoke.y),
drink_diff = abs(drink.x - drink.y),
grades_diff = abs(grades.x - grades.y))
This gives me exactly what I want:
id.x id.y male.x smoke.x drink.x everfight.x grades.x male.y smoke.y drink.y everfight.y grades.y sex_match fight_match smoke_diff drink_diff grades_diff
1 6 0 2 4 1 3 0 2 1 0 2 1 0 0 3 1
2 7 0 2 4 0 5 0 2 3 1 4 1 0 0 1 1
3 8 1 4 4 1 2 0 4 2 1 1 0 1 0 2 1
4 9 0 2 3 1 2 0 3 2 0 1 1 0 1 1 1
5 10 1 2 4 0 4 1 4 1 0 4 1 1 2 3 0
However, I'm wondering if there's a way to do this with a loop or apply that identifies corresponding vairables and create the matching and absolute difference new variables in the sample output above.
UPDATE
Ended up using most of what Jon answered with and one part of akrun, here's what worked best for me:
non_binary <- dat %>% select(., contains(".x")) %>%
select(., -id.x) %>%
select_if(~!all(. %in% 0:1)) %>%
rename_with(~str_remove(., '.x')) %>%
names()
dat %>%
pivot_longer(-c(id.x:id.y),
names_to = c("var", ".value"),
names_pattern = "(. ).(. )") %>%
mutate(match = if_else(var %in% non_binary, abs(x - y), 1L * (x == y))) %>%
mutate(col_name = paste(var, ifelse(var %in% non_binary, "diff", "match"), sep = "_")) %>%
select(-c(var:y)) %>%
pivot_wider(names_from = col_name, values_from = match)
Thanks to both of you!
CodePudding user response:
Here's a tidyr/dplyr approach. First I reshape to a long format with a row for each id/variable combination, with columns for each version. Then I can compare those for every pair at once, and reshape wide.
library(dplyr); library(tidyr)
non_binary <- c("smoke", "drink", "grades")
dat %>%
pivot_longer(-c(id.x:id.y),
names_to = c("var", ".value"),
names_pattern = "(. ).(. )") %>%
mutate(match = if_else(var %in% non_binary, abs(x - y), 1L * (x == y))) %>%
mutate(col_name = paste(var, ifelse(var %in% non_binary, "diff", "match"), sep = "_")) %>%
select(-c(var:y)) %>%
pivot_wider(names_from = col_name, values_from = match)
Result, which could be appended to original data:
# A tibble: 5 x 7
id.x id.y male_match smoke_diff drink_diff everfight_match grades_diff
<int> <int> <int> <int> <int> <int> <int>
1 1 6 1 0 3 0 1
2 2 7 1 0 1 0 1
3 3 8 0 0 2 1 1
4 4 9 1 1 1 0 1
5 5 10 1 2 3 1 0
CodePudding user response:
We can use tidyverse
with across
which can do this with dplyr/stringr
packages alone i.e. loop across
the .x
columns of 'male', 'everfight', then get
the value of the corresponding .y
columns to create the binary column, similarly do this on the other columns, and get the abs
olute difference. In the .names
, replace the column name by making use of str_replace
library(dplyr)
library(stringr)
dat %>%
mutate(across(c(male.x, everfight.x ),
~ (. == get(str_replace(cur_column(), 'x$', 'y'))),
.names = "{str_replace(.col, '.x', '_match')}"),
across(c(smoke.x, drink.x, grades.x),
~
abs(. - get(str_replace(cur_column(), 'x$', 'y'))),
.names = "{str_replace(.col, '.x', '_diff')}"))
-output
id.x id.y male.x smoke.x drink.x everfight.x grades.x male.y smoke.y drink.y everfight.y grades.y male_match everfight_match smoke_diff drink_diff grades_diff
1 1 6 0 2 4 1 3 0 2 1 0 2 1 0 0 3 1
2 2 7 0 2 4 0 5 0 2 3 1 4 1 0 0 1 1
3 3 8 1 4 4 1 2 0 4 2 1 1 0 1 0 2 1
4 4 9 0 2 3 1 2 0 3 2 0 1 1 0 1 1 1
5 5 10 1 2 4 0 4 1 4 1 0 4 1 1 2 3 0
Or may do this in a single across
as well
dat %>%
mutate(across(ends_with('.x'), ~ {
other <- get(str_replace(cur_column(), 'x$', 'y'))
if(all(. %in% c(0, 1)) ) (. == other) else abs(. - other)
}, .names = "{str_replace(.col, '.x', '_diff')}"))