I have to make a series of operations over a subset of columns. I have a set of columns which measures the same thing to different parties A, B, and C:
id var1_A var1_B var1_C var2_A var2_B var2_C var3_A var3_B var3_C
So, in the example, var1_A var1_B var1_C refer to the same measurement for different parties. And var1_A, var2_A, var3_A refer to different variables for the same party A.
I would like to accomplish 2 things:
I need to create multiple data frames and merge the id with another dataframe, each one refers to one specific party. I wrote the code for each data frame individually, as the example below. The issue is that in the example it is simple. What complicates my life is that I have multiple datasets like df, and each of them contain information for multiple parties, and I end up with 50 lines of repetitive code. Is that a way to simplify?
df_A <- df %>% select(id var1_A var2_A var3_A)
df_A <- merge(df_A, df_merge, by="id")
df_B <- df %>% select(id var1_B var2_B var3_B)
df_B <- merge(df_B, df_merge, by="id")
df_C <- df %>% select(id var1_C var2_C var3_C)
df_C <- merge(df_C, df_merge, by="id")
The second thing I would like to accomplish is to change the variable name for df. I would like to change the variable name for all the columns that measure the same thing, but maintaining the party which it refers to. For example, say var1 refers to height, var2 refers to weight, and var3 refers to gender:
id var1_A var1_B var1_C var2_A var2_B var2_C var3_A var3_B var3_C
I would like to get something like:
id height_A height_B height_C weight_A weight_B weight_C gender_A gender_B gender_C
Is there a way to accomplish this with few lines of code? Or do I have to rename each of them individually (using rename command, for example)?
CodePudding user response:
This is similar to @thelatemail's comment (answer) above, but with a couple of extra subsequent steps, i.e. rename the columns, pivot the data to 'long' format, split the df into groups ("df_A", "df_B", "df_C"), pivot the data back to wide, and save the dfs to your global environment:
library(tidyverse)
library(purrr)
df <- data.frame(id = 1:10,
var1_A = runif(10),
var1_B = runif(10),
var1_C = runif(10),
var2_A = runif(10),
var2_B = runif(10),
var2_C = runif(10),
var3_A = runif(10),
var3_B = runif(10),
var3_C = runif(10))
list_of_dfs <- df %>%
rename_with(.cols = starts_with("var1"), ~gsub("var1", "height", .x)) %>%
rename_with(.cols = starts_with("var2"), ~gsub("var2", "weight", .x)) %>%
rename_with(.cols = starts_with("var3"), ~gsub("var3", "gender", .x)) %>%
pivot_longer(-id) %>%
mutate(group = case_when(
str_detect(name, "_A") ~ "df_A",
str_detect(name, "_B") ~ "df_B",
str_detect(name, "_C") ~ "df_C"
)) %>%
split(., .$group)
df_list <- map(list_of_dfs,
\(x) pivot_wider(x, names_from = name,
values_from = value) %>%
select(-group))
list2env(df_list, envir = .GlobalEnv)
#> <environment: R_GlobalEnv>
ls()
#> [1] "df" "df_A" "df_B" "df_C" "df_list"
#> [6] "list_of_dfs"
df_A
#> # A tibble: 10 × 4
#> id height_A weight_A gender_A
#> <int> <dbl> <dbl> <dbl>
#> 1 1 0.417 0.693 0.320
#> 2 2 0.387 0.879 0.00590
#> 3 3 0.882 0.805 0.861
#> 4 4 0.611 0.246 0.336
#> 5 5 0.795 0.185 0.680
#> 6 6 0.274 0.00675 0.568
#> 7 7 0.722 0.950 0.757
#> 8 8 0.776 0.757 0.0457
#> 9 9 0.613 0.352 0.853
#> 10 10 0.0603 0.438 0.421
Created on 2022-10-05 by the reprex package (v2.0.1)
You can then merge/join the dfs as required. Hope this helps.
CodePudding user response:
A tidy way:
require(tidyverse)
#CREATE DATA
df <- data.frame(id = 1:10,
var1_A = runif(10),
var1_B = runif(10),
var1_C = runif(10),
var2_A = runif(10),
var2_B = runif(10),
var2_C = runif(10),
var3_A = runif(10),
var3_B = runif(10),
var3_C = runif(10))
df_merge<-data.frame(id = 1:10,
value=11:20)
#grabs current names
nam<-colnames(df)
#Create map of new names
new_names = c('var1'='height','var2'='weight','var3'='gender')
#replace the strings with new strings in map
nam <- str_replace_all(nam, new_names)
#reassign column names to dataframe
colnames(df)<-nam
# loop through all letters in list assign to variable
#pasted with "df" and the letter, selects columns ending with
# letter, merges with df_ids and returns the new subset of data
#to the assigned variable name
for (letter in c('A', "B", "C")){
assign(paste("df", letter, sep = '_'),
df%>%select(id, ends_with(letter))%>%
merge(df_merge, by='id'))
}