How can I rename factors based on the column names of another data frame?

Time:12-01

I have a column in a dataframe holding subjects:

sub <- c("A", "A", "B", "C", "C", "C", "D", "E", "F", "F")
subjects <- data.frame(sub)

I have another data frame whose columns contain subjects (each subject appears in only one column):

one <- c("A", "C", "F")
two <- c("B", "D", NA)
three <- c("E", NA, NA)
newsubjects <- data.frame(one, two, three)

I want to rename each subject in the first data frame to the name of the column in the second data frame that contains that subject.

So for example, I want the A, C, and F subjects in the first data frame to be renamed 'one'. Doing this manually would take a long time, so I'm hoping there's a way to use the columns in the second data frame to do this.

I've tried a bunch of stuff with forcats::fct_recode and levels, but nothing works because I'm not using these functions correctly. E.g., IIRC, one of my attempts looked something like this:

subjects %>%
      mutate(new_var = forcats::fct_recode(sub,
            !!! setNames(as.character(subjects$sub), newsubjects$one)))

Which I know is completely wrong. Part of the problem is that it's difficult for me to articulate my problem in a way that returns relevant search results. Thank you for any help you can provide, I appreciate it.

CodePudding user response:

Using purrr::map(), derive a list pairing column names with values from newsubjects. Then unpack this inside forcats::fct_collapse() to recode values in subjects.

library(purrr)
library(forcats)

new_ids <- map(newsubjects, ~ .x[!is.na(.x)])

subjects$sub <- fct_collapse(subjects$sub, !!!new_ids)

subjects
     sub
1    one
2    one
3    two
4    one
5    one
6    one
7    two
8  three
9    one
10   one
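For comparison, here is a minimal base R sketch of the same idea, assuming the data from the question and R's documented behaviour that assigning a named list to levels() merges the listed old levels under each name. The name recoded is only for illustration.

```r
sub <- c("A", "A", "B", "C", "C", "C", "D", "E", "F", "F")
subjects <- data.frame(sub, stringsAsFactors = FALSE)
newsubjects <- data.frame(
  one   = c("A", "C", "F"),
  two   = c("B", "D", NA),
  three = c("E", NA, NA),
  stringsAsFactors = FALSE
)

# Named list: each column name paired with its non-missing subjects
new_ids <- lapply(newsubjects, function(x) x[!is.na(x)])

# Assigning a named list to levels() collapses the listed old levels
# into the level given by each list name
recoded <- factor(subjects$sub)
levels(recoded) <- new_ids

recoded
```

This avoids the purrr and forcats dependencies, at the cost of being a little less explicit about what is being collapsed.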

CodePudding user response:

If you reshape newsubjects longer, you could join the two tables:

library(tidyverse)
subjects %>%
  left_join(newsubjects %>% 
            pivot_longer(everything(), names_to = "new_sub", values_to = "sub")) 

Joining, by = "sub"
   sub new_sub
1    A     one
2    A     one
3    B     two
4    C     one
5    C     one
6    C     one
7    D     two
8    E   three
9    F     one
10   F     one
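The join above is effectively a key-value lookup. As a rough sketch of the same reshape-then-look-up idea without the tidyverse, assuming the question's data, stack() can produce the long form and a named vector can stand in for the join. The names long and lookup are illustrative only.

```r
sub <- c("A", "A", "B", "C", "C", "C", "D", "E", "F", "F")
subjects <- data.frame(sub, stringsAsFactors = FALSE)
newsubjects <- data.frame(
  one   = c("A", "C", "F"),
  two   = c("B", "D", NA),
  three = c("E", NA, NA),
  stringsAsFactors = FALSE
)

# stack() reshapes to long form: `values` holds the subjects,
# `ind` holds the column each subject came from
long <- stack(newsubjects)
long <- long[!is.na(long$values), ]

# Named vector: names are subjects, values are column names
lookup <- setNames(as.character(long$ind), long$values)

# Index the lookup by subject; row order of `subjects` is preserved
subjects$new_sub <- unname(lookup[subjects$sub])
subjects
```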

CodePudding user response:

Since one, two, and three all have the same length, you could also create a lookup table:

library(dplyr)

sub <- c("A", "A", "B", "C", "C", "C", "D", "E", "F", "F")
subjects <- data.frame(sub)

one <- c("A", "C", "F")
two <- c("B", "D", NA)
three <- c("E", NA, NA)

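# Use list() rather than c() so the three vectors stay separate;
# with c() they are flattened and every element gets its own index
additions <- list(one, two, three)

# Pair each subject with the index of the vector it came from
lookup <- data.frame(
  sub = unlist(additions),
  value = rep(seq_along(additions), each = length(additions[[1]])))

subjects %>% inner_join(lookup, by = "sub") %>% select(value)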

CodePudding user response:

In base R:

gsub("\\d", "", names(unlist(newsubjects))[match(subjects$sub, unlist(newsubjects))])
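Note that the gsub() call strips every digit from the flattened names, so it would also mangle column names that themselves contain digits. A hedged variant that builds the column labels directly, shown here with hypothetical column names group1, group2, group3 (not from the question) to illustrate the difference:

```r
sub <- c("A", "A", "B", "C", "C", "C", "D", "E", "F", "F")
subjects <- data.frame(sub, stringsAsFactors = FALSE)

# Hypothetical column names containing digits, for illustration only
newsubjects <- data.frame(
  group1 = c("A", "C", "F"),
  group2 = c("B", "D", NA),
  group3 = c("E", NA, NA),
  stringsAsFactors = FALSE
)

# unlist() flattens column by column, so repeating each column name
# once per row lines the labels up with the flattened values
col_of <- rep(names(newsubjects), each = nrow(newsubjects))

# Position of each subject in the flattened values, then its column label
new_sub <- col_of[match(subjects$sub, unlist(newsubjects))]
new_sub
```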