Home > Mobile >  Add suffix to some columns in 1st dataframe based on values in 2 columns in 2nd dataframe
Add suffix to some columns in 1st dataframe based on values in 2 columns in 2nd dataframe

Time:04-29

Task1: I want to add a suffix to some columns in the 1st dataframe based on the values in 2 columns in a second dataframe. My psuedocode would be:

if (dict.df$source == 'cre' or dict.df$source == 'cre1') then append dict$timing to select column names in case.df dataframe.

Task2: Next, I want to add the same suffix to the cells in the 2nd dataframe column named columnNames, again based on the values in the 2 columns in dict.df. My psuedocode would be:

if (dict.df$source == 'cre' or dict.df$source == 'cre1') then append dict$timing to select cell contents in dict.df$columnName.

Currently have these 2 dataframes with thousands of column names to be renamed (so hardcoding column names in code is impractical):

agegen <- c('15m','34f','56m','49f','28m','37f')
race <- c('w','h','a','w','a','o')
eth <- c('-','h','-','-','h','-')
disp1 <- c('witness was violent', 'officer arrested suspect', 'never responded address', 'officer arrested suspect', 'suspect ran away', 'suspect ran away')
disp2<- c('new client', 'revisit to address', 'parent custody', 'county jail', 'drumset missing', 'new lockup')
disp3<- c('violent witness', 'bonded out', 'future court date', 'new client', 'weapon charge', 'girlfriend suspect')
disp4<- c('violent witness', 'knife in kitchen', 'suspect at precinct', 'new client', '3 people involved', 'girlfriend suspect')
case.df <- data.frame(agegen,race,eth,disp1,disp2,disp3,disp4)
case.df

#   agegen race eth                    disp1              disp2              disp3               disp4
# 1    15m    w   -      witness was violent         new client    violent witness     violent witness
# 2    34f    h   h officer arrested suspect revisit to address         bonded out    knife in kitchen
# 3    56m    a   -  never responded address     parent custody  future court date suspect at precinct
# 4    49f    w   - officer arrested suspect        county jail         new client          new client
# 5    28m    a   h         suspect ran away    drumset missing      weapon charge   3 people involved
# 6    37f    o   -         suspect ran away         new lockup girlfriend suspect  girlfriend suspect

columnNames <- c('agegen','race','eth','disp1','disp2','disp3','disp4')
timing <- c('0t','6m','0t','3t','3t','0t','0t')
source <- c('cre','cre','aft','cre1','aft','cre','aft')
dict.df <- data.frame(columnNames,timing,source)
dict.df

#   columnNames timing source
# 1      agegen     0t    cre
# 2        race     6m    cre
# 3         eth     0t    aft
# 4       disp1     3t   cre1
# 5       disp2     3t    aft
# 6       disp3     0t    cre
# 7       disp4     0t    aft

Want these 2 dataframes as a result (Do not want to rename column names unless either "cre" or "cre1" is present in the corresponding row of dict.df$source column):

#   agegen_0t race_6m eth                 disp1_3t              disp2           disp3_0t               disp4
# 1       15m       w   -      witness was violent         new client    violent witness     violent witness
# 2       34f       h   h officer arrested suspect revisit to address         bonded out    knife in kitchen
# 3       56m       a   -  never responded address     parent custody  future court date suspect at precinct
# 4       49f       w   - officer arrested suspect        county jail         new client          new client
# 5       28m       a   h         suspect ran away    drumset missing      weapon charge   3 people involved
# 6       37f       o   -         suspect ran away         new lockup girlfriend suspect  girlfriend suspect

#   columnNames timing source
# 1   agegen_0t     0t    cre
# 2     race_6m     6m    cre
# 3         eth     0t    aft
# 4    disp1_3t     3t   cre1
# 5       disp2     3t    aft
# 6    disp3_0t     0t    cre
# 7       disp4     0t    aft

UPDATE I found some "0" values (not "0t")in the timing column. For those I want to assign the suffix "_start" to columnNames in dict.df, and add "_start" as suffix fo rthe relevant column names in case.df. Basically, something like if dict.df$timing == "0" then dict.df$columnNames <- "_start" and if dict.df$timing == "0" then all relevant columns in case.df get suffix <-"_start" Any ideas to tweak langtang's code to make this happen in both dataframes?

CodePudding user response:

You can do this:

  1. create nn to hold the new names
nn = dict.df %>%
  filter(source %in% c("cre", "cre1")) %>%
  mutate(newn = paste0(columnNames,"_",timing)) %>%
  select(columnNames, newn)
  1. Update dict.df
dict.df <- 
  dict.df %>%
  mutate(columnNames = if_else(source %in% c("cre", "cre1"),paste0(columnNames,"_",timing), columnNames))
  1. Update case.df
case.df <- case.df %>% rename_with(~nn$newn, .cols=nn$columnNames)
  • Related