Task1: I want to add a suffix to some columns in the 1st dataframe based on the values in 2 columns in a second dataframe. My psuedocode would be:
if (dict.df$source == 'cre' or dict.df$source == 'cre1') then append dict$timing to select column names in case.df dataframe.
Task2: Next, I want to add the same suffix to the cells in the 2nd dataframe column named columnNames, again based on the values in the 2 columns in dict.df. My psuedocode would be:
if (dict.df$source == 'cre' or dict.df$source == 'cre1') then append dict$timing to select cell contents in dict.df$columnName.
Currently have these 2 dataframes with thousands of column names to be renamed (so hardcoding column names in code is impractical):
agegen <- c('15m','34f','56m','49f','28m','37f')
race <- c('w','h','a','w','a','o')
eth <- c('-','h','-','-','h','-')
disp1 <- c('witness was violent', 'officer arrested suspect', 'never responded address', 'officer arrested suspect', 'suspect ran away', 'suspect ran away')
disp2<- c('new client', 'revisit to address', 'parent custody', 'county jail', 'drumset missing', 'new lockup')
disp3<- c('violent witness', 'bonded out', 'future court date', 'new client', 'weapon charge', 'girlfriend suspect')
disp4<- c('violent witness', 'knife in kitchen', 'suspect at precinct', 'new client', '3 people involved', 'girlfriend suspect')
case.df <- data.frame(agegen,race,eth,disp1,disp2,disp3,disp4)
case.df
# agegen race eth disp1 disp2 disp3 disp4
# 1 15m w - witness was violent new client violent witness violent witness
# 2 34f h h officer arrested suspect revisit to address bonded out knife in kitchen
# 3 56m a - never responded address parent custody future court date suspect at precinct
# 4 49f w - officer arrested suspect county jail new client new client
# 5 28m a h suspect ran away drumset missing weapon charge 3 people involved
# 6 37f o - suspect ran away new lockup girlfriend suspect girlfriend suspect
columnNames <- c('agegen','race','eth','disp1','disp2','disp3','disp4')
timing <- c('0t','6m','0t','3t','3t','0t','0t')
source <- c('cre','cre','aft','cre1','aft','cre','aft')
dict.df <- data.frame(columnNames,timing,source)
dict.df
# columnNames timing source
# 1 agegen 0t cre
# 2 race 6m cre
# 3 eth 0t aft
# 4 disp1 3t cre1
# 5 disp2 3t aft
# 6 disp3 0t cre
# 7 disp4 0t aft
Want these 2 dataframes as a result (Do not want to rename column names unless either "cre" or "cre1" is present in the corresponding row of dict.df$source column):
# agegen_0t race_6m eth disp1_3t disp2 disp3_0t disp4
# 1 15m w - witness was violent new client violent witness violent witness
# 2 34f h h officer arrested suspect revisit to address bonded out knife in kitchen
# 3 56m a - never responded address parent custody future court date suspect at precinct
# 4 49f w - officer arrested suspect county jail new client new client
# 5 28m a h suspect ran away drumset missing weapon charge 3 people involved
# 6 37f o - suspect ran away new lockup girlfriend suspect girlfriend suspect
# columnNames timing source
# 1 agegen_0t 0t cre
# 2 race_6m 6m cre
# 3 eth 0t aft
# 4 disp1_3t 3t cre1
# 5 disp2 3t aft
# 6 disp3_0t 0t cre
# 7 disp4 0t aft
UPDATE
I found some "0" values (not "0t")in the timing column. For those I want to assign the suffix "_start" to columnNames in dict.df, and add "_start" as suffix fo rthe relevant column names in case.df. Basically, something like if dict.df$timing == "0" then dict.df$columnNames <- "_start"
and if dict.df$timing == "0" then all relevant columns in case.df get suffix <-"_start"
Any ideas to tweak langtang
's code to make this happen in both dataframes?
CodePudding user response:
You can do this:
- create
nn
to hold the new names
nn = dict.df %>%
filter(source %in% c("cre", "cre1")) %>%
mutate(newn = paste0(columnNames,"_",timing)) %>%
select(columnNames, newn)
- Update dict.df
dict.df <-
dict.df %>%
mutate(columnNames = if_else(source %in% c("cre", "cre1"),paste0(columnNames,"_",timing), columnNames))
- Update case.df
case.df <- case.df %>% rename_with(~nn$newn, .cols=nn$columnNames)