I have the following dataset:
hairdf=data.frame(
id=c(1:2),
typedad=c("straight*","curly"),
colourdad=c("brown","black"),
typemom=c("curly","wavy*"),
colourmom=c("blonde","red"),
typekid1=c("wavy","mixed*"),
colourkid1=c("black","blonde"))
I want to create new columns that look at the hair types and give the value 1 if a hair type appears in the "type" columns without an asterisk and the value 2 if it appears with an asterisk (blank if it doesn't appear in that row). It should look like so:
id | typedad | colourdad | typemom | colourmom | typekid1 | colourkid1 | straight | curly | wavy | mixed |
---|---|---|---|---|---|---|---|---|---|---|
1 | straight* | brown | curly | blonde | wavy | black | 2 | 1 | 1 | |
2 | curly | black | wavy* | red | mixed* | blonde | | 1 | 2 | 2 |
My two issues are that all other examples use numeric values and have the columns of interest located next to each other. I need code that matches strings in columns that can be located anywhere in the data frame. I have tried the following:
library(dplyr)

straight <- hairdf %>% mutate(across(c("typedad", "typemom", "typekid1"),
  ~ ifelse(. == "straight", 1,
    ifelse(. == "straight*", 2, "")
)))
curly <- hairdf %>% mutate(across(c("typedad", "typemom", "typekid1"),
  ~ ifelse(. == "curly", 1,
    ifelse(. == "curly*", 2, "")
)))
wavy <- hairdf %>% mutate(across(c("typedad", "typemom", "typekid1"),
  ~ ifelse(. == "wavy", 1,
    ifelse(. == "wavy*", 2, "")
)))
mixed <- hairdf %>% mutate(across(c("typedad", "typemom", "typekid1"),
  ~ ifelse(. == "mixed", 1,
    ifelse(. == "mixed*", 2, "")
)))
But I'm not sure if this code even makes sense. Also, this will be tedious as I have many more hair types, so any suggestions to make it easier would be appreciated as well. Thank you!
Answer:
This is neither the most efficient nor the most general solution, but it should do the job:
#create columns
st <- rep(NA, nrow(hairdf))
cur <- rep(NA, nrow(hairdf))
wav <- rep(NA, nrow(hairdf))
mix <- rep(NA, nrow(hairdf))
#join and define words
hairdf <- cbind(hairdf, st, cur, wav, mix)
words <- c("straight", "curly", "wavy", "mixed")
words_ast <- paste(words, "*", sep = "") #the same words with a trailing "*"
#loop over the words; the new columns st, cur, wav, mix sit at positions 8 to 11, i.e. 7 + j
for (j in 1:length(words_ast)) { #first assign 2 wherever a starred word matches
  for (i in c(2, 4, 6)) { #only the type columns: typedad, typemom, typekid1
    a <- subset(hairdf, hairdf[, i] == words_ast[j]) #rows that satisfy the condition [this can also be written as hairdf %>% subset(.[,i]==words_ast[j])]
    hairdf[row.names(a), 7 + j] <- 2 #write 2 into column 7 + j (column 8 when j is 1)
  }
}
#repeat the process for "words", assigning 1
for (j in 1:length(words)) {
  for (i in c(2, 4, 6)) {
    a <- subset(hairdf, hairdf[, i] == words[j])
    hairdf[row.names(a), 7 + j] <- 1
  }
}
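Since the question mentions that the type columns can sit anywhere and that there are many more hair types, here is a minimal sketch of the same idea without hard-coded positions. It is not part of the approach above: it assumes the relevant columns can be recognised by a name starting with "type", derives the list of hair types from the data itself, and names the new columns after the hair types. As in the loops above, a plain match (1) takes precedence over a starred match (2) if both occur in one row.

```r
# Sample data from the question (two rows, matching the expected output)
hairdf <- data.frame(
  id = 1:2,
  typedad = c("straight*", "curly"),
  colourdad = c("brown", "black"),
  typemom = c("curly", "wavy*"),
  colourmom = c("blonde", "red"),
  typekid1 = c("wavy", "mixed*"),
  colourkid1 = c("black", "blonde")
)

# Assumption: hair-type columns are identified by the "type" name prefix,
# so their position in the data frame does not matter
type_cols <- grep("^type", names(hairdf), value = TRUE)
vals <- as.matrix(hairdf[type_cols])

# Derive the hair types from the data by stripping a trailing asterisk
words <- unique(sub("\\*$", "", as.vector(vals)))
words <- words[!is.na(words)]

for (w in words) {
  plain   <- rowSums(vals == w, na.rm = TRUE) > 0
  starred <- rowSums(vals == paste0(w, "*"), na.rm = TRUE) > 0
  # 1 for a plain match, 2 for a starred match, NA if the type is absent
  hairdf[[w]] <- ifelse(plain, 1, ifelse(starred, 2, NA))
}

print(hairdf)
```

If blanks are preferred over NA in the output, the new columns can be converted to character and the NA values replaced with "" as a last step.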
This should give you the expected result. Alternatively, you can use the assign() function to create one variable per word that holds its code, i.e.
assign(x, value = 1)
where x is each element of words.
So, in a loop over n:
assign(words[n], value = 1); assign(words_ast[n], value = 2)
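To make the assign() idea concrete, here is a small sketch of that loop together with get(), which reads the values back by name. The named vector at the end (called codes here) is an assumed alternative, not something from the answer above: it keeps the same word-to-code mapping in one object instead of one variable per word.

```r
words <- c("straight", "curly", "wavy", "mixed")
words_ast <- paste0(words, "*")

# assign() creates one variable per name in the current environment
for (n in seq_along(words)) {
  assign(words[n], value = 1)
  assign(words_ast[n], value = 2)
}

# get() looks a variable up by its name; starred names are non-syntactic,
# so they have to be accessed through get() or backticks
plain_code   <- get("curly")
starred_code <- get("wavy*")

# Assumed alternative: one named vector used as a lookup table
codes <- c(setNames(rep(1, length(words)), words),
           setNames(rep(2, length(words_ast)), words_ast))
looked_up <- unname(codes[c("straight*", "curly", "wavy")])
print(looked_up)
```

A lookup vector like this can be indexed directly with a whole column, for example codes[hairdf$typedad], which avoids scattering many single-value variables across the workspace.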