I know how to do this in Excel, but I am trying to translate it into R and create a new column. In R I have a data frame called CleanData. I want to see how many times the value in each row of column A shows up anywhere in column B. In Excel it would read like this:
=COUNTIF(B:B,A2)
The second portion would be a nested IF/AND statement. It would look like this in Excel:
=IF(AND(COUNTIF(B:B,A2)>0,C2="Purple"),"Yes","No")
Anyone know where to start?
I have tried mutating and also this:
sum(CleanData$colA == CleanData$colB)
but I am getting no values.
CodePudding user response:
I think this will capture your IF/COUNTIF scenario:
library(dplyr)
CleanData %>%
  mutate(YesOrNo = case_when(
    Color != "Purple"               ~ "No",  # colour condition fails
    is.na(LABEL1) | !nzchar(LABEL1) ~ "No",  # nothing to look up
    !LABEL1 %in% LABEL2             ~ "No",  # COUNTIF(B:B, A2) would be 0
    TRUE                            ~ "Yes"
  ))
#    LABEL1   LABEL2  Color YesOrNo
# 1   HELLO     <NA> Purple     Yes
# 2    <NA> HELLO!!!   Blue      No
# 3 HELLO$$     <NA> Purple     Yes
# 4    <NA>    HELLO   Blue      No
# 5 HELLOOO     <NA> Purple     Yes
# 6    <NA>     <NA> Purple      No
# 7    <NA>  HELLOOO   Blue      No
# 8    <NA>  HELLO$$   Blue      No
# 9    <NA>    HELLO Yellow      No
Data
CleanData <- structure(list(LABEL1 = c("HELLO", NA, "HELLO$$", NA, "HELLOOO", NA, NA, NA, NA), LABEL2 = c(NA, "HELLO!!!", NA, "HELLO", NA, NA, "HELLOOO", "HELLO$$", "HELLO"), Color = c("Purple", "Blue", "Purple", "Blue", "Purple", "Purple", "Blue", "Blue", "Yellow")), class = "data.frame", row.names = c(NA, -9L))
or programmatically,
CleanData <- data.frame(LABEL1=c("HELLO",NA,"HELLO$$",NA,"HELLOOO",NA,NA,NA,NA), LABEL2=c(NA,"HELLO!!!",NA,"HELLO",NA,NA,"HELLOOO","HELLO$$","HELLO"),Color=c("Purple","Blue","Purple","Blue","Purple","Purple","Blue","Blue","Yellow"))
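If you also want the raw count from the first formula, COUNTIF(B:B,A2), here is a minimal base R sketch. It assumes LABEL1 and LABEL2 from the sample data stand in for your columns A and B, and n_in_B is just a name I made up for the new column. The attempt in the question, sum(colA == colB), compares the two columns row by row and returns a single number (NA as soon as any element is missing), so it cannot behave like a per-row COUNTIF.

```r
CleanData <- data.frame(
  LABEL1 = c("HELLO", NA, "HELLO$$", NA, "HELLOOO", NA, NA, NA, NA),
  LABEL2 = c(NA, "HELLO!!!", NA, "HELLO", NA, NA, "HELLOOO", "HELLO$$", "HELLO"),
  Color  = c("Purple", "Blue", "Purple", "Blue", "Purple", "Purple", "Blue", "Blue", "Yellow")
)

# For each LABEL1 value, count how many LABEL2 entries are equal to it.
# na.rm = TRUE drops the NA comparisons, so a missing LABEL1 counts as 0.
# n_in_B is a hypothetical column name used only for this sketch.
CleanData$n_in_B <- vapply(
  CleanData$LABEL1,
  function(x) sum(CleanData$LABEL2 == x, na.rm = TRUE),
  integer(1),
  USE.NAMES = FALSE
)
```

Keep in mind that == in R is case-sensitive, while COUNTIF is not, so the counts can differ from Excel if your labels vary in case.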
CodePudding user response:
You don't need any extra packages. Here is a solution with the base R function ifelse, which is a very useful function that you should learn. An example:
set.seed(7*11*13)
DF <- data.frame(cond=rnorm(100), X= sample(c("Yes","No"), 100, replace=TRUE))
with(DF, sum(ifelse( (cond>0)&(X=="Yes"), 1, 0)))
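Applied to the data in the question, the same idea might look like the sketch below. I am assuming the LABEL1, LABEL2 and Color columns from the sample data in the other answer stand in for columns A, B and C. The !is.na() guard is there because LABEL2 itself contains NA, so NA %in% LABEL2 would otherwise be TRUE.

```r
CleanData <- data.frame(
  LABEL1 = c("HELLO", NA, "HELLO$$", NA, "HELLOOO", NA, NA, NA, NA),
  LABEL2 = c(NA, "HELLO!!!", NA, "HELLO", NA, NA, "HELLOOO", "HELLO$$", "HELLO"),
  Color  = c("Purple", "Blue", "Purple", "Blue", "Purple", "Purple", "Blue", "Blue", "Yellow")
)

# Equivalent of IF(AND(COUNTIF(B:B, A2) > 0, C2 = "Purple"), "Yes", "No"):
# "Yes" only when LABEL1 is present, occurs somewhere in LABEL2,
# and the row's Color is "Purple".
CleanData$YesOrNo <- with(
  CleanData,
  ifelse(!is.na(LABEL1) & LABEL1 %in% LABEL2 & Color == "Purple", "Yes", "No")
)
```

If Color can contain NA in your real data, the condition becomes NA for those rows where the other two tests pass, and ifelse returns NA there, so you may want to handle that case explicitly.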