I want to create a variable (in this case "level") in a data frame that categorizes the numeric values of another variable ("classwork"). The table below (with the variables "classwork" and "level") is just an illustration; it does not represent the real data. The real datasets are provided at the end of this question.
classwork | level |
---|---|
1 | Low |
1.5 | Low |
4 | High |
2 | Low |
5 | High |
I used agglomerative hierarchical clustering to divide the "classwork" variable into two categories, "High" and "Low". The values in the "High" category are stored in the data frame "group1" and those in the "Low" category in the data frame "group2". Therefore, I would like R:
to insert the string "High" for any value in classwork found in group1, and
to insert the string "Low" for any value in classwork found in group2.
This is what I already have, but it did not work as I expected. I would prefer both conditions to be handled in one function, but I don't mind if two functions are necessary.
replace(DATA.RH.TAM$C.level, DATA.RH.TAM$classwork == group1.C, "High")
replace(DATA.RH.TAM$C.level, DATA.RH.TAM$classwork == group2.C, "Low")
The following are my datasets:
DATA.RH.TAM <- structure(list(classwork = c(4.714, 3.714, 4.143, 2.857, 4.143,
3.714, 3, 4, 3.429, 4.286, 3, 4.286, 3.571, 3.714, 3.714, 3.857,
3.429, 3.714, 3.571, 3.571, 4.143, 4.429, 3.857, 3.714, 2.429,
4.143, 4.286, 2.714, 3.143, 3.714, 3.857, 4.429, 4.571, 3.571,
3.286, 4, 4, 4, 4.143, 3.286, 3, 3.286, 3.571, 3.857, 4.143,
3.714, 4.286, 2.143, 4, 4, 1.571, 3.143, 3.571, 3.571, 4, 3.857,
3.286, 3, 3.143, 3.286, 3.857, 4.143, 4, 3.143, 3.857, 3.857,
4, 3.571, 4.571, 2.429, 3.429, 3.429, 3.429, 1.857, 3.571, 3,
2.143, 3.714, 4.286, 3.286, 4.857, 4.286, 3.429, 3.143, 3.857,
4.143, 4.286, 3.571, 3.429, 3.857, 3.571, 2.714, 3.714, 3, 3.857,
5, 3.571, 4.714, 2.286, 2.429, 3.286, 4.429, 3, 2.429, 2.857,
3.857, 3, 3.714, 2.286, 1.857, 3.714, 4.286, 4.143, 3.857, 4.143,
3.857, 3.714, 2.143, 3.714, 1.429, 2.429, 3.857, 3, 2, 3, 3,
3.714, 3, 4.143, 4, 4.429, 4.429, 4, 4, 4.429, 3.857, 3.571,
3.571, 4.143, 3.429, 3.143, 5, 3.286, 3.571, 3.286, 3.857, 4.571,
3, 2.714, 4, 2.429, 2.429, 4.429, 4.143), C.level = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), class = "data.frame", row.names = c(NA,
-154L))
group1 <- structure(list(classwork = c(4.714, 4.714, 4.857, 5, 5, 4.429,
4.429, 4.429, 4.429, 4.429, 4.429, 4.429, 4.571, 4.571, 4.571,
4.143, 4.143, 4.143, 4.143, 4.143, 4.143, 4.143, 4.143, 4.143,
4.143, 4.143, 4.143, 4.143, 4.286, 4.286, 4.286, 4.286, 4.286,
4.286, 4.286, 4.286, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3.857,
3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857,
3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857)), row.names = c(NA,
-66L), class = c("tbl_df", "tbl", "data.frame"))
group2 <- structure(list(classwork = c(3.714, 3.714, 3.714, 3.714, 3.714,
3.714, 3.714, 3.714, 3.714, 3.714, 3.714, 3.714, 3.714, 3.714,
3.714, 3.429, 3.429, 3.429, 3.429, 3.429, 3.429, 3.429, 3.429,
3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 3.571,
3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 2.857, 2.857, 2.714,
2.714, 2.714, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3.143, 3.143,
3.143, 3.143, 3.143, 3.143, 3.286, 3.286, 3.286, 3.286, 3.286,
3.286, 3.286, 3.286, 3.286, 2.429, 2.429, 2.429, 2.429, 2.429,
2.429, 2.429, 2.143, 2.143, 2.143, 2.286, 2.286, 1.571, 1.429,
1.857, 1.857, 2)), row.names = c(NA, -88L), class = c("tbl_df",
"tbl", "data.frame"))
I hope that makes sense. Thank you very much for reading my question, everyone.
Best regards,
CodePudding user response:
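This may help. Use %in% to test whether each classwork value appears in a group; == compares element by element instead of testing membership: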
DATA.RH.TAM$C.level[DATA.RH.TAM$classwork %in% group1$classwork] <- "High"
DATA.RH.TAM$C.level[DATA.RH.TAM$classwork %in% group2$classwork] <- "Low"
head(DATA.RH.TAM)
  classwork C.level
1     4.714    High
2     3.714     Low
3     4.143    High
4     2.857     Low
5     4.143    High
6     3.714     Low
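Since the question asks for both conditions in a single call, one possible alternative is a nested ifelse(). The sketch below is only an illustration: the names dat, high_vals and low_vals and their values are made up for the example and stand in for DATA.RH.TAM, group1$classwork and group2$classwork.

```r
# Toy data (assumed values) standing in for the real DATA.RH.TAM
dat <- data.frame(classwork = c(4.714, 3.714, 4.143, 2.857, 9.999))

# Hypothetical stand-ins for group1$classwork and group2$classwork
high_vals <- c(4.714, 4.143)
low_vals  <- c(3.714, 2.857)

# Nested ifelse(): "High" if the value is in the first set,
# "Low" if it is in the second set, NA if it is in neither
dat$level <- ifelse(dat$classwork %in% high_vals, "High",
             ifelse(dat$classwork %in% low_vals, "Low", NA_character_))

print(dat)
```

On the real data, the same pattern would use DATA.RH.TAM$classwork, group1$classwork and group2$classwork. Note that %in% matches exact numeric values; that works here because the groups hold the same rounded numbers as the main data frame, but values produced by further computation may need round() before matching.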