Replace some values in a dataframe if they are included in another dataframe with strings

Time:11-22

I want to create a variable (in this case "level") in a data frame that categorizes the numeric value of another variable ("classwork"). The table below (the one with the variables "classwork" and "level") is just an illustration; it does not represent the real data. The real datasets are provided at the end of this question.

classwork level
1 Low
1.5 Low
4 High
2 Low
5 High

I used agglomerative hierarchical clustering to divide the "classwork" variable into two categories, "High" and "Low". The values in the "High" category are stored in the data frame "group1", and the values in the "Low" category are stored in the data frame "group2". I would therefore like R:

  1. to insert the string "High" for any value in classwork found in group1, and

  2. to insert the string "Low" for any value in classwork found in group2.

This is what I have so far, but it does not work as I expected. I would prefer both conditions to be handled in one function call, but I don't mind if two are necessary.

replace(DATA.RH.TAM$C.level, DATA.RH.TAM$classwork == group1.C, "High")
replace(DATA.RH.TAM$C.level, DATA.RH.TAM$classwork == group2.C, "Low")

The following are my datasets:

DATA.RH.TAM <- structure(list(classwork = c(4.714, 3.714, 4.143, 2.857, 4.143, 
                             3.714, 3, 4, 3.429, 4.286, 3, 4.286, 3.571, 3.714, 3.714, 3.857, 
                             3.429, 3.714, 3.571, 3.571, 4.143, 4.429, 3.857, 3.714, 2.429, 
                             4.143, 4.286, 2.714, 3.143, 3.714, 3.857, 4.429, 4.571, 3.571, 
                             3.286, 4, 4, 4, 4.143, 3.286, 3, 3.286, 3.571, 3.857, 4.143, 
                             3.714, 4.286, 2.143, 4, 4, 1.571, 3.143, 3.571, 3.571, 4, 3.857, 
                             3.286, 3, 3.143, 3.286, 3.857, 4.143, 4, 3.143, 3.857, 3.857, 
                             4, 3.571, 4.571, 2.429, 3.429, 3.429, 3.429, 1.857, 3.571, 3, 
                             2.143, 3.714, 4.286, 3.286, 4.857, 4.286, 3.429, 3.143, 3.857, 
                             4.143, 4.286, 3.571, 3.429, 3.857, 3.571, 2.714, 3.714, 3, 3.857, 
                             5, 3.571, 4.714, 2.286, 2.429, 3.286, 4.429, 3, 2.429, 2.857, 
                             3.857, 3, 3.714, 2.286, 1.857, 3.714, 4.286, 4.143, 3.857, 4.143, 
                             3.857, 3.714, 2.143, 3.714, 1.429, 2.429, 3.857, 3, 2, 3, 3, 
                             3.714, 3, 4.143, 4, 4.429, 4.429, 4, 4, 4.429, 3.857, 3.571, 
                             3.571, 4.143, 3.429, 3.143, 5, 3.286, 3.571, 3.286, 3.857, 4.571, 
                             3, 2.714, 4, 2.429, 2.429, 4.429, 4.143), C.level = c(NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                                                                                   NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                     -154L))
group1 <- structure(list(classwork = c(4.714, 4.714, 4.857, 5, 5, 4.429, 
                             4.429, 4.429, 4.429, 4.429, 4.429, 4.429, 4.571, 4.571, 4.571, 
                             4.143, 4.143, 4.143, 4.143, 4.143, 4.143, 4.143, 4.143, 4.143, 
                             4.143, 4.143, 4.143, 4.143, 4.286, 4.286, 4.286, 4.286, 4.286, 
                             4.286, 4.286, 4.286, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3.857, 
                             3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 
                             3.857, 3.857, 3.857, 3.857, 3.857, 3.857, 3.857)), row.names = c(NA, 
                                                                                              -66L), class = c("tbl_df", "tbl", "data.frame"))
group2 <- structure(list(classwork = c(3.714, 3.714, 3.714, 3.714, 3.714, 
                             3.714, 3.714, 3.714, 3.714, 3.714, 3.714, 3.714, 3.714, 3.714, 
                             3.714, 3.429, 3.429, 3.429, 3.429, 3.429, 3.429, 3.429, 3.429, 
                             3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 
                             3.571, 3.571, 3.571, 3.571, 3.571, 3.571, 2.857, 2.857, 2.714, 
                             2.714, 2.714, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3.143, 3.143, 
                             3.143, 3.143, 3.143, 3.143, 3.286, 3.286, 3.286, 3.286, 3.286, 
                             3.286, 3.286, 3.286, 3.286, 2.429, 2.429, 2.429, 2.429, 2.429, 
                             2.429, 2.429, 2.143, 2.143, 2.143, 2.286, 2.286, 1.571, 1.429, 
                             1.857, 1.857, 2)), row.names = c(NA, -88L), class = c("tbl_df", 
                                                                                   "tbl", "data.frame"))

I hope that makes sense. Thank you very much for reading my question, everyone.

Best regards,

CodePudding user response:

This may help:

DATA.RH.TAM$C.level[DATA.RH.TAM$classwork %in% group1$classwork] <- "High"
DATA.RH.TAM$C.level[DATA.RH.TAM$classwork %in% group2$classwork] <- "Low"
head(DATA.RH.TAM)

  classwork C.level
1     4.714    High
2     3.714     Low
3     4.143    High
4     2.857     Low
5     4.143    High
6     3.714     Low
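
Since the question asks for both conditions in a single call, here is a minimal sketch of one way to do that with nested ifelse(). The data frames scores, high and low are small hypothetical stand-ins for DATA.RH.TAM, group1 and group2, not the real data. It relies on %in%, which tests every value for membership in the other vector, whereas == compares the two vectors position by position (recycling the shorter one). Note also that replace() returns a modified copy, so its result has to be assigned back to the column to have any effect.

```r
# Hypothetical toy data standing in for DATA.RH.TAM, group1 and group2.
scores <- data.frame(classwork = c(4.714, 3.714, 4.143, 2.857, 9.999))
high   <- data.frame(classwork = c(4.714, 4.143))
low    <- data.frame(classwork = c(3.714, 2.857))

# One expression: "High" if the value occurs in high, "Low" if it occurs
# in low, and NA for values found in neither group.
scores$level <- ifelse(scores$classwork %in% high$classwork, "High",
                ifelse(scores$classwork %in% low$classwork, "Low",
                       NA_character_))

print(scores)
```

Values that appear in neither group (9.999 in this toy data) stay NA, which makes unmatched rows easy to spot. Exact matching with %in% is reliable here because the values are the same typed literals in each data frame; if the values came from calculations, rounding both sides first, for example with round(x, 3), would be a safer assumption.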