I created a small test data frame and want to fill a new column var3 with values. For each row, var3 should hold the number of times that row's var2 value occurs in column var2: if var2 is "A", the number of A values; if var2 is "B", the number of B values.
I imagine something like this:
var1 <- c(1, 2, 3, 4, 5)
var2 <- c("A", "B", "B", "B", "A")
df <- data.frame(var1, var2)
df$var3 <- 0
# Count how often each value occurs in var2 (the result is named by value: A, B)
N_var3 <- table(df$var2)
for (i in seq_len(nrow(df))) {
  if (df$var2[i] == "A") {
    df$var3[i] <- N_var3[["A"]]
  } else {
    df$var3[i] <- N_var3[["B"]]
  }
}
The result should be:
df$var3 <- c(2,3,3,3,2)
Is there another way to do this without a loop?
Thanks in advance!
CodePudding user response:
I think this could be a general approach regardless of the number of distinct values in your var2 column:
library(dplyr)
df %>%
rowwise() %>%
mutate(var3 = sum(.$var2 == var2))
# A tibble: 5 x 3
# Rowwise:
   var1 var2   var3
  <dbl> <chr> <int>
1     1 A         2
2     2 B         3
3     3 B         3
4     4 B         3
5     5 A         2
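If you would rather not load dplyr, the same per-row count can probably be done in base R. This is only a minimal sketch on the example data from the question, assuming ave() with FUN = length is acceptable for your real data; var1 is passed merely as a placeholder vector of the right length.

```r
# Rebuild the example data from the question
df <- data.frame(var1 = c(1, 2, 3, 4, 5),
                 var2 = c("A", "B", "B", "B", "A"))

# ave() applies length() within each group of var2 and returns
# one value per row, in the original row order
df$var3 <- ave(df$var1, df$var2, FUN = length)

df$var3
# [1] 2 3 3 3 2
```

Like the rowwise version, this does not depend on how many distinct values var2 has.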
CodePudding user response:
df %>% add_count(var2)
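is the easiest way. By default the count column is named n; pass name = "var3" to get the column name from the question.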
I have to make one more modification. I want to plot df later and distinguish by color between two categories, which I add to df as var4:
var4 <- c("blue", "red", "red", "blue","blue")
If I create a plot now, no differences between "blue" and "red" show up, because the counts do not take these categories into account. Can I somehow include var4 in the count?
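One possible way, assuming the goal is a count per combination of var2 and var4 (that is my reading of the follow-up, not something stated outright), is to pass both columns to add_count(). A minimal sketch on the example data:

```r
library(dplyr)

# Example data from the question, with the color category added as var4
df <- data.frame(var1 = c(1, 2, 3, 4, 5),
                 var2 = c("A", "B", "B", "B", "A"),
                 var4 = c("blue", "red", "red", "blue", "blue"))

# Count rows per combination of var2 and var4; name = "var3" stores
# the count in var3 instead of the default column n
df <- df %>% add_count(var2, var4, name = "var3")

df$var3
# [1] 2 2 2 1 2
```

Here the two "B"/"red" rows get 2 and the single "B"/"blue" row gets 1, so the colors are separated in the counts. If you instead want the var2 counts to stay as they are and only color the plot by var4, no change to var3 should be needed.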