RStudio: use .N in data frame

I created a small test data frame. I want to fill a new column var3 so that each row holds the number of times that row's var2 value occurs in column var2: if var2 is "A" in that row, var3 should be the number of "A" values in var2; if var2 is "B", it should be the number of "B" values.

I imagine something like this:

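library(data.table)

var1 <- c(1,2,3,4,5)
var2 <- c("A","B","B","B","A")
df <- data.frame(var1,var2)

df$var3 <- 0
# .N only works inside a data.table's [ ], so count per group on a data.table copy;
# groups appear in order of first occurrence: A (N = 2), then B (N = 3)
N_var3 <- as.data.table(df)[, .N, by = var2]

for (i in 1:nrow(df)) {
  # compare the i-th element, not the whole column
  if (df$var2[i] == "A") {
    df$var3[i] <- N_var3$N[1]
  }  else {
    df$var3[i] <- N_var3$N[2]
  }
}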

The result should be:

df$var3 <- c(2,3,3,3,2)

Is there a way to do this without a loop?

Thanks in advance!
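The .N in the question is data.table syntax: it is the number of rows in the current group and is only defined inside a data.table's [ ]. A minimal sketch of the loop-free data.table form, assuming the data.table package is installed (dt is a name chosen here for illustration), uses := to add the group size as a column by reference:

```r
# Assumes the data.table package is installed
library(data.table)

var1 <- c(1, 2, 3, 4, 5)
var2 <- c("A", "B", "B", "B", "A")
dt <- data.table(var1, var2)

# For each group of var2, .N is that group's row count;
# := writes it into a new column var3 for every row of the group
dt[, var3 := .N, by = var2]
print(dt)
```

To keep working on an existing data.frame, setDT(df) converts it to a data.table in place, after which the same df[, var3 := .N, by = var2] call applies.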

Answer:

I think this is a general approach that works regardless of how many distinct values your var2 column contains:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(var3 = sum(.$var2 == var2))

# A tibble: 5 x 3
# Rowwise: 
   var1 var2   var3
  <dbl> <chr> <int>
1     1 A         2
2     2 B         3
3     3 B         3
4     4 B         3
5     5 A         2
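For comparison, a base R sketch with no packages and no loop: ave() applies a function within each group and returns a vector aligned with the original rows, so FUN = length yields each row's group size. Passing df$var1 as the vector to split is an arbitrary choice here; any column of the same length would do.

```r
var1 <- c(1, 2, 3, 4, 5)
var2 <- c("A", "B", "B", "B", "A")
df <- data.frame(var1, var2)

# Split var1 by var2, take length() of each piece, and spread the result back to every row.
# FUN must be passed by name because ave() treats unnamed extra arguments as grouping variables.
df$var3 <- ave(df$var1, df$var2, FUN = length)
print(df)
```

One design note: the rowwise() version above compares every row against the whole var2 column, so its work grows with the square of the number of rows, whereas grouped counting like ave() only passes over the data a small, fixed number of times; this may matter for large data frames.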

Answer:

df %>% add_count(var2) (from dplyr) is the easiest way. By default the count goes into a new column named n.
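A minimal sketch of that one-liner on the question's data, assuming dplyr is installed; the name argument of add_count() is used so that the count lands in var3 rather than the default n:

```r
# Assumes the dplyr package is installed
library(dplyr)

var1 <- c(1, 2, 3, 4, 5)
var2 <- c("A", "B", "B", "B", "A")
df <- data.frame(var1, var2)

# add_count() groups by var2, counts the rows per group,
# and appends that count to every row under the column name given by name =
df <- df %>% add_count(var2, name = "var3")
print(df)
```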

Follow-up: I need to make one more modification. I want to plot df later and distinguish two categories by color, which I add to df as var4:

df$var4 <- c("blue", "red", "red", "blue","blue")

If I now create a plot, of course no difference between "blue" and "red" shows up, because var3 does not take these categories into account. Can I somehow take var4 into account as well?
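Assuming the goal is for var3 to count rows per combination of var2 and var4 (an interpretation, not something the follow-up states outright), add_count() accepts several grouping columns. A sketch, again assuming dplyr is installed:

```r
# Assumes the dplyr package is installed
library(dplyr)

var1 <- c(1, 2, 3, 4, 5)
var2 <- c("A", "B", "B", "B", "A")
df <- data.frame(var1, var2)
df$var4 <- c("blue", "red", "red", "blue", "blue")

# Each (var2, var4) pair is its own group: (A, blue) has 2 rows,
# (B, red) has 2 rows, and (B, blue) has 1 row
df <- df %>% add_count(var2, var4, name = "var3")
print(df)
```

If the aim is only to color the points, mapping colour = var4 inside ggplot2's aes() is another option that leaves the counts in var3 unchanged.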