Create a new column with the total count of each unique value of another column in R

I want to create a new column that contains the total count of each unique value of another column.

x <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "5", "6", "6", "6")
y <- c("Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y")
df <- data.frame(x, y)

What I want is the following:

#    x     y         z
#
#    1     Y         4
#    1     Y         4
#    1     Y         4
#    1     Y         4
#    2     N         3
#    2     N         3
#    2     Y         3
#    3     Y         3
#    3     Y         3
#    3     Y         3
#    4     Y         2
#    4     Y         2
#    5     Y         1
#    6     N         3
#    6     Y         3
#    6     Y         3

I have written the following script, but it is convoluted.

library(plyr)
library(dplyr)
library(purrr)
library(tidyverse)
library(ggtext)
library(stringr)
x <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "5", "6", "6", "6")
y <- c("Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y")
df <- data.frame(x, y) 
unique_count <- as.data.frame(table(df$x))
colnames(unique_count)[1] <- "x"
colnames(unique_count)[2] <- "z"
df <- purrr::reduce(list(df,unique_count), dplyr::left_join, by = 'x')

CodePudding user response：

A possible solution:

library(dplyr)

df %>% 
  add_count(x, name = "z")

#>    x y z
#> 1  1 Y 4
#> 2  1 Y 4
#> 3  1 Y 4
#> 4  1 Y 4
#> 5  2 N 3
#> 6  2 N 3
#> 7  2 Y 3
#> 8  3 Y 3
#> 9  3 Y 3
#> 10 3 Y 3
#> 11 4 Y 2
#> 12 4 Y 2
#> 13 5 Y 1
#> 14 6 N 3
#> 15 6 Y 3
#> 16 6 Y 3

CodePudding user response：

This gives the same counts as @Paul's answer, just spelled out with group_by() and mutate() at the cost of one extra line. Note that the result stays grouped by x.

library(dplyr)

df %>% 
  group_by(x) %>% 
  mutate(z = n())

   x     y         z
   <chr> <chr> <int>
 1 1     Y         4
 2 1     Y         4
 3 1     Y         4
 4 1     Y         4
 5 2     N         3
 6 2     N         3
 7 2     Y         3
 8 3     Y         3
 9 3     Y         3
10 3     Y         3
11 4     Y         2
12 4     Y         2
13 5     Y         1
14 6     N         3
15 6     Y         3
16 6     Y         3
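
To make the comparison between the two answers concrete, here is a minimal sketch, assuming the same df as in the question. The names grouped and counted are only illustrative. It adds ungroup() so the grouping does not carry over into later steps, and checks that both approaches produce the same z column.

```r
library(dplyr)

x <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "5", "6", "6", "6")
y <- c("Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y")
df <- data.frame(x, y)

# group_by() + mutate() leaves the result grouped by x;
# ungroup() drops that grouping again
grouped <- df %>%
  group_by(x) %>%
  mutate(z = n()) %>%
  ungroup()

# add_count() adds the per-group count without leaving the data grouped
counted <- df %>%
  add_count(x, name = "z")

# Compare the two z columns row by row
all(grouped$z == counted$z)
```

If nothing downstream relies on the grouping, add_count() is the shorter of the two.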

CodePudding user response：

This can be answered by the following post: dplyr: put count occurrences into new variable

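Grouping by both x and y counts the occurrences of each (x, y) combination. This differs from the output requested in the question for x = 2 and x = 6; to get the per-x totals, group by x only. With dplyr, the per-combination version is: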

library(dplyr)

x <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "5", "6", "6", "6")
y <- c("Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y")
df <- data.frame(x, y)

df %>% group_by(x,y) %>% mutate(z = n())

Output:

# A tibble: 16 × 3
# Groups:   x, y [8]
   x     y         z
   <chr> <chr> <int>
 1 1     Y         4
 2 1     Y         4
 3 1     Y         4
 4 1     Y         4
 5 2     N         2
 6 2     N         2
 7 2     Y         1
 8 3     Y         3
 9 3     Y         3
10 3     Y         3
11 4     Y         2
12 4     Y         2
13 5     Y         1
14 6     N         1
15 6     Y         2
16 6     Y         2
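
For the per-x totals shown in the question, a dependency-free alternative is base R's ave(). This is only a sketch, assuming the same df as above; it is not taken from any of the answers.

```r
x <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "5", "6", "6", "6")
y <- c("Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y")
df <- data.frame(x, y)

# ave() applies length() within each group of x and writes the result
# back to every row of that group, so no join is needed
df$z <- ave(seq_along(df$x), df$x, FUN = length)

df
```

Because ave() returns a vector as long as its input, the count lands on every row directly, which replaces the table() and left_join() steps from the original script.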