Creating loop to count the number of unique values in column based on values in another column-CodePudding

So, for example, I have the following dataframe, data:

col1	col2
1	5
1	5
1	3
2	10
2	11
3	11

Now, I want to make a new column, col3, which gives me the number of unique values in col2 for every grouping in col1.

So far, I have the following code:

length(unique(data$col2[data$col1 == 1]))

Which would here return the number 2.

However, I'm having a hard time making a loop that goes through all the values in col1 to create the new column, col3.

CodePudding user response：

You want the counts for every row, so using a for loop you would do

data$col3 <- NA_real_
for (i in seq_len(nrow(data))) {
  data$col3[i] <- length(unique(data$col2[data$col1 == data$col1[i]]))
}
data
#   col1 col2 col3
# 1    1    5    2
# 2    1    5    2
# 3    1    3    2
# 4    2   10    2
# 5    2   11    2
# 6    3   11    1

However, using for loops in R is mostly inefficient, and in this case we can use the grouping function ave which comes with R.

data <- transform(data, col3=ave(col2, col1, FUN=\(x) length(unique(x))))
data
#   col1 col2 col3
# 1    1    5    2
# 2    1    5    2
# 3    1    3    2
# 4    2   10    2
# 5    2   11    2
# 6    3   11    1

Data:

data <- structure(list(col1 = c(1L, 1L, 1L, 2L, 2L, 3L), col2 = c(5L, 
5L, 3L, 10L, 11L, 11L)), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response：

We can use n_distinct after grouping

library(dplyr)
data <- data %>%
   group_by(col1) %>%
   mutate(col3 = n_distinct(col2)) %>%
   ungroup

-output

data
# A tibble: 6 × 3
   col1  col2  col3
  <int> <int> <int>
1     1     5     2
2     1     5     2
3     1     3     2
4     2    10     2
5     2    11     2
6     3    11     1

Or with data.table

library(data.table)
setDT(data)[, col3 := uniqueN(col2), col1]

data

data <- structure(list(col1 = c(1L, 1L, 1L, 2L, 2L, 3L), col2 = c(5L, 
5L, 3L, 10L, 11L, 11L)), class = "data.frame", row.names = c(NA, 
-6L))