How to fill a column by group with sampled row numbers according to n per group

Time:02-06

I am working with a dataframe in R. The groups are defined by the column Group1. I need to create a new column named sampled: within each group, I draw a sample from 1 to the number of rows in that group and fill the sampled rows with a specific value. Here is the data I have:

library(tidyverse)
#Data
dat <- data.frame(Group1 = sample(letters[1:3], 15, replace = TRUE))

Then dat looks like this:

dat
   Group1
1       b
2       a
3       a
4       c
5       c
6       c
7       a
8       b
9       c
10      b
11      a
12      b
13      c
14      c
15      c

To get N, the number of rows per group, I do this:

#Code
dat %>% 
  arrange(Group1) %>%
  group_by(Group1) %>%
  mutate(N=n())

Which produces:

# A tibble: 15 x 2
# Groups:   Group1 [3]
   Group1     N
   <chr>  <int>
 1 a          4
 2 a          4
 3 a          4
 4 a          4
 5 b          4
 6 b          4
 7 b          4
 8 b          4
 9 c          7
10 c          7
11 c          7
12 c          7
13 c          7
14 c          7
15 c          7

What I need to do next is this. Now that I have N per group, I want to draw a sample of 3 numbers from 1:N. For group a, with N=4, that would be sample(1:4,3), which could produce, for example, [1] 2 4 3. The rows of group a whose positions match the sampled values must then be filled with 999. So for the first group we would have:

   Group1     N sampled
   <chr>  <int>   <int>
 1 a          4      NA
 2 a          4     999
 3 a          4     999
 4 a          4     999

The same would then be done for the remaining groups, so that sample gives a random selection of rows within each group. Is it possible to do this using dplyr or the tidyverse? Many thanks!
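For reference, the single-group step described above can be sketched in base R. This is only an illustration for one group of N = 4 rows; the seed value and the variable names picked and N are assumptions chosen for the example, not part of the original data.

```r
# Sketch for one group with N = 4 rows (like group "a" in the example).
set.seed(1)                       # arbitrary seed, only to make the draw repeatable
N <- 4L
picked <- sample(seq_len(N), 3)   # three distinct row positions out of 1..N
sampled <- rep(NA_integer_, N)    # start with NA everywhere
sampled[picked] <- 999L           # flag the sampled positions with 999
sampled
```

Exactly three of the four entries end up as 999 and the remaining one stays NA; which one stays NA depends on the random draw.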

CodePudding user response:

You could try:

set.seed(3242)

library(dplyr)

dat %>%
  arrange(Group1) %>%
  add_count(Group1, name = 'N') %>%
  group_by(Group1) %>%
  mutate(
    sampled = case_when(
      row_number() %in% sample(1:n(), 3L) ~ 999L,
      TRUE ~ NA_integer_
    )
  )

Output:

# A tibble: 15 × 3
# Groups:   Group1 [3]
   Group1     N sampled
   <chr>  <int>   <int>
 1 a          4     999
 2 a          4     999
 3 a          4      NA
 4 a          4     999
 5 b          4     999
 6 b          4     999
 7 b          4     999
 8 b          4      NA
 9 c          7      NA
10 c          7     999
11 c          7      NA
12 c          7     999
13 c          7      NA
14 c          7      NA
15 c          7     999
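One thing to watch for: sample(1:n(), 3L) raises an error when a group has fewer than 3 rows, because sampling without replacement cannot draw more values than exist. A possible variant, sketched below under the assumption that small groups should simply have all of their rows flagged, caps the sample size at the group size. The data frame dat2 and its group "d" are hypothetical and only serve to show the small-group case.

```r
library(dplyr)

set.seed(3242)  # any seed works; it only makes the draw repeatable

# Hypothetical data: group "b" has 2 rows and group "d" has 1 row
dat2 <- data.frame(Group1 = c("a", "a", "a", "a", "b", "b", "d"))

res <- dat2 %>%
  arrange(Group1) %>%
  add_count(Group1, name = "N") %>%
  group_by(Group1) %>%
  mutate(
    # sample.int(n, size) draws positions from 1..n without replacement;
    # min() caps the size so groups with fewer than 3 rows do not fail
    sampled = if_else(
      row_number() %in% sample.int(n(), min(3L, n())),
      999L,
      NA_integer_
    )
  ) %>%
  ungroup()

res
```

With this data, group a gets three rows flagged, while groups b and d get all of their rows flagged. If small groups should be left untouched instead, the condition would need an extra check on n().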