I am working with a dataframe in R. My groups are defined by the column Group1. I need to create a new column named sampled and fill it with a specific value at the rows chosen by calling sample within each group, drawing from 1 to the number of rows in that group. Here is the data I have:
library(tidyverse)
#Data
dat <- data.frame(Group1 = sample(letters[1:3], 15, replace = TRUE))
Then dat looks like this:
dat
Group1
1 b
2 a
3 a
4 c
5 c
6 c
7 a
8 b
9 c
10 b
11 a
12 b
13 c
14 c
15 c
To get N, the number of rows per group, we do this:
#Code
dat %>%
  arrange(Group1) %>%
  group_by(Group1) %>%
  mutate(N = n())
Which produces:
# A tibble: 15 x 2
# Groups: Group1 [3]
Group1 N
<chr> <int>
1 a 4
2 a 4
3 a 4
4 a 4
5 b 4
6 b 4
7 b 4
8 b 4
9 c 7
10 c 7
11 c 7
12 c 7
13 c 7
14 c 7
15 c 7
What I need to do next is this. Having N per group, I want to draw a sample of 3 numbers from 1:N. For group a, which has N=4, that would be sample(1:4,3), which produces, for example, [1] 2 4 3. Within group a, the rows at the sampled positions must then be filled with 999. So for the first group we would have:
Group1 N sampled
<chr> <int> <int>
1 a 4 NA
2 a 4 999
3 a 4 999
4 a 4 999
And then the same for the remaining groups. Because sample is used, the flagged rows will be random within each group. Is it possible to do this using dplyr or the tidyverse? Many thanks!
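To make the requested step concrete, here is a minimal base R sketch for a single group of four rows. The seed and the variable names (n_rows, picked) are assumptions chosen for illustration only; the positions drawn will differ with another seed.

```r
# Minimal base R sketch of the step for one group (e.g. group "a", N = 4).
set.seed(1)                               # arbitrary seed, for repeatability only
n_rows  <- 4L                             # N for the example group
picked  <- sample(seq_len(n_rows), 3L)    # three distinct row positions from 1:N
sampled <- rep(NA_integer_, n_rows)       # start with NA everywhere
sampled[picked] <- 999L                   # flag the sampled positions
sampled
```

Three of the four entries end up as 999 and the remaining one stays NA, which is the pattern wanted for every group.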
CodePudding user response:
You could try:
set.seed(3242)
library(dplyr)
dat %>%
  arrange(Group1) %>%
  add_count(Group1, name = 'N') %>%
  group_by(Group1) %>%
  mutate(
    sampled = case_when(
      row_number() %in% sample(1:n(), 3L) ~ 999L,
      TRUE ~ NA_integer_
    )
  )
Output:
# A tibble: 15 × 3
# Groups: Group1 [3]
Group1 N sampled
<chr> <int> <int>
1 a 4 999
2 a 4 999
3 a 4 NA
4 a 4 999
5 b 4 999
6 b 4 999
7 b 4 999
8 b 4 NA
9 c 7 NA
10 c 7 999
11 c 7 NA
12 c 7 999
13 c 7 NA
14 c 7 NA
15 c 7 999
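One caveat: sample without replacement raises an error when the requested size is larger than the population, so the code above would fail for any group with fewer than 3 rows. The sketch below is one possible way to guard against that, not part of the original answer. The data frame small, its group "z", and the variable k are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical data: group "z" has only two rows, fewer than the sample size.
small <- data.frame(Group1 = c("a", "a", "a", "a", "z", "z"))

k <- 3L  # number of rows to flag per group (taken from the question)

res <- small %>%
  group_by(Group1) %>%
  mutate(
    N = n(),
    # sample.int(n, size) draws from 1..n; min() caps size at the group size
    sampled = if_else(
      row_number() %in% sample.int(n(), min(k, n())),
      999L,
      NA_integer_
    )
  ) %>%
  ungroup()

res
```

With this guard, a group smaller than k simply has all of its rows flagged. Whether that is the desired behaviour depends on the use case; leaving such groups entirely NA would be another reasonable choice.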