In my data I have a vector of characters, where one of the characters (in this case, the letter P) is repeated a certain number of times. For example, suppose I have the following information:
number <- 2
iterations <- 2
and a vector of characters (stored as the column var of a data frame):
df <- data.frame(var = c("P", "a", "b", "P", "d", "a", "k",
"P", "e", "q", "s", "P", "d", "v", "i", "j"))
We can see that P is repeated 4 times. Or, to put it another way, P is repeated number * iterations times.
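As a quick sanity check, the count of P's can be compared with number * iterations in R. This is a minimal sketch that reuses only the objects defined above; n_p is an illustrative name.

```r
number <- 2
iterations <- 2
df <- data.frame(var = c("P", "a", "b", "P", "d", "a", "k",
                         "P", "e", "q", "s", "P", "d", "v", "i", "j"))

# Count how many rows hold the marker character "P"
n_p <- sum(df$var == "P")
n_p                          # 4

# The count should equal number * iterations; stop early if it does not
stopifnot(n_p == number * iterations)
```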
In my code, a function creates the data frame df by generating sequences of strings that always start with P. It generates number such sequences (2 in this example), and this whole process is repeated iterations times (also 2 in this example). So P appears number * iterations times in my data frame.
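The labelling rule this implies can be sketched in R, assuming the P's appear in order and each iteration contributes exactly number of them: the k-th P belongs to iteration (k - 1) %/% number + 1, which is the same as repeating each iteration number number times. The names p_iter, k and p_iter2 are illustrative only.

```r
number <- 2
iterations <- 2

# Iteration label for each occurrence of "P", in order of appearance:
# the first `number` P's are iteration 1, the next `number` are iteration 2, ...
p_iter <- rep(seq_len(iterations), each = number)
p_iter                       # 1 1 2 2

# The same labels computed from the position k of each P (k = 1, 2, ...)
k <- seq_len(number * iterations)
p_iter2 <- as.integer((k - 1) %/% number + 1)
stopifnot(identical(p_iter, p_iter2))
```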
I'm trying to create a new column that contains the iteration number. In my example, the first two P's (and the rows that follow each of them) belong to iteration 1, and the second two belong to iteration 2. My desired output would look something like:
> df
var iter
1 P 1
2 a 1
3 b 1
4 P 1
5 d 1
6 a 1
7 k 1
8 P 2
9 e 2
10 q 2
11 s 2
12 P 2
13 d 2
14 v 2
15 i 2
16 j 2
I hope this makes sense; I found it difficult to word this problem accurately.
Answer:
We can use gl to do this: start from a vector of NAs, replace the positions where var is 'P' with the iteration numbers generated by gl, and then use fill to replace each remaining NA with the previous non-NA value.
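To make the three steps concrete, here is a base R walk-through on the example data. It is a sketch rather than part of the original answer: the intermediate names (is_p, p_labels, iter_filled) are illustrative, and the last step uses a base R last-observation-carried-forward idiom as a stand-in for tidyr's fill(), which assumes the first row is a P.

```r
number <- 2
df <- data.frame(var = c("P", "a", "b", "P", "d", "a", "k",
                         "P", "e", "q", "s", "P", "d", "v", "i", "j"))
is_p <- df$var == "P"
n_p <- sum(is_p)

# Step 1: gl(n, k, length) builds a factor whose levels are each repeated k times;
# with k = number = 2 this gives 1, 1, 2, 2 for the four P's
p_labels <- as.integer(gl(n_p, number, n_p))
p_labels

# Step 2: an all-NA integer vector with the labels written into the P rows
iter <- replace(rep(NA_integer_, nrow(df)), is_p, p_labels)
iter

# Step 3: carry the last non-NA value forward (what tidyr::fill() does by default).
# Base R stand-in: index the non-NA values by a running count of non-NA entries.
# Assumes the first element is non-NA, i.e. the data starts with a "P".
iter_filled <- iter[!is.na(iter)][cumsum(!is.na(iter))]
iter_filled
```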
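library(dplyr)
library(tidyr)
df %>%
  # write an iteration number into each 'P' row (NA elsewhere);
  # gl(..., number, ...) repeats each level `number` times: 1, 1, 2, 2
  mutate(iter = replace(rep(NA_integer_, n()), var == 'P',
                        as.integer(gl(sum(var == 'P'), number, sum(var == 'P'))))) %>%
  # carry the last non-NA iteration number down to the rows below it
  fill(iter)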
Output:
var iter
1 P 1
2 a 1
3 b 1
4 P 1
5 d 1
6 a 1
7 k 1
8 P 2
9 e 2
10 q 2
11 s 2
12 P 2
13 d 2
14 v 2
15 i 2
16 j 2
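If you would rather avoid the dplyr/tidyr dependency, a different technique (not the gl approach above) also works under the same assumptions: take a running count of the P's with cumsum() and integer-divide it by number. This is a sketch that assumes the first row is a P and that every iteration contains exactly number P's; rows before the first P would otherwise get iteration 0.

```r
number <- 2
df <- data.frame(var = c("P", "a", "b", "P", "d", "a", "k",
                         "P", "e", "q", "s", "P", "d", "v", "i", "j"))

# Running count of P's seen so far: 1 until the 2nd P, 2 until the 3rd, ...
p_count <- cumsum(df$var == "P")

# Every `number` consecutive P's form one iteration
df$iter <- as.integer((p_count - 1) %/% number + 1)
df
```

Because cumsum() already propagates the count to the non-P rows, no separate fill step is needed.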