I have some pre-post data (in R) for which some individuals only have a value for time 1, like episode "2" here:
episode <- c('1','1','2','3','3')
score <- c('44','12','37','40','9')
df <- data.frame(episode,score)
For all the records where there is only "pre" data (1 score per episode), I would like to use R (dplyr preferred) to copy that record and then indicate for all records which is the pre and which is the post. So the final should look something like:
episode | score | time |
---|---|---|
1 | 44 | 1 |
1 | 12 | 2 |
2 | 37 | 1 |
2 | 37 | 2 |
3 | 40 | 1 |
3 | 9 | 2 |
Thanks!
CodePudding user response:
Here is one option - create a column of frequency count by 'episode' and if the value is 1, then add 1 on the logical (n == 1
) and replicate the rows with uncount
library(dplyr)
library(tidyr)
df %>%
add_count(episode) %>%
mutate(n = (n == 1) 1) %>%
uncount(n) %>%
group_by(episode) %>%
mutate(time = row_number()) %>%
ungroup
-output
# A tibble: 6 × 3
episode score time
<chr> <chr> <int>
1 1 44 1
2 1 12 2
3 2 37 1
4 2 37 2
5 3 40 1
6 3 9 2
Or create the 'time' column first and then use complete
with fill
df %>%
group_by(episode) %>%
mutate(time = row_number()) %>%
ungroup %>%
complete(episode, time) %>%
fill(score)