I am trying to use coalesce() to produce one row per participant that has their name and their score. Participants had 3 opportunities to fill in their data, and most only came in once (and those that came in multiple times always put in the same data). So my data looks like:
library(dplyr)
test_dataset <- tibble(name = c("justin", "justin", "justin", "corey", "corey", "corey", "sib", "sib", "sib", "kate", "kate", "kate"),
score1 = c(NA_real_, NA_real_, 1, 2, NA_real_, NA_real_, 2, NA_real_, 2, NA_real_, NA_real_ , NA_real_),
score2 = c(NA_real_, 7, NA_real_, 5, NA_real_, NA_real_, 9, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_))
And I want it to look like:
library(dplyr)
answer <- tibble(name = c("justin", "corey", "sib", "kate"),
score1_true = c(1, 2, 2, NA),
score2_true = c(7, 5, 9, NA))
I've tried the below solution, which does give me the 'true' score, but it's spread out over 12 rows (3 rows per person) instead of 4 (one per person):
library(dplyr)
test_dataset %>%
dplyr::group_by(name) %>%
mutate(across(c(starts_with("score")), .fns = list(true = ~coalesce(.))))
CodePudding user response:
You can use fill()
, and then arrange()
the scores and use slice_head()
:
test_dataset %>%
group_by(name) %>%
fill(score1, score2) %>%
arrange(score1, score2) %>%
slice_head(n=1)
Output:
name score1_true score2_true
<chr> <dbl> <dbl>
1 justin 1 7
2 corey 2 5
3 sib 2 9
4 kate NA NA
more concise/improved version thanks to @M.Viking:
- use the
.direction="up"
option withinfill()
test_dataset %>%
group_by(name) %>%
fill(score1, score2, .direction="up") %>%
slice_head(n=1)
CodePudding user response:
We could reorder the values based on the NA
elements and then slice the first row
library(dplyr)
test_dataset %>%
group_by(name) %>%
dplyr::mutate(across(starts_with('score'),
~ .x[order(is.na(.x))])) %>%
slice_head(n = 1) %>%
ungroup
-output
# A tibble: 4 × 3
name score1 score2
<chr> <dbl> <dbl>
1 corey 2 5
2 justin 1 7
3 kate NA NA
4 sib 2 9
Or another option is to use complete.cases
after rearranging
test_dataset %>%
group_by(name) %>%
dplyr::mutate(across(starts_with('score'),
~ .x[order(is.na(.x))])) %>%
filter(complete.cases(across(starts_with('score')))|row_number() == 1) %>%
ungroup
-output
# A tibble: 4 × 3
name score1 score2
<chr> <dbl> <dbl>
1 justin 1 7
2 corey 2 5
3 sib 2 9
4 kate NA NA