I have an R data frame (df) where one column, assignment, looks like this:
course | instance | assignment |
---|---|---|
1 | 1 | A |
1 | 1 | B |
1 | 2 | B |
1 | 2 | C |
2 | 1 | A |
2 | 1 | C |
2 | 2 | B |
2 | 2 | A |
I need to create a superset (for lack of a better term) of all of the assignments in a course across instances.
For example: Course 1 was offered twice. In instance 1 it included assignments A and B, and in instance 2 it included assignments B and C. The superset of assignments for this course should include assignments A, B, and C, each one time. In other words, every assignment that appears at least once across instances of a course should appear exactly one time in the superset.
UPDATE: I've tried the suggestion below.
library(tidyverse)
df %>%
  group_by(course) %>%
  summarise(all_assignments = toString(sort(unique(assignment))),
            .groups = "drop")
This returns the following:
all_assignments | .groups |
---|---|
A | drop |
I've now tested this on the following sample data set:
df <- read.table(text = "course instance assignment
1 1 A
1 1 B
1 2 B
1 2 C
2 1 A
2 1 C
2 2 B
2 2 A", header = TRUE)
Which returns a similar structure:
all_assignments | .groups |
---|---|
A, B, C | drop |
Apparently this exact code has worked for others, so I'm wondering what I'm doing incorrectly?
CodePudding user response:
I'm not entirely clear on your expected output (see my comment above); please have a look at the following:
library(dplyr)
df %>%
  group_by(course) %>%
  summarise(
    all_assignments = toString(sort(unique(assignment))),
    .groups = "drop")
# A tibble: 2 × 2
#   course all_assignments
#    <int> <chr>
# 1      1 A, B, C
# 2      2 A, B, C
This is tested & verified on R_4.2.0 with dplyr_1.0.9.
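The output shown in the question, a single row with a literal .groups column and no course column, is consistent with a summarise() that ignores the grouping and treats .groups as just another column to create. One possible cause, and this is an assumption because the question does not list the attached packages, is that another package loaded after the tidyverse (plyr is a common example) masks dplyr's summarise(). A minimal sketch to check for that and to sidestep it by calling dplyr's functions through their namespace, assuming dplyr 1.0.0 or later is installed:

```r
# Sample data from the question
df <- read.table(text = "course instance assignment
1 1 A
1 1 B
1 2 B
1 2 C
2 1 A
2 1 C
2 2 B
2 2 A", header = TRUE)

# Lists every attached package that defines summarise(); the first entry
# is the one an unqualified summarise() call would use.
print(find("summarise"))

# Calling the verbs as dplyr::... guarantees dplyr's versions are used,
# regardless of what else is attached (assumes dplyr >= 1.0.0).
grouped <- dplyr::group_by(df, course)
result <- dplyr::summarise(
  grouped,
  all_assignments = toString(sort(unique(assignment))),
  .groups = "drop"
)
print(result)
```

If find("summarise") lists a package other than dplyr first, restarting R and attaching only dplyr (or always writing dplyr::summarise) should give the two-row result shown above.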
Sample data
df <- read.table(text = "course instance assignment
1 1 A
1 1 B
1 2 B
1 2 C
2 1 A
2 1 C
2 2 B
2 2 A", header = TRUE)
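For comparison, here is a sketch of the same idea in base R only, which avoids any question of which package's summarise() is in use. It is not part of the tested answer above; the names wide and long are just illustrative. It shows two possible readings of the expected output: one row per course with the assignments collapsed into a string, and one row per distinct course/assignment pair.

```r
# Sample data from the question
df <- read.table(text = "course instance assignment
1 1 A
1 1 B
1 2 B
1 2 C
2 1 A
2 1 C
2 2 B
2 2 A", header = TRUE)

# Reading 1: one row per course, unique assignments collapsed into a string
wide <- aggregate(
  assignment ~ course,
  data = df,
  FUN = function(x) toString(sort(unique(x)))
)
names(wide)[names(wide) == "assignment"] <- "all_assignments"
print(wide)

# Reading 2: one row per distinct (course, assignment) pair
long <- unique(df[, c("course", "assignment")])
long <- long[order(long$course, long$assignment), ]
print(long)
```

Which of the two shapes is more useful depends on what happens next: the collapsed string is convenient for display, while the long form is easier to join or count.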