I have a dataframe that looks like this. I am showing the first 6 rows out of simplicity but the total number of rows is 8236. The grades range from 0-2. I have just shown grade 0 and 1 in the example below:
Telangiectasia_time grade
<chr> <int>
1 telangiectasia_tumour_0 0
2 telangiectasia_tumour_1 0
3 telangiectasia_tumour_12 0
4 telangiectasia_tumour_24 0
5 telangiectasia_tumour_0 1
6 telangiectasia_tumour_1 1
I want to group by Telangiectasia_Time(first column) and then count the number of grades for each group. Therefore, for the first 6 rows as an example, it should look like this:
Telangiectasia_time grade0 grade1 grade2
1 telangiectasia_tumour_0 1 1 0
2 telangiectasia_tumour_1 1 1 0
3 telangiectasia_tumour_12 1 0 0
4 telangiectasia_tumour_24 1 0 0
There are three columns at the end for the respective grades and a count of each grade for each variable. I tried to use the aggregate function:
**aggregate(grade ~ Telangiectasia_time, telangiectasia_tumour_data, *sum*)**
but I am not sure what to put in the last bit in the brackets so that the sum of each grade is returned. When I put sum, it simply adds the numbers and does not treat the variables as seperate (0,1 and 2). Using my full dataset I get the output which is wrong:
Telangiectasia_time grade
1 telangiectasia_tumour_0 18
2 telangiectasia_tumour_1 11
3 telangiectasia_tumour_12 38
4 telangiectasia_tumour_24 87
I have also tried group_by() but this just gives me a total count
telangiectasia_tumour_data %>% group_by(Telangiectasia_time) %>% summarize(count =n())
Telangiectasia_time count
* <chr> <int>
1 telangiectasia_tumour_0 2059
2 telangiectasia_tumour_1 2059
3 telangiectasia_tumour_12 2059
4 telangiectasia_tumour_24 2059
CodePudding user response:
With dpylr::count
and tidyr::pivot_wider
you could do:
library(dplyr)
library(tidyr)
telangiectasia_tumour_data %>%
count(Telangiectasia_time, grade) %>%
pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
#> # A tibble: 4 × 3
#> Telangiectasia_time grade0 grade1
#> <chr> <int> <int>
#> 1 telangiectasia_tumour_0 1 1
#> 2 telangiectasia_tumour_1 1 1
#> 3 telangiectasia_tumour_12 1 0
#> 4 telangiectasia_tumour_24 1 0
DATA
telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
"telangiectasia_tumour_0",
"telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
"telangiectasia_tumour_0", "telangiectasia_tumour_1"
), grade = c(
0L,
0L, 0L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6"
))