I have an 30 * 9 data frame filled with integers 1-9. Each integer can feature multiple times in a column, or none at all.
I basically wanted to calculate the number of times a number appears, in order to generate a column of 9 rows (of counts) for each element of the original data frame, to end up with a 9 * 9 data frame of counts. I also wanted to have a 0 placed where a number does not appear in a particular column.
So far I tried multiple approaches with for loops, tapply, functions etc. But I cannot seem to end up with a result which can be stored directly into a new data frame in a loop.
for (i in seq_along(columnHeaderQuosureList)) {
original_data_frame %>%
group_by(!! columnHeaderQuosureList[[i]]) %>% # Unquote with !!
count(!! columnHeaderQuosureList[[i]]) %>%
print()
}
This works and prints each count for each column. I tried replacing print() with return() and then trying to cbind the returned output with the result_data_frame. Unfortunately I am getting nowhere, and I do not think my approach is feasible.
Does anyone have any better ideas please?
CodePudding user response:
The function to count instances of unique values in R is table
.
# simulated data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))
# use stack to turn column name into a factor column (long format)
table(stack(df))
ind
values V1 V2 V3 V4 V5 V6 V7 V8 V9
1 4 0 1 2 6 3 6 3 2
2 2 2 5 4 4 4 3 3 2
3 4 4 3 7 2 7 1 4 2
4 1 5 1 3 3 5 6 3 5
5 3 4 4 4 4 2 2 3 6
6 7 5 3 2 3 0 1 3 3
7 4 5 3 3 2 1 3 3 1
8 3 2 3 3 5 2 6 4 3
9 2 3 7 2 1 6 2 4 6
Edit: forgot the tricky bit. The output of table
is a table object, which looks like a matrix but gets turned into a long format if you try to do as.data.frame
. To turn your result into a 9x9 df, use
as.data.frame.matrix(table(stack(df)))
Caveat: if for some reason one of the 9 digits doesn't appear anywhere in the original df, then that row will be skipped (instead of being filled with 0s).
CodePudding user response:
This is very similar to the behaviour of tabulate()
, so you can do:
#Create example data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))
#Counts of digits 1-9
as.data.frame(sapply(df, tabulate, nbins=9))
CodePudding user response:
Here's a tidyverse solution that is robust with respect to the numbert of columns and the number of distinct values they contain. I avoid recourse to loops andnon-standard evaluation by making the data tidy by pivoting.
First, create some test data
library(tidyverse)
# For reproducibility
set.seed(123)
d <- tibble(
c1=floor(runif(30, 1, 10)),
c2=floor(runif(30, 1, 10)),
c3=floor(runif(30, 1, 10)),
c4=floor(runif(30, 1, 10)),
c5=floor(runif(30, 1, 10)),
c6=floor(runif(30, 1, 10)),
c7=floor(runif(30, 1, 10)),
c8=floor(runif(30, 1, 10)),
c9=floor(runif(30, 1, 10))
)
d
# A tibble: 30 × 9
c1 c2 c3 c4 c5 c6 c7 c8 c9
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 9 6 2 6 8 8 5 5
2 8 9 1 6 3 5 3 3 6
3 4 7 4 4 3 4 7 2 2
4 8 8 3 6 2 3 3 7 6
5 9 1 8 3 4 1 6 1 3
6 1 5 5 2 9 4 5 7 7
7 5 7 8 8 2 6 3 4 4
8 9 2 8 1 1 2 6 4 9
9 5 3 8 5 2 5 9 8 9
10 5 3 4 5 7 2 9 9 7
# … with 20 more rows
Now solve the problem
d %>%
pivot_longer(
everything(),
names_to="Column",
values_to="Value"
) %>%
group_by(Column, Value) %>%
summarise(N=n(), .groups="drop") %>%
pivot_wider(
names_from=Column,
values_from=N,
id_cols=Value,
values_fill=0
)
# A tibble: 9 × 10
Value c1 c2 c3 c4 c5 c6 c7 c8 c9
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 3 2 3 2 3 1 0 3 1
2 2 1 7 3 4 4 4 3 3 3
3 3 4 4 2 3 6 2 7 3 3
4 4 1 5 6 3 3 7 3 5 4
5 5 5 2 2 5 1 5 3 3 5
6 6 4 1 3 5 3 5 6 2 4
7 7 3 3 3 1 4 3 1 5 3
8 8 2 3 6 1 3 3 2 3 3
9 9 7 3 2 6 3 0 5 3 4