Loop through data frame and get counts. Output to other data frame-CodePudding

I have an 30 * 9 data frame filled with integers 1-9. Each integer can feature multiple times in a column, or none at all.

I basically wanted to calculate the number of times a number appears, in order to generate a column of 9 rows (of counts) for each element of the original data frame, to end up with a 9 * 9 data frame of counts. I also wanted to have a 0 placed where a number does not appear in a particular column.

So far I tried multiple approaches with for loops, tapply, functions etc. But I cannot seem to end up with a result which can be stored directly into a new data frame in a loop.

for (i in seq_along(columnHeaderQuosureList)) {
         original_data_frame %>% 
           group_by(!! columnHeaderQuosureList[[i]]) %>% # Unquote with !!
           count(!! columnHeaderQuosureList[[i]]) %>% 
           print()
       }

This works and prints each count for each column. I tried replacing print() with return() and then trying to cbind the returned output with the result_data_frame. Unfortunately I am getting nowhere, and I do not think my approach is feasible.

Does anyone have any better ideas please?

CodePudding user response：

The function to count instances of unique values in R is table.

# simulated data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))

# use stack to turn column name into a factor column (long format)
table(stack(df))

      ind
values V1 V2 V3 V4 V5 V6 V7 V8 V9
     1  4  0  1  2  6  3  6  3  2
     2  2  2  5  4  4  4  3  3  2
     3  4  4  3  7  2  7  1  4  2
     4  1  5  1  3  3  5  6  3  5
     5  3  4  4  4  4  2  2  3  6
     6  7  5  3  2  3  0  1  3  3
     7  4  5  3  3  2  1  3  3  1
     8  3  2  3  3  5  2  6  4  3
     9  2  3  7  2  1  6  2  4  6

Edit: forgot the tricky bit. The output of table is a table object, which looks like a matrix but gets turned into a long format if you try to do as.data.frame. To turn your result into a 9x9 df, use

as.data.frame.matrix(table(stack(df)))

Caveat: if for some reason one of the 9 digits doesn't appear anywhere in the original df, then that row will be skipped (instead of being filled with 0s).

CodePudding user response：

This is very similar to the behaviour of tabulate(), so you can do:

#Create example data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))

#Counts of digits 1-9
as.data.frame(sapply(df, tabulate, nbins=9))

CodePudding user response：

Here's a tidyverse solution that is robust with respect to the numbert of columns and the number of distinct values they contain. I avoid recourse to loops andnon-standard evaluation by making the data tidy by pivoting.

First, create some test data

library(tidyverse)

# For reproducibility
set.seed(123)

d <- tibble(
  c1=floor(runif(30, 1, 10)),
  c2=floor(runif(30, 1, 10)),
  c3=floor(runif(30, 1, 10)),
  c4=floor(runif(30, 1, 10)),
  c5=floor(runif(30, 1, 10)),
  c6=floor(runif(30, 1, 10)),
  c7=floor(runif(30, 1, 10)),
  c8=floor(runif(30, 1, 10)),
  c9=floor(runif(30, 1, 10))
)
d
# A tibble: 30 × 9
      c1    c2    c3    c4    c5    c6    c7    c8    c9
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     3     9     6     2     6     8     8     5     5
 2     8     9     1     6     3     5     3     3     6
 3     4     7     4     4     3     4     7     2     2
 4     8     8     3     6     2     3     3     7     6
 5     9     1     8     3     4     1     6     1     3
 6     1     5     5     2     9     4     5     7     7
 7     5     7     8     8     2     6     3     4     4
 8     9     2     8     1     1     2     6     4     9
 9     5     3     8     5     2     5     9     8     9
10     5     3     4     5     7     2     9     9     7
# … with 20 more rows

Now solve the problem

d %>% 
  pivot_longer(
    everything(),
    names_to="Column", 
    values_to="Value"
  ) %>% 
  group_by(Column, Value) %>% 
  summarise(N=n(), .groups="drop") %>% 
  pivot_wider(
    names_from=Column,
    values_from=N,
    id_cols=Value,
    values_fill=0
  )
# A tibble: 9 × 10
  Value    c1    c2    c3    c4    c5    c6    c7    c8    c9
  <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     2     3     2     3     1     0     3     1
2     2     1     7     3     4     4     4     3     3     3
3     3     4     4     2     3     6     2     7     3     3
4     4     1     5     6     3     3     7     3     5     4
5     5     5     2     2     5     1     5     3     3     5
6     6     4     1     3     5     3     5     6     2     4
7     7     3     3     3     1     4     3     1     5     3
8     8     2     3     6     1     3     3     2     3     3
9     9     7     3     2     6     3     0     5     3     4