Home > OS >  Loop through data frame and get counts. Output to other data frame
Loop through data frame and get counts. Output to other data frame

Time:09-07

I have an 30 * 9 data frame filled with integers 1-9. Each integer can feature multiple times in a column, or none at all.

I basically wanted to calculate the number of times a number appears, in order to generate a column of 9 rows (of counts) for each element of the original data frame, to end up with a 9 * 9 data frame of counts. I also wanted to have a 0 placed where a number does not appear in a particular column.

So far I tried multiple approaches with for loops, tapply, functions etc. But I cannot seem to end up with a result which can be stored directly into a new data frame in a loop.

for (i in seq_along(columnHeaderQuosureList)) {
         original_data_frame %>% 
           group_by(!! columnHeaderQuosureList[[i]]) %>% # Unquote with !!
           count(!! columnHeaderQuosureList[[i]]) %>% 
           print()
       } 

This works and prints each count for each column. I tried replacing print() with return() and then trying to cbind the returned output with the result_data_frame. Unfortunately I am getting nowhere, and I do not think my approach is feasible.

Does anyone have any better ideas please?

CodePudding user response:

The function to count instances of unique values in R is table.

# simulated data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))

# use stack to turn column name into a factor column (long format)
table(stack(df))

      ind
values V1 V2 V3 V4 V5 V6 V7 V8 V9
     1  4  0  1  2  6  3  6  3  2
     2  2  2  5  4  4  4  3  3  2
     3  4  4  3  7  2  7  1  4  2
     4  1  5  1  3  3  5  6  3  5
     5  3  4  4  4  4  2  2  3  6
     6  7  5  3  2  3  0  1  3  3
     7  4  5  3  3  2  1  3  3  1
     8  3  2  3  3  5  2  6  4  3
     9  2  3  7  2  1  6  2  4  6

Edit: forgot the tricky bit. The output of table is a table object, which looks like a matrix but gets turned into a long format if you try to do as.data.frame. To turn your result into a 9x9 df, use

as.data.frame.matrix(table(stack(df)))

Caveat: if for some reason one of the 9 digits doesn't appear anywhere in the original df, then that row will be skipped (instead of being filled with 0s).

CodePudding user response:

This is very similar to the behaviour of tabulate(), so you can do:

#Create example data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))

#Counts of digits 1-9
as.data.frame(sapply(df, tabulate, nbins=9))

CodePudding user response:

Here's a tidyverse solution that is robust with respect to the numbert of columns and the number of distinct values they contain. I avoid recourse to loops andnon-standard evaluation by making the data tidy by pivoting.

First, create some test data

library(tidyverse)

# For reproducibility
set.seed(123)

d <- tibble(
  c1=floor(runif(30, 1, 10)),
  c2=floor(runif(30, 1, 10)),
  c3=floor(runif(30, 1, 10)),
  c4=floor(runif(30, 1, 10)),
  c5=floor(runif(30, 1, 10)),
  c6=floor(runif(30, 1, 10)),
  c7=floor(runif(30, 1, 10)),
  c8=floor(runif(30, 1, 10)),
  c9=floor(runif(30, 1, 10))
)
d
# A tibble: 30 × 9
      c1    c2    c3    c4    c5    c6    c7    c8    c9
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     3     9     6     2     6     8     8     5     5
 2     8     9     1     6     3     5     3     3     6
 3     4     7     4     4     3     4     7     2     2
 4     8     8     3     6     2     3     3     7     6
 5     9     1     8     3     4     1     6     1     3
 6     1     5     5     2     9     4     5     7     7
 7     5     7     8     8     2     6     3     4     4
 8     9     2     8     1     1     2     6     4     9
 9     5     3     8     5     2     5     9     8     9
10     5     3     4     5     7     2     9     9     7
# … with 20 more rows

Now solve the problem

d %>% 
  pivot_longer(
    everything(),
    names_to="Column", 
    values_to="Value"
  ) %>% 
  group_by(Column, Value) %>% 
  summarise(N=n(), .groups="drop") %>% 
  pivot_wider(
    names_from=Column,
    values_from=N,
    id_cols=Value,
    values_fill=0
  )
# A tibble: 9 × 10
  Value    c1    c2    c3    c4    c5    c6    c7    c8    c9
  <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     2     3     2     3     1     0     3     1
2     2     1     7     3     4     4     4     3     3     3
3     3     4     4     2     3     6     2     7     3     3
4     4     1     5     6     3     3     7     3     5     4
5     5     5     2     2     5     1     5     3     3     5
6     6     4     1     3     5     3     5     6     2     4
7     7     3     3     3     1     4     3     1     5     3
8     8     2     3     6     1     3     3     2     3     3
9     9     7     3     2     6     3     0     5     3     4
  • Related