map_dfr converting data frame input to column

Time:05-17

I'm trying to use the following function to iterate through a dataframe and return the counts from each row:

library(dplyr)
library(tidyr)
row_freq <- function(df_input,row_input){
  print(df_input)
  vec <- unlist(df_input %>% 
                  select(-1) %>% 
                  slice(row_input), use.names = FALSE)
  r <- data.frame(table(vec)) %>% 
    pivot_wider(values_from = Freq, names_from = vec)
  return(r)
}

This works fine if I use a single row from the dataframe:

sample_df <- data.frame(id = c(1,2,3,4,5), obs1 = c("A","A","B","B","B"),
                        obs2 = c("B","B","C","D","D"), obs3 = c("A","B","A","D","A"))
row_freq(sample_df, 1)

  id obs1 obs2 obs3
1  1    A    B    A
2  2    A    B    B
3  3    B    C    A
4  4    B    D    D
5  5    B    D    A
# A tibble: 1 × 2
      A     B
  <int> <int>
1     2     1

But when iterating over rows using purrr::map_dfr, it seems to reduce df_input to only the id column instead of using the entire dataframe as the argument, which I found quite strange:

purrr::map_dfr(sample_df, row_freq, 1:5)
[1] 1 2 3 4 5
 Error in UseMethod("select") : 
no applicable method for 'select' applied to an object of class "c('double', 'numeric')"

I'm looking for help with regards to 1) why this is happening, 2) how to fix it, and 3) any alternative approaches or functions that may already perform what I'm trying to do in a more efficient manner.

CodePudding user response:

Pass the arguments in the correct order if you are not naming them: the values to iterate over (here the row indices) go first, and the data frame is supplied inside the function call:

purrr::map_dfr(1:5, ~ row_freq(sample_df, .x))

-output

# A tibble: 5 × 4
      A     B     C     D
  <int> <int> <int> <int>
1     2     1    NA    NA
2     1     2    NA    NA
3     1     1     1    NA
4    NA     1    NA     2
5     1     1    NA     1
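As a side note, map_dfr is marked as superseded in purrr 1.0.0 and later, where the suggested replacement is map followed by list_rbind. The sketch below assumes purrr >= 1.0.0 is installed; row_freq is copied from the question with the debugging print removed, and result is only an illustrative name.

```r
library(dplyr)
library(tidyr)
library(purrr)

sample_df <- data.frame(id = c(1, 2, 3, 4, 5),
                        obs1 = c("A", "A", "B", "B", "B"),
                        obs2 = c("B", "B", "C", "D", "D"),
                        obs3 = c("A", "B", "A", "D", "A"),
                        stringsAsFactors = FALSE)

# Same function as in the question, minus the print() call
row_freq <- function(df_input, row_input) {
  vec <- unlist(df_input %>%
                  select(-1) %>%
                  slice(row_input), use.names = FALSE)
  data.frame(table(vec)) %>%
    pivot_wider(values_from = Freq, names_from = vec)
}

# Iterate over the row indices, then bind the one-row results;
# columns missing from a given row are filled with NA
result <- list_rbind(map(1:5, function(i) row_freq(sample_df, i)))
result
```

This should give the same counts as the output above; only the binding step is spelled differently.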

Or name the arguments, so that 1:5 is matched to .x and sample_df is passed on to row_freq as df_input:

purrr::map_dfr(df_input = sample_df, .f = row_freq, .x = 1:5)

-output

# A tibble: 5 × 4
      A     B     C     D
  <int> <int> <int> <int>
1     2     1    NA    NA
2     1     2    NA    NA
3     1     1     1    NA
4    NA     1    NA     2
5     1     1    NA     1
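On the third part of the question (alternative approaches), one possibility is to avoid row-by-row iteration altogether: reshape to long format, count values per id, and widen again. This is a sketch rather than the answerer's method; the names obs, value and row_counts are illustrative, and it assumes the first column (id) uniquely identifies each row.

```r
library(dplyr)
library(tidyr)

sample_df <- data.frame(id = c(1, 2, 3, 4, 5),
                        obs1 = c("A", "A", "B", "B", "B"),
                        obs2 = c("B", "B", "C", "D", "D"),
                        obs3 = c("A", "B", "A", "D", "A"),
                        stringsAsFactors = FALSE)

row_counts <- sample_df %>%
  # one row per (id, observation) pair
  pivot_longer(-id, names_to = "obs", values_to = "value") %>%
  # number of times each value occurs within each id
  count(id, value) %>%
  # one column per distinct value; absent combinations become NA
  pivot_wider(names_from = value, values_from = n)

row_counts
```

Unlike the map_dfr result, this keeps the id column. Adding values_fill = 0 to pivot_wider would replace the NA entries with zeros if that is preferred.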

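The reason is that the first argument of map (and map_dfr) is .x, the object to iterate over:

map(.x, .f, ...)

If sample_df is passed first, it is matched to .x, and map loops over its columns: a data.frame (like a tibble or data.table) is a list of columns with additional attributes, so the unit of iteration is the column. In the original call, row_freq therefore receives the id column as df_input and 1:5 as row_input, which is why print shows 1 2 3 4 5 and select fails on a numeric vector.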
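The column-wise iteration can be checked directly. The snippet below is a small illustration using the sample data from the question; col_classes is just an illustrative name, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
library(purrr)

sample_df <- data.frame(id = c(1, 2, 3, 4, 5),
                        obs1 = c("A", "A", "B", "B", "B"),
                        obs2 = c("B", "B", "C", "D", "D"),
                        obs3 = c("A", "B", "A", "D", "A"),
                        stringsAsFactors = FALSE)

# A data frame is a list with one element per column
is.list(sample_df)
length(sample_df)

# map_* therefore calls the function once per column, not once per row
col_classes <- map_chr(sample_df, class)
col_classes
```

The first element handed to the function is the numeric id column, which is exactly what select() was complaining about in the error message.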
