I'm trying to use the following function to iterate through a dataframe and return the value counts for each row:
library(dplyr)
library(tidyr)
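row_freq <- function(df_input, row_input){
  print(df_input)  # debugging: show what the function receives
  # drop the id column, keep one row, flatten it to a plain vector
  vec <- unlist(df_input %>%
                  select(-1) %>%
                  slice(row_input), use.names = FALSE)
  # count each value and spread the counts into one column per value
  r <- data.frame(table(vec)) %>%
    pivot_wider(values_from = Freq, names_from = vec)
  return(r)
}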
This works fine if I use a single row from the dataframe:
sample_df <- data.frame(id = c(1,2,3,4,5), obs1 = c("A","A","B","B","B"),
obs2 = c("B","B","C","D","D"), obs3 = c("A","B","A","D","A"))
row_freq(sample_df, 1)
id obs1 obs2 obs3
1 1 A B A
2 2 A B B
3 3 B C A
4 4 B D D
5 5 B D A
# A tibble: 1 × 2
A B
<int> <int>
1 2 1
But when iterating over rows using purrr::map_dfr, it seems to reduce df_input to only the id column instead of using the entire dataframe as the argument, which I found quite strange:
purrr::map_dfr(sample_df, row_freq, 1:5)
[1] 1 2 3 4 5
Error in UseMethod("select") :
no applicable method for 'select' applied to an object of class "c('double', 'numeric')"
I'm looking for help with regard to 1) why this is happening, 2) how to fix it, and 3) any alternative approaches or functions that may already do this more efficiently.
Answer:
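If the arguments are not passed by name, they have to be supplied in the order map_dfr() expects: the vector to iterate over (.x) first, then the function (.f). So iterate over the row numbers and pass sample_df inside a lambda: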
purrr::map_dfr(1:5, ~ row_freq(sample_df, .x))
Output (the data frames printed by print(df_input) on each call are omitted):
# A tibble: 5 × 4
A B C D
<int> <int> <int> <int>
1 2 1 NA NA
2 1 2 NA NA
3 1 1 1 NA
4 NA 1 NA 2
5 1 1 NA 1
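A small variation on the call above, sketched under the assumption that every row should be processed: seq_len(nrow(sample_df)) builds the row indices from the data instead of hard-coding 1:5. The copy of row_freq() below is the question's function with the debugging print() dropped so that only the result comes back; that change is an assumption for this sketch, not part of the original.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same logic as the question's row_freq(), minus the debugging print()
row_freq <- function(df_input, row_input) {
  # drop the id column, keep one row, flatten it to a plain vector
  vec <- unlist(df_input %>% select(-1) %>% slice(row_input), use.names = FALSE)
  # count each value and spread the counts into one column per value
  data.frame(table(vec)) %>%
    pivot_wider(values_from = Freq, names_from = vec)
}

sample_df <- data.frame(id = c(1, 2, 3, 4, 5),
                        obs1 = c("A", "A", "B", "B", "B"),
                        obs2 = c("B", "B", "C", "D", "D"),
                        obs3 = c("A", "B", "A", "D", "A"))

# Iterate over every row index without hard-coding 1:5
res <- map_dfr(seq_len(nrow(sample_df)), ~ row_freq(sample_df, .x))
res
```

seq_len() is preferred over 1:nrow(sample_df) because it returns an empty vector for a zero-row data frame, whereas 1:0 evaluates to c(1, 0).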
Or pass every argument by name, in which case their order no longer matters:
purrr::map_dfr(df_input = sample_df, .f = row_freq, .x = 1:5)
Output:
# A tibble: 5 × 4
A B C D
<int> <int> <int> <int>
1 2 1 NA NA
2 1 2 NA NA
3 1 1 1 NA
4 NA 1 NA 2
5 1 1 NA 1
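For part 3 of the question, one possible alternative (a sketch, not the only way) avoids the per-row loop entirely: reshape the observation columns into long format with tidyr::pivot_longer(), count each value per id with dplyr::count(), and spread the counts back out with pivot_wider(). It assumes, as in sample_df, that the id column uniquely identifies each row. A base R contingency table via table() is included as a second option.

```r
library(dplyr)
library(tidyr)

sample_df <- data.frame(id = c(1, 2, 3, 4, 5),
                        obs1 = c("A", "A", "B", "B", "B"),
                        obs2 = c("B", "B", "C", "D", "D"),
                        obs3 = c("A", "B", "A", "D", "A"))

# Long format (id, name, value) -> count per id/value -> one column per value
counts <- sample_df %>%
  pivot_longer(-id, values_to = "value") %>%
  count(id, value) %>%
  pivot_wider(names_from = value, values_from = n)
counts

# Base R alternative: id-by-value contingency table (zeros instead of NA).
# rep() repeats the ids once per observation column so they line up with
# the values from unlist(), which stacks obs1, obs2, obs3 in order.
tab <- table(id = rep(sample_df$id, ncol(sample_df) - 1),
             value = unlist(sample_df[-1], use.names = FALSE))
tab
```

Unlike the map_dfr() version, this keeps the id column in the result. In recent tidyr versions, passing values_fill = 0 to pivot_wider() fills absent values with 0 instead of NA.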
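The reason is that the first argument of map() (and of map_dfr()) is .x:
map(.x, .f, ...)
When sample_df is passed first, it becomes .x, and map loops over its columns: a data.frame (or tibble/data.table) is a list of columns with additional attributes, so the unit of iteration is a column. row_freq becomes .f and 1:5 is forwarded through ..., so the first call is effectively row_freq(sample_df$id, 1:5). df_input is therefore just the numeric id vector, which is what print() shows, and select() has no method for a plain numeric vector, hence the error.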
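A quick way to confirm this (a sketch using only base R and purrr) is to check that a data frame is a list with one element per column, and to map a simple function such as class() over it: the function receives each column in turn, which is exactly what happened to row_freq().

```r
sample_df <- data.frame(id = c(1, 2, 3, 4, 5),
                        obs1 = c("A", "A", "B", "B", "B"),
                        obs2 = c("B", "B", "C", "D", "D"),
                        obs3 = c("A", "B", "A", "D", "A"))

# A data frame is a list of columns
is.list(sample_df)   # TRUE
length(sample_df)    # 4: one element per column

# map_chr() hands each column (not each row) to class()
col_classes <- purrr::map_chr(sample_df, class)
col_classes

# The original purrr::map_dfr(sample_df, row_freq, 1:5) therefore started with
# row_freq(sample_df[["id"]], 1:5), i.e. select() on a numeric vector.
```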