Using map() function to apply for each element-CodePudding

i have this example-dataset from a survey:

dt<- data.table(
  ID = c(1,2,3,4, 5, 6, 7, 8, 9, 10),
  education_code = c(20,50,20,60, 20, 10,5, 12, 12, 12),
  age = c(87,67,56,52, 34, 56, 67, 78, 23, 34),
  sex = c("F","M","M","M", "F","M","M","M", "M","M"),
  q1_1 = c(NA,1,5,3, 1, NA, 3, 4, 5,1),
  q1_2 = c(NA,1,5,3, 1, 2, NA, 4, 5,1),
  q1_3 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q1_text = c(NA,1,5,3, 1, 2, 3, 4, 5,1), 
  q2_1 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q2_2 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q2_3 = c(NA,1,5,3, 1, NA, 4, 5,1),
  q2_text = c(NA,1,5,3, 1, NA, 3, 4, 5,1),
  no_respond = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)),

the dataset are much bigger and have much more questions. the questions in the survey were multiple choice with answer levels from 1 to 5.

I need to make some statistic analyzes on the data - and therefore I have made this new data table and included a "weight" variable as I need to weight my data. as you can see this cod only considers question 1 (q1_1).

dt[, .(ID, education_code, age, sex, item = q1_1)]
dt[, no_respond := is.na(item)]
dt[, weight := 1/(sum(no_respond==0)/.N), keyby = .(sex, education_code, age)]

I need, with the help of the map() function, apply the above for each element

How can I do so?

CodePudding user response：

As mentioned in the comments, you miss a dt <- in your dt[, .(ID, education_code, age, sex, item = q1_1)] which makes the column item unavailable in the following line dt[, no_respond := is.na(item)].

Your weighting scheme is not absolutely clear to me however, assuming you want to do what is done in your code here, I would go with dplyr solution to iterate over columns.

# your data without no_respond column and correcting missing value in q2_3
dt <- data.table::data.table(
  ID = c(1,2,3,4, 5, 6, 7, 8, 9, 10),
  education_code = c(20,50,20,60, 20, 10,5, 12, 12, 12),
  age = c(87,67,56,52, 34, 56, 67, 78, 23, 34),
  sex = c("F","M","M","M", "F","M","M","M", "M","M"),
  q1_1 = c(NA,1,5,3, 1, NA, 3, 4, 5,1),
  q1_2 = c(NA,1,5,3, 1, 2, NA, 4, 5,1),
  q1_3 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q1_text = c(NA,1,5,3, 1, 2, 3, 4, 5,1), 
  q2_1 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q2_2 = c(NA,1,5,3, 1, 2, 3, 4, 5,1),
  q2_3 = c(NA,1,5,3, 1, NA, NA, 4, 5,1),
  q2_text = c(NA,1,5,3, 1, NA, 3, 4, 5,1))


dt %>%
  group_by(sex, education_code, age) %>%    #groups the df by sex, education_code, age
  add_count() %>%                           #add a column with number of rows in each group
  mutate(across(starts_with("q"),           #for each column starting with "q"
                ~ 1/(sum(!is.na(.))/n),     #create a new column following your weight calculation
                .names = '{.col}_wgt')) %>% #naming the new column with suffix "_wgt" to original name
  ungroup()

CodePudding user response：

As dt is of class data.table, you can make a vector of columns of interest (i.e. your items; below I use grepl on the names), and then apply your weighting function to each of those columns using .SD and .SDcols, with by

qs = names(dt)[grepl("^q", names(dt))]

dt[, (paste0(qs,"wt")):=lapply(.SD, \(q) 1/(sum(!is.na(q))/.N)),
   .(sex, education_code, age), .SDcols = qs]