Calculating the mean of multiple variables. Warning: numerical expression has 56368 elements: only the first used

I have just started to use R, and this may be a very basic question. I am trying to calculate the average value of multiple variables. My variables are people's trust in different things, measured on a scale of 1 to 5.

I started with:

intp.trust <- EU_value_study %>%
          summarise(average_intp.trust = mean(v32:v37))

and received a warning:

Warning messages:
1: In v32:v37 :
  numerical expression has 56368 elements: only the first used
2: In v32:v37 :
  numerical expression has 56368 elements: only the first used

I did get a result, but I think it may be wrong because of the warning above.

> intp.trust
# A tibble: 1 × 1
  average_intp.trust
               <dbl>
1                  1

I then tried:

intp.trust <- EU_value_study %>%
  rowwise()%>%
  summarise(average_intp.trust = mean(v32:v37))

and received an error:

Error: Problem with `summarise()` column `average_intp.trust`.
ℹ `average_intp.trust = mean(v32:v37)`.
x NA/NaN argument
ℹ The error occurred in row 8.
Backtrace:

I have also tried:

intp.trust <- EU_value_study %>%
  summarise(average_intp.trust = rowwise_mean(v32:v37))

and also received an error:

Error: Problem with `summarise()` column `average_intp.trust`.
ℹ `average_intp.trust = rowwise_mean(v32:v37)`.
x could not find function "rowwise_mean"
Backtrace:
 1. EU_value_study %>% summarise(average_intp.trust = rowwise_mean(v32:v37))
 7. base::.handleSimpleError(...)
 8. dplyr:::h(simpleError(msg, call))

Could someone help me with the error? Should I use mutate() instead of summarise()? Many thanks :)

CodePudding user response：

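Inside summarise(), v32:v37 is evaluated as the base R sequence operator on two whole columns, hence the warning. To select a range of columns row by row, wrap it in c_across() after rowwise():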

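library(dplyr)
library(haven)
EU_value_study %>%
  zap_labels() %>%   # drop haven value labels so the columns are plain numeric
  rowwise() %>%
  transmute(average_intp.trust = mean(c_across(v32:v37), na.rm = TRUE)) %>%
  ungroup()          # .groups is a summarise() argument; in transmute() it would become a column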

Also, rowwise() with mean() evaluates one row at a time and is likely to be slow on 56368 rows. The vectorized rowMeans() does the same job in a single call:

EU_value_study %>%
    zap_labels() %>%
    transmute(average_intp.trust = rowMeans(across(v32:v37), na.rm = TRUE))
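
A small check on made-up data can confirm that the two approaches agree. The tibble `toy` and its values are hypothetical, and the columns are named v32 to v34 only to mirror the question.

```r
library(dplyr)

# Hypothetical answers from three respondents, with one missing value
toy <- tibble(
  v32 = c(1, 4, NA),
  v33 = c(2, 5, 3),
  v34 = c(3, 3, 5)
)

# Row-by-row version: c_across() collects the selected columns for the current row
slow <- toy %>%
  rowwise() %>%
  transmute(avg = mean(c_across(v32:v34), na.rm = TRUE)) %>%
  ungroup()

# Vectorized version: rowMeans() handles all rows in one call
fast <- toy %>%
  transmute(avg = rowMeans(across(v32:v34), na.rm = TRUE))

# TRUE when both approaches give the same row means
all.equal(slow$avg, fast$avg)
```

With na.rm = TRUE the third respondent's mean is taken over the two answers that are present. On dplyr 1.1.0 or later, pick(v32:v34) can replace across(v32:v34) in the vectorized version.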

NOTE: summarise() would run, but a row-wise mean is not really a summary: it returns one value per row, so the output has as many rows as the original data. Technically it is a mutate()/transmute() operation. Use mutate() to keep the existing columns, or transmute() if only the new column is needed.
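
The warning and the error in the question both come from how base R's `:` operator behaves, which a few lines outside dplyr can illustrate. The vectors below are invented stand-ins for two of the survey columns, not the real data.

```r
# Hypothetical stand-ins for two whole columns
v32 <- c(1, 4, 2)
v37 <- c(1, 5, NA)

# `:` builds a sequence from the first element of each side, so this is 1:1.
# R warns that only the first element is used (suppressed here).
whole_columns <- suppressWarnings(v32:v37)
mean(whole_columns)

# Under rowwise() each side is a single value, so row 2 gives the sequence
# 4:5, not the answers stored in the columns between v32 and v37.
row2 <- mean(v32[2]:v37[2])
row2

# A missing value on either side makes `:` fail; the error message is captured here
row3 <- tryCatch(v32[3]:v37[3], error = function(e) conditionMessage(e))
row3
```

This would explain why the first attempt returned 1 (the mean of 1:1 when both columns start with 1) and why the rowwise() attempt stopped at a row that presumably contains a missing answer.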

CodePudding user response：

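I assume your data looks like this:

library(tidyverse)

n <- 100  # number of respondents
df <- tibble(
  id  = rep(1:n, 50),
  var = rep(paste0("v", 1:50), each = n),
  val = sample(1:5, n * 50, replace = TRUE)  # random answers on the 1 to 5 scale
) %>%
  # name id_cols explicitly; recent tidyr versions no longer accept it by position
  pivot_wider(id_cols = id, names_from = var, values_from = val)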

output

# A tibble: 100 x 51
      id    v1    v2    v3    v4    v5    v6    v7    v8    v9   v10   v11   v12   v13   v14   v15   v16   v17   v18   v19   v20   v21
   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1     1     5     3     3     2     4     1     5     2     5     2     1     3     3     5     4     4     5     3     5     1     4
 2     2     5     4     1     3     3     4     3     4     3     3     2     4     5     4     2     5     4     3     4     2     1
 3     3     5     1     3     1     3     3     4     2     5     2     5     1     5     1     4     4     3     3     5     3     1
 4     4     3     1     1     1     4     5     2     1     2     4     5     3     1     4     1     5     5     1     1     1     4
 5     5     1     4     1     4     4     1     2     4     5     4     1     2     4     4     5     5     5     3     4     3     2
 6     6     2     5     5     2     1     2     4     3     4     4     5     3     3     4     2     4     1     2     1     5     5
 7     7     5     2     1     2     4     5     5     2     1     5     3     2     1     4     2     3     1     1     4     2     2
 8     8     3     3     1     3     2     1     4     1     4     4     2     5     3     2     3     3     1     3     4     4     4
 9     9     5     3     3     4     3     2     2     2     1     5     5     2     3     3     3     5     4     3     4     1     5
10    10     5     2     5     2     1     1     5     4     4     4     2     4     1     2     1     3     5     4     5     5     5
# ... with 90 more rows, and 29 more variables: v22 <int>, v23 <int>, v24 <int>, v25 <int>, v26 <int>, v27 <int>, v28 <int>, v29 <int>,
#   v30 <int>, v31 <int>, v32 <int>, v33 <int>, v34 <int>, v35 <int>, v36 <int>, v37 <int>, v38 <int>, v39 <int>, v40 <int>, v41 <int>,
#   v42 <int>, v43 <int>, v44 <int>, v45 <int>, v46 <int>, v47 <int>, v48 <int>, v49 <int>, v50 <int>

So we have 100 rows and 50 v variables.

If you need the overall average of the variables v32 to v37 (one number across all rows), do this:

df %>% pivot_longer(v32:v37) %>% 
  summarise(
    n = n(),
    intp.trust = mean(value))

output

# A tibble: 1 x 2
      n intp.trust
  <int>      <dbl>
1   600       3.06

The pivot_longer() function reshapes the selected columns into two columns: name (the original column name) and value (the answer). See this example:

df %>% pivot_longer(v1:v50)

output

# A tibble: 5,000 x 3
      id name  value
   <int> <chr> <int>
 1     1 v1        5
 2     1 v2        3
 3     1 v3        3
 4     1 v4        2
 5     1 v5        4
 6     1 v6        1
 7     1 v7        5
 8     1 v8        2
 9     1 v9        5
10     1 v10       2
# ... with 4,990 more rows

Now just use summarise():

df %>% pivot_longer(v1:v50) %>% 
  summarise(
    n = n(),
    intp.trust = mean(value))

output

# A tibble: 1 x 2
      n intp.trust
  <int>      <dbl>
1  5000       3.00
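
Both summaries above return a single overall mean. If one average per respondent is wanted instead, the same long format can be grouped by id first. This is only a sketch on a small invented tibble; `toy`, `n_answered` and `avg_trust` are hypothetical names, and it assumes the data has an id column like the simulated df.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: three respondents, three trust items, one missing answer
toy <- tibble(
  id  = 1:3,
  v32 = c(1, 4, NA),
  v33 = c(2, 5, 3),
  v34 = c(3, 3, 5)
)

per_person <- toy %>%
  pivot_longer(v32:v34) %>%            # one row per respondent and item
  group_by(id) %>%
  summarise(
    n_answered = sum(!is.na(value)),   # answers actually given
    avg_trust  = mean(value, na.rm = TRUE)
  )

per_person
```

Without na.rm = TRUE, any respondent with a missing answer would get NA as their average, which matters for real survey data.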