I have just started to use R, and this may be a very basic question: I am trying to calculate the average value of multiple variables. My variables are people's trust in different things, measured on a scale of 1 to 5.
- I started with:
intp.trust <- EU_value_study %>%
summarise(average_intp.trust = mean(v32:v37))
and received a warning:
Warning messages:
1: In v32:v37 :
numerical expression has 56368 elements: only the first used
2: In v32:v37 :
numerical expression has 56368 elements: only the first used
I did get a result, but I think it may be wrong because of the warning above?
> intp.trust
# A tibble: 1 × 1
average_intp.trust
<dbl>
1 1
- I then tried:
intp.trust <- EU_value_study %>%
rowwise()%>%
summarise(average_intp.trust = mean(v32:v37))
and received an error:
Error: Problem with `summarise()` column `average_intp.trust`.
ℹ `average_intp.trust = mean(v32:v37)`.
x NA/NaN argument
ℹ The error occurred in row 8.
Backtrace:
- I have also tried:
intp.trust <- EU_value_study %>%
summarise(average_intp.trust = rowwise_mean(v32:v37))
and also received an error:
Error: Problem with `summarise()` column `average_intp.trust`.
ℹ `average_intp.trust = rowwise_mean(v32:v37)`.
x could not find function "rowwise_mean"
Backtrace:
1. EU_value_study %>% summarise(average_intp.trust = rowwise_mean(v32:v37))
7. base::.handleSimpleError(...)
8. dplyr:::h(simpleError(msg, call))
Could someone help me with the error? Should I use mutate() instead of summarise()? Many thanks :)
CodePudding user response:
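We may need to use c_across() within rowwise(). Note that transmute() has no .groups argument (passing one would just create a column named .groups), so drop the row-wise grouping with ungroup() instead:
library(dplyr)
library(haven)
EU_value_study %>%
  zap_labels() %>%  # remove haven value labels so the columns are plain vectors
  rowwise() %>%
  transmute(average_intp.trust = mean(c_across(v32:v37), na.rm = TRUE)) %>%
  ungroup()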
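For context on why the original attempts misbehaved: inside summarise(), v32:v37 is the base R `:` operator applied to two whole columns, not a column selection. A minimal sketch with made-up vectors (the values are illustrative, not taken from EU_value_study):

```r
# Made-up stand-ins for two survey columns (hypothetical values)
v32 <- c(1, 4, 2)
v37 <- c(3, 5, 1)

# `:` only uses the first element of each side, so this is 1:3
# (suppressWarnings() hides the "only the first used" warning)
first_only <- suppressWarnings(v32:v37)
mean(first_only)      # mean of 1, 2, 3

# Under rowwise(), each side is a single value, so the result is the mean of
# the integer sequence between the two endpoint columns; v33..v36 are ignored
mean(v32[2]:v37[2])   # mean of 4, 5

# A missing endpoint makes `:` fail, matching the "NA/NaN argument" error
na_message <- tryCatch(NA_real_:3, error = function(e) conditionMessage(e))
```

This is why the first attempt returned a single number based only on row 1, and why the rowwise() attempt stopped at the first row with a missing value.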
Also, instead of rowwise() with mean(), which can be slow on large data, use the vectorized rowMeans():
EU_value_study %>%
zap_labels() %>%
transmute(average_intp.trust = rowMeans(across(v32:v37), na.rm = TRUE))
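As a quick sanity check, the two approaches can be compared on a small toy tibble. The data and the column names a, b, c below are invented for illustration, and haven::zap_labels() is left out because the toy columns carry no labels:

```r
library(dplyr)

# Toy data with a couple of missing answers (hypothetical values)
toy <- tibble(
  id = 1:3,
  a  = c(1, 2, NA),
  b  = c(2, 4, 5),
  c  = c(3, NA, 1)
)

# Row-by-row version: c_across() collects the selected columns for each row
by_row <- toy %>%
  rowwise() %>%
  mutate(avg = mean(c_across(a:c), na.rm = TRUE)) %>%
  ungroup()

# Vectorized version: rowMeans() works on all rows at once
vectorised <- toy %>%
  mutate(avg = rowMeans(across(a:c), na.rm = TRUE))

by_row$avg
vectorised$avg
```

Both versions should give the same averages; with na.rm = TRUE each row's mean is taken over its non-missing answers only.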
NOTE: summarise() would work, but a row-wise mean is not really a summarisation, i.e. it returns the same number of rows as the original data. So, technically, it belongs in mutate() or transmute() (use transmute() if only that column is needed as output).
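The difference between the two verbs can be seen on a tiny example. This is only a sketch with invented data; the names toy, kept, and only_new are hypothetical:

```r
library(dplyr)

# Invented two-item survey with three respondents
toy <- tibble(id = 1:3, a = c(1, 2, 3), b = c(3, 4, 5))

# mutate() keeps the existing columns and appends the new one
kept <- toy %>% mutate(avg = rowMeans(across(a:b)))

# transmute() returns only the newly created column
only_new <- toy %>% transmute(avg = rowMeans(across(a:b)))

names(kept)
names(only_new)
nrow(kept) == nrow(toy)   # one average per original row in both cases
```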
CodePudding user response:
I assume your data looks like this:
library(tidyverse)
n <- 100  # number of respondents
df <- tibble(
  id = rep(1:n, 50),
  var = rep(paste0("v", 1:50), each = n),
  val = sample(1:5, 50 * n, replace = TRUE)
) %>% pivot_wider(id_cols = id, names_from = var, values_from = val)
output
# A tibble: 100 x 51
id v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 v19 v20 v21
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 5 3 3 2 4 1 5 2 5 2 1 3 3 5 4 4 5 3 5 1 4
2 2 5 4 1 3 3 4 3 4 3 3 2 4 5 4 2 5 4 3 4 2 1
3 3 5 1 3 1 3 3 4 2 5 2 5 1 5 1 4 4 3 3 5 3 1
4 4 3 1 1 1 4 5 2 1 2 4 5 3 1 4 1 5 5 1 1 1 4
5 5 1 4 1 4 4 1 2 4 5 4 1 2 4 4 5 5 5 3 4 3 2
6 6 2 5 5 2 1 2 4 3 4 4 5 3 3 4 2 4 1 2 1 5 5
7 7 5 2 1 2 4 5 5 2 1 5 3 2 1 4 2 3 1 1 4 2 2
8 8 3 3 1 3 2 1 4 1 4 4 2 5 3 2 3 3 1 3 4 4 4
9 9 5 3 3 4 3 2 2 2 1 5 5 2 3 3 3 5 4 3 4 1 5
10 10 5 2 5 2 1 1 5 4 4 4 2 4 1 2 1 3 5 4 5 5 5
# ... with 90 more rows, and 29 more variables: v22 <int>, v23 <int>, v24 <int>, v25 <int>, v26 <int>, v27 <int>, v28 <int>, v29 <int>,
# v30 <int>, v31 <int>, v32 <int>, v33 <int>, v34 <int>, v35 <int>, v36 <int>, v37 <int>, v38 <int>, v39 <int>, v40 <int>, v41 <int>,
# v42 <int>, v43 <int>, v44 <int>, v45 <int>, v46 <int>, v47 <int>, v48 <int>, v49 <int>, v50 <int>
So we have 100 rows and 50 v variables.
If you need the overall average of the variables v32:v37 (one number across all rows), do this:
df %>% pivot_longer(v32:v37) %>%
summarise(
n = n(),
intp.trust = mean(value))
output
# A tibble: 1 x 2
n intp.trust
<int> <dbl>
1 600 3.06
The pivot_longer() function stacks the selected columns into two columns: name (the original column name) and value (the cell value). See this example:
df %>% pivot_longer(v1:v50)
output
# A tibble: 5,000 x 3
id name value
<int> <chr> <int>
1 1 v1 5
2 1 v2 3
3 1 v3 3
4 1 v4 2
5 1 v5 4
6 1 v6 1
7 1 v7 5
8 1 v8 2
9 1 v9 5
10 1 v10 2
# ... with 4,990 more rows
Now just use summarise
df %>% pivot_longer(v1:v50) %>%
summarise(
n = n(),
intp.trust = mean(value))
output
# A tibble: 1 x 2
n intp.trust
<int> <dbl>
1 5000 3.00
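The summaries above give one overall mean. If one average per respondent is wanted instead, the same long format can be grouped by id first. A minimal sketch on invented data (the tibble small and its values are hypothetical, and only three of the v columns are included to keep it short):

```r
library(dplyr)
library(tidyr)

# Invented data: two respondents, three trust items, one missing answer
small <- tibble(
  id  = 1:2,
  v32 = c(1, 5),
  v33 = c(2, 4),
  v34 = c(3, NA)
)

per_person <- small %>%
  pivot_longer(v32:v34, names_to = "name", values_to = "value") %>%
  group_by(id) %>%
  summarise(
    n = sum(!is.na(value)),               # answers actually given
    intp.trust = mean(value, na.rm = TRUE)
  )

per_person
```

Here summarise() is a genuine summarisation, since the long data has several rows per id that collapse to one.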