I am attempting to compile some data in RStudio from a loop into a data frame. Currently my code correlates two variables for each of 122 participants across 120 trials.
for (i in unique(adult_pref_1$subject)){
  a <- cor.test(adult_pref_1$own_pref[adult_pref_1$subject == i], adult_pref_1$profile_rating_new[adult_pref_1$subject == i])
  print(paste(colnames(adult_pref_1)[adult_pref_1$subject], " est:", a$estimate, "p=value:", a$p.value))
}
When I run the loop, the correct estimates and p-values are generated; however, it prints the values for about 1000 lines before it starts printing the next subject's correlation estimate and p-value. I am not sure why this is happening. Ideally, I would like to aggregate the data so that each row comprises a single subject's ID, estimate, and p-value (122 rows in total). Additionally, how can I compile this data into a data frame? Thank you for your suggestions.
Here is some of the original data:
structure(list(sub = c("59917f16e339120001fb8c21_fvHlk:5fbd11ca7025930168297956",
"59917f16e339120001fb8c21_uK9Bt:5fbd11ca7025930168297956", "59917f16e339120001fb8c21_fvHlk:5fbd11ca7025930168297956",
"59917f16e339120001fb8c21_fvHlk:5fbd11ca7025930168297956", "59917f16e339120001fb8c21_uK9Bt:5fbd11ca7025930168297956",
"59917f16e339120001fb8c21_uK9Bt:5fbd11ca7025930168297956"), subject = c("59917f16e339120001fb8c21",
"59917f16e339120001fb8c21", "59917f16e339120001fb8c21", "59917f16e339120001fb8c21",
"59917f16e339120001fb8c21", "59917f16e339120001fb8c21"), event = c(94L,
46L, 96L, 80L, 21L, 52L), timestamp = c("24-Nov-2020 14:30:25",
"24-Nov-2020 14:10:03", "24-Nov-2020 14:30:38", "24-Nov-2020 14:28:38",
"24-Nov-2020 14:07:02", "24-Nov-2020 14:10:44"), profile = c("mean",
"odd", "mean", "mean", "odd", "odd"), rating = c(4, 4, 4, 4,
3, 3), rt_ms = c(2006, 1333, 1275, 1504, 1911, 1410), image = c("beads_1.png",
"beads_2.png", "notebook_1.png", "notebook_2.png", "notebook_3.png",
"notebook_4.png"), trial = c(33L, 45L, 35L, 19L, 20L, 51L), onset_s = c(738.738,
345.591, 752.789, 631.909, 164.527, 386.536), profile_rating = c(2L,
5L, 9L, 10L, 3L, 5L), block = c(2L, 1L, 2L, 2L, 1L, 1L), sub_num = c(179L,
154L, 179L, 179L, 154L, 154L), session = c(1L, 2L, 1L, 1L, 2L,
2L), own_pref = c(4, 4, 4, 4, 3, 4), cat1 = c(1L, 1L, 1L, 1L,
1L, 1L), cat2 = c(1L, 1L, 1L, 1L, 1L, 1L), item_num = c(16L,
17L, 85L, 86L, 87L, 88L), own_pref_nan = c(4, 4, 4, 4, 3, 4),
profile_rating_new = c(2L, 3L, 5L, 6L, 2L, 3L), PE = c(2,
1, 1, 2, 1, 0), PE_si = c(2, 1, -1, -2, 1, 0), se_PE = c(0,
0, 0, 0, 0, 1), pro_PE = c(2, 1, 1, 2, 1, 1)), row.names = c(NA,
6L), class = "data.frame")
CodePudding user response:
There's no need to use an explicit for loop to do this. In fact, I believe a good rule of thumb for R programming is "If you're thinking of using a for loop, there's probably a better way to do it"...
Here's a solution using group_by() and group_map() from the tidyverse and tidy() from broom. group_by() groups a data frame and applies the rest of the pipe to the groups it creates. (Note that it doesn't sort the data frame.) group_map() applies the function defined by its argument to the groups of the data frame. It returns a list of data frames. tidy() is a generic that converts the output of many statistical functions to data frames in a reasonably consistent manner.
One of the functions of bind_rows() is to convert a list of data frames to a single data frame.
library(broom)
library(tidyverse)
df %>%
  group_by(subject) %>%
  group_map(
    function(.x, .y) {
      tidy(cor.test(.x$own_pref, .x$profile_rating_new))
    },
    .keep=TRUE
  ) %>%
  bind_rows()
# A tibble: 1 × 8
  estimate statistic p.value parameter conf.low conf.high method                               alternative
     <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>                                <chr>
1    0.447         1   0.374         4   -0.572     0.924 Pearson's product-moment correlation two.sided
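Note that the output above has no subject column, which the question asks for. One possible variation, sketched here on hypothetical toy data (the `toy` data frame and its values are made up for illustration; only the column names come from the question), is dplyr's group_modify(), which expects a data frame back from the function and puts the grouping column in front of the result:

```r
library(dplyr)
library(broom)

# Toy data (hypothetical values) using the column names from the question
toy <- data.frame(
  subject = rep(c("s1", "s2"), each = 5),
  own_pref = c(1, 2, 3, 4, 5, 2, 4, 1, 5, 3),
  profile_rating_new = c(2, 1, 4, 3, 5, 5, 3, 4, 1, 2)
)

results <- toy %>%
  group_by(subject) %>%
  # group_modify() returns one data frame and keeps the grouping column
  group_modify(~ tidy(cor.test(.x$own_pref, .x$profile_rating_new))) %>%
  ungroup() %>%
  # Keep only the columns asked for in the question
  select(subject, estimate, p.value)

results
```

With the real data, replacing `toy` with adult_pref_1 should give one row per subject, assuming every subject has enough non-constant observations for cor.test() to run.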
CodePudding user response:
Avoid using for-loops in R whenever possible.
Combining group_by() with mutate() you can run a correlation test for each subject and add the estimate and p-value as new columns.
library(dplyr)
adult_pref_1 |>
  # Perform the next task separately for each subject
  group_by(subject) |>
  # Run the tests, add results to new columns 'estimate', 'pvalue'
  mutate(estimate = cor.test(own_pref, profile_rating_new)$estimate,
         pvalue = cor.test(own_pref, profile_rating_new)$p.value) |>
  # Remove irrelevant columns
  select(subject, estimate, pvalue) |>
  # Remove duplicate rows
  distinct(subject, .keep_all = TRUE)
Output:
#> # A tibble: 1 × 3
#> # Groups: subject [1]
#> subject estimate pvalue
#> <chr> <dbl> <dbl>
#> 1 59917f16e339120001fb8c21 0.447 0.374
Created on 2022-06-14 by the reprex package (v2.0.1)
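The mutate() approach runs cor.test() twice per subject and then removes the duplicated rows. A possible alternative, shown as a sketch on hypothetical toy data (the `toy` values and the temporary `test` column are illustrative assumptions, not part of the original answer), is to use summarise(), which already returns one row per group, and to store the test result once in a list column:

```r
library(dplyr)

# Toy data (hypothetical values) using the column names from the question
toy <- data.frame(
  subject = rep(c("s1", "s2"), each = 5),
  own_pref = c(1, 2, 3, 4, 5, 2, 4, 1, 5, 3),
  profile_rating_new = c(2, 1, 4, 3, 5, 5, 3, 4, 1, 2)
)

results <- toy |>
  group_by(subject) |>
  summarise(
    # Run the test once per subject and keep the whole result in a list column
    test = list(cor.test(own_pref, profile_rating_new)),
    # Pull the pieces we need out of the stored result
    estimate = unname(test[[1]]$estimate),
    pvalue = test[[1]]$p.value
  ) |>
  # Drop the temporary list column
  select(-test)

results
```

Because summarise() collapses each group to a single row, no distinct() step is needed afterwards.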
CodePudding user response:
While Limey and Andrea M's answers are MUCH better, if you are dead set on continuing with a for loop (and hopefully for the purposes of better understanding), this would work. As previously stated, this is inefficient and non-ideal code.
First we initialize a data frame with the columns and length that we want:
results.df <- data.frame("ID" = character(length(unique(adult_pref_1$subject))),
                         "estimate" = numeric(length(unique(adult_pref_1$subject))),
                         "pvalue" = numeric(length(unique(adult_pref_1$subject))))
Then we use a for loop to fill it.
for (i in seq_along(unique(adult_pref_1$subject))){
  # Take the i-th unique subject ID (not the subject stored in row i)
  this.subject <- unique(adult_pref_1$subject)[i]
  a <- cor.test(adult_pref_1$own_pref[adult_pref_1$subject == this.subject],
                adult_pref_1$profile_rating_new[adult_pref_1$subject == this.subject])
  # Fill row i with this subject's ID, estimate, and p-value
  results.df[i,] <- data.frame(this.subject, a$estimate, a$p.value)
  print(results.df[i,])
}
Your issue was here: paste(colnames(adult_pref_1)[adult_pref_1$subject], ...)
I can't figure out how this is supposed to work, but you can see how I did it above.
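A likely explanation for the repeated printing: colnames(adult_pref_1)[adult_pref_1$subject] indexes the column names by the entire subject column, so it returns one element per row of the data (NA here, since no column is named after a subject ID). paste() is vectorised, so it builds one string per row, and print() shows all of them on every pass through the loop. The sketch below illustrates this on hypothetical toy data (the `toy` values and the `rows` list are made up for illustration) and shows one way to label each result with the loop variable and collect the rows into a data frame:

```r
# Toy data (hypothetical values) using the column names from the question
toy <- data.frame(
  subject = rep(c("s1", "s2"), each = 5),
  own_pref = c(1, 2, 3, 4, 5, 2, 4, 1, 5, 3),
  profile_rating_new = c(2, 1, 4, 3, 5, 5, 3, 4, 1, 2)
)

# Indexing the column names by the subject column gives one element per row
label <- colnames(toy)[toy$subject]
length(label) == nrow(toy)

# Collect one small data frame per subject, labelled with the loop variable i
rows <- list()
for (i in unique(toy$subject)) {
  sel <- toy$subject == i
  a <- cor.test(toy$own_pref[sel], toy$profile_rating_new[sel])
  rows[[i]] <- data.frame(subject = i,
                          estimate = unname(a$estimate),
                          pvalue = a$p.value)
}

# Stack the per-subject rows into a single data frame
results.df <- do.call(rbind, rows)
rownames(results.df) <- NULL
results.df
```

In the original loop, replacing the colnames(...) expression with i inside paste() should print a single line per subject.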