for loop outputting too many rows of the same data/compiling a data frame

I am attempting to compile some data in RStudio from a loop into a data frame. Currently my code correlates two variables for each of 122 participants across 120 trials.

for (i in unique(adult_pref_1$subject)){
 a <- cor.test(adult_pref_1$own_pref[adult_pref_1$subject == i], adult_pref_1$profile_rating_new[adult_pref_1$subject == i]) 
 print(paste(colnames(adult_pref_1)[adult_pref_1$subject], " est:", a$estimate, "p=value:", a$p.value))
    }

When I run the loop, the correct estimates and p-values are generated; however, it prints the same values for about 1000 lines before it starts printing the next subject's correlation estimate and p-value. I am not sure why this is happening. Ideally, I would like to aggregate the data so that each row holds a single subject's ID, estimate, and p-value (122 rows in total). Additionally, how can I compile this data into a data frame? Thank you for your suggestions.

Here is some of the original data

structure(list(sub = c("59917f16e339120001fb8c21_fvHlk:5fbd11ca7025930168297956", 
"59917f16e339120001fb8c21_uK9Bt:5fbd11ca7025930168297956", "59917f16e339120001fb8c21_fvHlk:5fbd11ca7025930168297956", 
"59917f16e339120001fb8c21_fvHlk:5fbd11ca7025930168297956", "59917f16e339120001fb8c21_uK9Bt:5fbd11ca7025930168297956", 
"59917f16e339120001fb8c21_uK9Bt:5fbd11ca7025930168297956"), subject = c("59917f16e339120001fb8c21", 
"59917f16e339120001fb8c21", "59917f16e339120001fb8c21", "59917f16e339120001fb8c21", 
"59917f16e339120001fb8c21", "59917f16e339120001fb8c21"), event = c(94L, 
46L, 96L, 80L, 21L, 52L), timestamp = c("24-Nov-2020 14:30:25", 
"24-Nov-2020 14:10:03", "24-Nov-2020 14:30:38", "24-Nov-2020 14:28:38", 
"24-Nov-2020 14:07:02", "24-Nov-2020 14:10:44"), profile = c("mean", 
"odd", "mean", "mean", "odd", "odd"), rating = c(4, 4, 4, 4, 
3, 3), rt_ms = c(2006, 1333, 1275, 1504, 1911, 1410), image = c("beads_1.png", 
"beads_2.png", "notebook_1.png", "notebook_2.png", "notebook_3.png", 
"notebook_4.png"), trial = c(33L, 45L, 35L, 19L, 20L, 51L), onset_s = c(738.738, 
345.591, 752.789, 631.909, 164.527, 386.536), profile_rating = c(2L, 
5L, 9L, 10L, 3L, 5L), block = c(2L, 1L, 2L, 2L, 1L, 1L), sub_num = c(179L, 
154L, 179L, 179L, 154L, 154L), session = c(1L, 2L, 1L, 1L, 2L, 
2L), own_pref = c(4, 4, 4, 4, 3, 4), cat1 = c(1L, 1L, 1L, 1L, 
1L, 1L), cat2 = c(1L, 1L, 1L, 1L, 1L, 1L), item_num = c(16L, 
17L, 85L, 86L, 87L, 88L), own_pref_nan = c(4, 4, 4, 4, 3, 4), 
    profile_rating_new = c(2L, 3L, 5L, 6L, 2L, 3L), PE = c(2, 
    1, 1, 2, 1, 0), PE_si = c(2, 1, -1, -2, 1, 0), se_PE = c(0, 
    0, 0, 0, 0, 1), pro_PE = c(2, 1, 1, 2, 1, 1)), row.names = c(NA, 
6L), class = "data.frame")

CodePudding user response：

There's no need to use an explicit for loop to do this. In fact, I believe a good rule of thumb for R programming is "If you're thinking of using a for loop, there's probably a better way to do it"...

Here's a solution using group_by() and group_map() from the tidyverse and tidy() from broom. group_by() groups a data frame so that the later steps in the pipe operate on each group separately. (Note that it doesn't sort the data frame.) group_map() applies the function given as its argument to each group and returns a list with one result per group, here a list of data frames. tidy() is a generic that converts the output of many statistical functions to data frames in a reasonably consistent manner. In the code below, df is the sample data from the question, which contains a single subject, so the result has one row.

bind_rows() then combines the list of data frames into a single data frame.

library(broom)
library(tidyverse)

df %>% 
  group_by(subject) %>% 
  group_map(
    function(.x, .y) {
      tidy(cor.test(.x$own_pref, .x$profile_rating_new))
    },
    .keep=TRUE
  ) %>% 
  bind_rows()
# A tibble: 1 × 8
  estimate statistic p.value parameter conf.low conf.high method                               alternative
     <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>                                <chr>      
1    0.447         1   0.374         4   -0.572     0.924 Pearson's product-moment correlation two.sided
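
If each row should also carry the subject's ID, as the question asks, one option is group_modify(), which keeps the grouping column next to whatever data frame the function returns. The sketch below is not part of the original answer. It assumes dplyr and broom are installed and runs on a small made-up data frame (toy is a hypothetical stand-in for adult_pref_1).

```r
library(dplyr)
library(broom)

# Hypothetical toy data standing in for adult_pref_1: 2 subjects, 5 trials each
toy <- data.frame(
  subject            = rep(c("s1", "s2"), each = 5),
  own_pref           = c(1, 2, 3, 4, 5, 2, 1, 4, 3, 5),
  profile_rating_new = c(2, 1, 4, 3, 5, 5, 3, 4, 1, 2)
)

per_subject <- toy %>%
  group_by(subject) %>%
  # .x holds the rows of one subject; tidy() turns the test into a one-row tibble
  group_modify(~ tidy(cor.test(.x$own_pref, .x$profile_rating_new))) %>%
  ungroup() %>%
  # Keep only the columns the question asks for
  select(subject, estimate, p.value)

per_subject
```

The result has one row per subject, so on the full data it should have as many rows as there are distinct subjects.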

CodePudding user response：

Avoid using for-loops in R whenever possible.

By combining group_by() with mutate(), you can run a correlation test for each subject and add the estimate and p-value as new columns.

library(dplyr)

adult_pref_1 |>
  # Perform the next task separately for each subject
  group_by(subject) |>

  # Run the tests, add results to new columns 'estimate', 'pvalue'
  mutate(estimate = cor.test(own_pref, profile_rating_new)$estimate,
         pvalue = cor.test(own_pref, profile_rating_new)$p.value) |> 

  # Remove irrelevant columns
  select(subject, estimate, pvalue) |> 

  # Remove duplicate rows
  distinct(subject, .keep_all = TRUE)

Output:

#> # A tibble: 1 × 3
#> # Groups:   subject [1]
#>   subject                  estimate pvalue
#>   <chr>                       <dbl>  <dbl>
#> 1 59917f16e339120001fb8c21    0.447  0.374

Created on 2022-06-14 by the reprex package (v2.0.1)
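
The pipeline above calls cor.test() twice per subject and then collapses the duplicated rows. As a possible alternative (not from the original answer), summarise() can return one row per subject directly, with the test run once and stored in a list column. The toy data frame and the helper column name test are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for adult_pref_1: 2 subjects, 5 trials each
toy <- data.frame(
  subject            = rep(c("s1", "s2"), each = 5),
  own_pref           = c(1, 2, 3, 4, 5, 2, 1, 4, 3, 5),
  profile_rating_new = c(2, 1, 4, 3, 5, 5, 3, 4, 1, 2)
)

per_subject <- toy |>
  group_by(subject) |>
  # One row per subject; the whole test object is kept in a list column
  summarise(test = list(cor.test(own_pref, profile_rating_new)),
            .groups = "drop") |>
  # Pull the two numbers of interest out of each stored test
  mutate(estimate = vapply(test, function(h) unname(h$estimate), numeric(1)),
         pvalue   = vapply(test, function(h) h$p.value, numeric(1))) |>
  select(subject, estimate, pvalue)

per_subject
```

Because summarise() already reduces each group to a single row, no distinct() step is needed afterwards.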

CodePudding user response：

While Limey and Andrea M's answers are MUCH better, if you are dead set on continuing with a for loop (and hopefully for the purposes of better understanding), this would work. As previously stated, this is inefficient and non-ideal code.

First we initialize a data frame with the columns and length that we want:


# Number of distinct subjects, i.e. the number of rows we need
n.subjects <- length(unique(adult_pref_1$subject))
results.df <- data.frame("ID" = character(n.subjects), 
                         "estimate" = numeric(n.subjects), 
                         "pvalue" = numeric(n.subjects))

Then we use a for loop to fill it.


   subjects <- unique(adult_pref_1$subject)
   for (i in seq_along(subjects)){
     # ID of the i-th distinct subject (not the subject found in row i)
     this.subject <- subjects[i]
     rows <- adult_pref_1$subject == this.subject
     a <- cor.test(adult_pref_1$own_pref[rows], 
                   adult_pref_1$profile_rating_new[rows]) 
     # Fill row i with this subject's ID, estimate and p-value
     results.df[i,] <- list(this.subject, a$estimate, a$p.value)
     print(results.df[i,])
   }

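Your issue was here: paste(colnames(adult_pref_1)[adult_pref_1$subject]

adult_pref_1$subject has one element per row of the data, so that index expression returns a vector as long as the whole data frame. paste() is vectorized, so it builds one string per row, and print() outputs all of them on every iteration. Label each line with the subject ID itself instead (i in your loop, this.subject in mine), as I did above.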
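
The repeated lines can be reproduced on a small made-up data frame (toy is hypothetical, and 0.5 is a placeholder standing in for a$estimate). This base R sketch shows that indexing colnames() by the whole subject column yields one element per row, which paste() then expands into one string per row.

```r
# Hypothetical toy data: 2 subjects, 3 rows each
toy <- data.frame(
  subject  = rep(c("s1", "s2"), each = 3),
  own_pref = c(1, 2, 3, 3, 1, 2)
)

# One element per data row; all NA, because colnames(toy) is an unnamed
# vector and character indices therefore match nothing
label <- colnames(toy)[toy$subject]
length(label) == nrow(toy)

# paste() recycles the scalar pieces against the long vector: one string per row
lines_bad <- paste(label, " est:", 0.5)
length(lines_bad)

# Pasting a single subject ID gives a single string
line_ok <- paste("s1", " est:", 0.5)
length(line_ok)
```

On the full data the same expression produces one string per trial row on each pass through the loop, which is consistent with the flood of identical lines described in the question.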