Loop a t-test through a list of data frames

Time:12-14

I have a load of survey data that I need to run a t-test through. It looks something like this (but not much like this, a dolphin is unlikely to be 52mm):

Area                    Season  Species Length (mm)
Christchurch            Spring  dolphin 52
Christchurch            Spring  dolphin 54
Christchurch            Spring  dolphin 46
Christchurch            Spring  dolphin 40
Christchurch            Spring  dolphin 38
Christchurch            Autumn  dolphin 52
Christchurch            Autumn  dolphin 54
Christchurch            Autumn  dolphin 46
Christchurch            Autumn  dolphin 40
Christchurch            Autumn  dolphin 38
Christchurch            Spring  ray     52
Christchurch            Spring  ray     54
Christchurch            Spring  ray     46
Christchurch            Spring  ray     40
Christchurch            Spring  ray     38
Christchurch            Autumn  ray     52
Christchurch            Autumn  ray     54
Christchurch            Autumn  ray     46
Christchurch            Autumn  ray     40
Christchurch            Autumn  ray     38

My problem is that I have a range of species and about 2000 measurements, and I need to run a paired t-test for each species between the seasons. I am very new to R and to coding in general, so any help making this process more efficient is appreciated. I am fully aware I have probably not gone about this in the most streamlined way.

I'd like to loop the t-test over the species, get clear, readable output, and be able to apply the script to other locations easily (I have 6).

I have split the large data frame by species and removed the empty data frames from the list:

list_df<-split(ld22,ld22$SPECIES_NAME)
list_df<-list_df[sapply(list_df, nrow) > 0]

I then tried this, which I found by googling the problem:

p <- vector("list", length(list_df))
for (i in seq_along(list_df)) {
  p[[i]] <- pairwise.t.test(list_df[[i]]$TOTAL_LENGTH_MM, list_df[[i]]$SURVEY_TYPE, p.adjust.method = "none")
}
p

There are no error codes but I don't get any results and I have no idea where to go next. Any help would be much appreciated.

CodePudding user response:

We could use lapply instead of the loop to make it a bit less verbose. We would probably also want to extract the p.value element from the object that pairwise.t.test returns. For example:

# Split by species, then keep only the p-value matrix from each test
p <- 
  split(ld22, ld22$Species) |>
  lapply(\(x) pairwise.t.test(x$Length, x$Season, p.adjust.method = "none")$p.value)

Output:

$dolphin
       Autumn
Spring      1

$ray
       Autumn
Spring      1

Data:

library("readr")

ld22 <- read_table("Area                    Season  Species Length
Christchurch            Spring  dolphin 52
Christchurch            Spring  dolphin 54
Christchurch            Spring  dolphin 46
Christchurch            Spring  dolphin 40
Christchurch            Spring  dolphin 38
Christchurch            Autumn  dolphin 52
Christchurch            Autumn  dolphin 54
Christchurch            Autumn  dolphin 46
Christchurch            Autumn  dolphin 40
Christchurch            Autumn  dolphin 38
Christchurch            Spring  ray     52
Christchurch            Spring  ray     54
Christchurch            Spring  ray     46
Christchurch            Spring  ray     40
Christchurch            Spring  ray     38
Christchurch            Autumn  ray     52
Christchurch            Autumn  ray     54
Christchurch            Autumn  ray     46
Christchurch            Autumn  ray     40
Christchurch            Autumn  ray     38")
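
Note that the question asks for a paired test, whereas pairwise.t.test() defaults to paired = FALSE. Below is a minimal sketch of the paired variant. It assumes each species has the same number of Spring and Autumn rows and that the i-th Spring row pairs with the i-th Autumn row, which may not hold for the real survey. The toy data frame is made up for illustration, because the sample data above has identical Spring and Autumn values and every paired difference would be zero.

```r
# Hypothetical data: within each species, the i-th Spring row is assumed
# to pair with the i-th Autumn row.
toy <- data.frame(
  Species = rep(c("dolphin", "ray"), each = 10),
  Season  = rep(rep(c("Spring", "Autumn"), each = 5), times = 2),
  Length  = c(52, 54, 46, 40, 38,  55, 58, 47, 44, 39,
              30, 32, 35, 31, 29,  30, 33, 34, 33, 28)
)

# One paired test per species; keep only the p-value matrix
paired_p <- lapply(split(toy, toy$Species), function(x) {
  pairwise.t.test(x$Length, x$Season,
                  p.adjust.method = "none", paired = TRUE)$p.value
})
paired_p
```

If the real data has an identifier linking the two measurements of a pair, it would be safer to sort by that identifier within each season before running the test.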

Update:

Or just use dplyr:

library(dplyr)

ld22 |>
  group_by(Species) |>
  summarise(p_value = pairwise.t.test(Length, Season, p.adjust.method = "none")$p.value) |>
  ungroup()

Output:

# A tibble: 2 × 2
  Species p_value[,1]
  <chr>         <dbl>
1 dolphin           1
2 ray               1
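
The same idea could be extended to several locations by grouping on Area as well. This is only a sketch: the data is randomly generated, "AreaB" is a placeholder name, and taking [1, 1] assumes there are exactly two seasons, so the p-value matrix is 1 x 1.

```r
library(dplyr)

set.seed(1)  # reproducible made-up measurements
toy <- expand.grid(
  rep     = 1:5,
  Season  = c("Spring", "Autumn"),
  Species = c("dolphin", "ray"),
  Area    = c("Christchurch", "AreaB"),  # "AreaB" is a placeholder name
  stringsAsFactors = FALSE
)
toy$Length <- round(rnorm(nrow(toy), mean = 45, sd = 5))

by_area <- toy |>
  group_by(Area, Species) |>
  summarise(
    # [1, 1] pulls the single Spring-vs-Autumn p-value out of the 1 x 1 matrix
    p_value = pairwise.t.test(Length, Season, p.adjust.method = "none")$p.value[1, 1],
    .groups = "drop"
  )
by_area
```

This returns one row per Area and Species combination, so adding another location only means adding its rows to the data frame.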

CodePudding user response:

Everything in one go using purrr:

library(purrr)
library(dplyr)
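ld22 |> 
  group_split(Species) |> 
  # group_split() returns groups in sorted key order, so these names only
  # line up if unique() yields the species in that same order
  setNames(unique(ld22$Species)) |> 
  keep(~nrow(.x) > 0) |>   # drop empty groups (nrow, not length, counts rows)
  imap(~pairwise.t.test(x = .x$Length, g = .x$Season, p.adjust.method = "none") |> 
         broom::tidy() |> 
         mutate(species = .y))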

Output:

$dolphin
# A tibble: 1 x 4
  group1 group2 p.value species
  <chr>  <chr>    <dbl> <chr>  
1 Spring Autumn       1 dolphin

$ray
# A tibble: 1 x 4
  group1 group2 p.value species
  <chr>  <chr>    <dbl> <chr>  
1 Spring Autumn       1 ray   
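
If a single table is easier to read than a list of tibbles, the per-species results could be stacked with bind_rows(). A minimal sketch, using made-up data and base split() so that the list names come directly from the species values:

```r
library(dplyr)
library(purrr)

# Hypothetical measurements, for illustration only
toy <- data.frame(
  Species = rep(c("dolphin", "ray"), each = 10),
  Season  = rep(rep(c("Spring", "Autumn"), each = 5), times = 2),
  Length  = c(52, 54, 46, 40, 38,  55, 58, 47, 44, 39,
              30, 32, 35, 31, 29,  30, 33, 34, 33, 28)
)

results <- split(toy, toy$Species) |>
  # one tidy row (group1, group2, p.value) per species
  map(\(x) broom::tidy(
    pairwise.t.test(x$Length, x$Season, p.adjust.method = "none")
  )) |>
  # stack the list; the list names become a "species" column
  bind_rows(.id = "species")
results
```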

CodePudding user response:

Write a function and apply it to each data frame with map. If this doesn't work, can you share the output of dput(list_df)?

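library(magrittr)  # provides the %$% exposition pipe
library(tidyverse) # provides map() via purrr

# Run the pairwise t-test on one data frame; %$% exposes its columns by name
my_function <- function(df) {
  df %$% pairwise.t.test(TOTAL_LENGTH_MM, SURVEY_TYPE, p.adjust.method = "none")
}
map(list_df, my_function)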