I have a lot of survey data that I need to run t-tests on. It looks something like this (but not much like this, as a dolphin is unlikely to be 52 mm long):
Area Season Species Length (mm)
Christchurch Spring dolphin 52
Christchurch Spring dolphin 54
Christchurch Spring dolphin 46
Christchurch Spring dolphin 40
Christchurch Spring dolphin 38
Christchurch Autumn dolphin 52
Christchurch Autumn dolphin 54
Christchurch Autumn dolphin 46
Christchurch Autumn dolphin 40
Christchurch Autumn dolphin 38
Christchurch Spring ray 52
Christchurch Spring ray 54
Christchurch Spring ray 46
Christchurch Spring ray 40
Christchurch Spring ray 38
Christchurch Autumn ray 52
Christchurch Autumn ray 54
Christchurch Autumn ray 46
Christchurch Autumn ray 40
Christchurch Autumn ray 38
My problem is that I have a range of species and about 2000 measurements, and I need to run a paired t-test for each species between each season. I am very new to R and to coding in general, so any help in making this process more efficient is appreciated. I am fully aware I have probably not gone about this in the most streamlined way.
I'd like to be able to loop the t-test over the species somehow, get a nice understandable output, and be able to apply the script to other locations easily (I have 6).
I have split the large data frame by species and removed the empty data frames from the list:
list_df<-split(ld22,ld22$SPECIES_NAME)
list_df<-list_df[sapply(list_df, nrow) > 0]
I then tried this, which I found by googling the problem:
p <- list()
for (i in 1:length(list_df)) {
  p[[i]] <- pairwise.t.test(list_df[[i]]$TOTAL_LENGTH_MM, list_df[[i]]$SURVEY_TYPE, p.adjust = "none")
}
p
There are no error codes but I don't get any results and I have no idea where to go next. Any help would be much appreciated.
CodePudding user response:
We could use lapply instead of the loop to make it a bit less verbose. We would probably also want to extract the p.value element from the returned list, i.e.
p <-
  split(ld22, ld22$Species) |>
  lapply(\(x) pairwise.t.test(x$Length, x$Season, p.adjust = "none")$p.value)
Output:
$dolphin
       Autumn
Spring      1

$ray
       Autumn
Spring      1
Data:
library("readr")
ld22 <- read_table("Area Season Species Length
Christchurch Spring dolphin 52
Christchurch Spring dolphin 54
Christchurch Spring dolphin 46
Christchurch Spring dolphin 40
Christchurch Spring dolphin 38
Christchurch Autumn dolphin 52
Christchurch Autumn dolphin 54
Christchurch Autumn dolphin 46
Christchurch Autumn dolphin 40
Christchurch Autumn dolphin 38
Christchurch Spring ray 52
Christchurch Spring ray 54
Christchurch Spring ray 46
Christchurch Spring ray 40
Christchurch Spring ray 38
Christchurch Autumn ray 52
Christchurch Autumn ray 54
Christchurch Autumn ray 46
Christchurch Autumn ray 40
Christchurch Autumn ray 38")
Update:
Or just use dplyr:
library(dplyr)
ld22 |>
  group_by(Species) |>
  summarise(p_value = pairwise.t.test(Length, Season, p.adjust = "none")$p.value) |>
  ungroup()
Output:
# A tibble: 2 × 2
  Species p_value[,1]
  <chr>         <dbl>
1 dolphin           1
2 ray               1
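If a plain data frame with one row per species is easier to read, the same idea can be sketched in base R. This is only a sketch: season_p_values is a hypothetical helper name, the column names Species, Season and Length are assumed to match the example data above, and it assumes every species has exactly two seasons (otherwise the [1, 1] index fails).

```r
# Toy data mirroring the example in the question (column names are assumptions).
ld22 <- data.frame(
  Area    = "Christchurch",
  Season  = rep(rep(c("Spring", "Autumn"), each = 5), times = 2),
  Species = rep(c("dolphin", "ray"), each = 10),
  Length  = rep(c(52, 54, 46, 40, 38), times = 4)
)

# Hypothetical helper: returns one row per species with the unadjusted p-value.
# Assumes exactly two seasons per species, so the p-value matrix is 1 x 1.
season_p_values <- function(df) {
  by_species <- split(df, df$Species)
  p <- vapply(
    by_species,
    function(x) {
      pairwise.t.test(x$Length, x$Season, p.adjust.method = "none")$p.value[1, 1]
    },
    numeric(1)
  )
  data.frame(Species = names(p), p_value = unname(p))
}

result <- season_p_values(ld22)
print(result)
```

With all six locations in one data frame that has an Area column, something like lapply(split(all_data, all_data$Area), season_p_values) would run the helper once per location (all_data is a placeholder name).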
CodePudding user response:
Everything in one go using purrr:
library(purrr)
library(dplyr)
ld22 |>
  # split() names each list element by species, so the names cannot get out of order
  split(ld22$Species) |>
  # nrow(), not length(): length() of a data frame counts columns
  keep(~ nrow(.x) > 0) |>
  imap(~ pairwise.t.test(x = .x$Length, g = .x$Season, p.adjust = "none") |>
    broom::tidy() |>
    mutate(species = .y))
Output:
$dolphin
# A tibble: 1 x 4
  group1 group2 p.value species
  <chr>  <chr>    <dbl> <chr>
1 Spring Autumn       1 dolphin

$ray
# A tibble: 1 x 4
  group1 group2 p.value species
  <chr>  <chr>    <dbl> <chr>
1 Spring Autumn       1 ray
CodePudding user response:
Write a function and apply it to each list element with purrr's map(). If this doesn't work, could you post the output of dput(list_df)?
library(magrittr)  # provides the %$% exposition pipe
library(tidyverse) # provides map() via purrr
my_function <- function(df) {
  df %$% pairwise.t.test(TOTAL_LENGTH_MM, SURVEY_TYPE, p.adjust = "none")
}
map(list_df, my_function)
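Note that the question asks for a paired t-test, while pairwise.t.test() runs unpaired tests unless paired = TRUE is passed. A minimal base R sketch of a paired version follows. It assumes that each species has the same number of measurements in both seasons and that row i in Spring really corresponds to row i in Autumn, which may not be true for the survey data. The numbers in toy are made up purely for illustration.

```r
# Made-up measurements for illustration only; not real survey data.
toy <- data.frame(
  Species = rep(c("dolphin", "ray"), each = 10),
  Season  = rep(rep(c("Spring", "Autumn"), each = 5), times = 2),
  Length  = c(52, 54, 46, 40, 38,  50, 55, 43, 41, 35,
              30, 31, 29, 35, 33,  28, 33, 27, 31, 30)
)

# One paired t-test per species; the i-th Spring value is paired with the i-th Autumn value.
paired_p <- sapply(split(toy, toy$Species), function(x) {
  spring <- x$Length[x$Season == "Spring"]
  autumn <- x$Length[x$Season == "Autumn"]
  # A paired test needs equally long, matched vectors.
  stopifnot(length(spring) == length(autumn))
  t.test(spring, autumn, paired = TRUE)$p.value
})

print(paired_p)
```

If the measurements cannot be matched one-to-one between seasons, an unpaired test as in the code above is probably the safer choice.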