How to run operations on tibbles and the columns/values in the tibbles, with the tibbles being in a

Time:10-29

I am new to R, so apologies if the answer is obvious. I am trying to perform operations on tibbles and their values/columns while these tibbles are part of a list. Previously I would load each file manually as a data.frame (CSV data) and perform the operations manually on that data.frame. This is tiresome, so I am trying to run all the operations in my script on all my data.frames at once. For example, what has worked so far is adding 0.7 to every element of the column named 'Temperature' in each tibble in the list. I did it like this:

for(i in seq_along(Data_List)) {Data_List[[i]]$Temperature <- Data_List[[i]]$Temperature + 0.7}

However, I would now like to perform different tasks. First, I need to divide each tibble into sequences. When I worked with one data.frame at a time, this is what I did:

df_Sitting <- df[1:12, ]
df_Standing <- df[13:26, ]
df_LigEx <- df[27:35, ]
df_VigEx <- df[36:42, ]
df_After <- df[43:54, ]

How do I adjust it properly for the list of all my tibbles/data.frames I now have? Secondly, I want to perform descriptive statistics, Pearson Correlation and Lin Correlation. Additionally I created a ggplot and a Bland-Altman-Plot. I did it like this:

describe(df$Temperature)
describe(df$Temp_core)
cor.test(df)
library(epiR)
epi.ccc(df$Temp_core, df$Temperature, ci = "z-transform", 
        conf.level = 0.95, rep.measure = FALSE, subjectid)
mdata <- melt(df, id="Time")
ggplot(data = mdata, aes(x = Time, y = value)) +
  geom_point(aes(group= variable, color = variable)) +
  geom_line(aes(group= variable, color = variable))
library(BlandAltmanLeh)
BlandAltman_df <- bland.altman.plot(df$Temp_core, df$Temperature, graph.sys = "ggplot2")
print(BlandAltman_df + theme(plot.title=element_text(hjust = 0.5)))

I now want to run all the functions above on the entire list of tibbles and the variables within them at once, and get all the corresponding statistics and plots, to later create a Markdown document. I tried lapply, but it somehow does not work. I hope I have formulated the question correctly. I appreciate the help!

CodePudding user response:

Working with a list of tibbles (or of any other type) is totally doable in R. Firstly, I suggest replacing the for loop over seq_along with lapply or, since you are already using the tidyverse, purrr::map:

for(i in seq_along(Data_List)) {
    Data_List[[i]]$Temperature <- Data_List[[i]]$Temperature + 0.7
}

becomes:

modified_data_list <- purrr::map(Data_List, function(df){
    dplyr::mutate(df, Temperature = Temperature + 0.7)
})

You can apply the same principle to the rest of your code. Note that I use purrr::walk here instead of map, because the function does not return a modified data frame; it is called for its "side effects", such as the plots:

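library(ggplot2)
library(epiR)
library(BlandAltmanLeh)

# walk() returns its input invisibly, so there is nothing useful to assign
purrr::walk(Data_List, function(df){
    # inside a function nothing is auto-printed, so print() explicitly
    # describe() is used as in the question (its package is not stated there)
    print(describe(df$Temperature))
    print(describe(df$Temp_core))
    # cor.test() needs the two vectors, not the whole data frame
    print(cor.test(df$Temp_core, df$Temperature))
    # subjectid is only relevant for repeated measures, so it is omitted here
    print(epi.ccc(df$Temp_core, df$Temperature, ci = "z-transform", 
        conf.level = 0.95, rep.measure = FALSE))
    mdata <- reshape2::melt(df, id = "Time")
    print(ggplot(data = mdata, aes(x = Time, y = value)) +
      geom_point(aes(group = variable, color = variable)) +
      geom_line(aes(group = variable, color = variable)))
    BlandAltman_df <- bland.altman.plot(df$Temp_core, df$Temperature, graph.sys = "ggplot2")
    print(BlandAltman_df + theme(plot.title = element_text(hjust = 0.5)))
})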
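
The same pattern may also cover the first part of the question, splitting each tibble into the five sequences. The following is only a minimal base R sketch, assuming every tibble has at least 54 rows in the same order as in the question. The names `phases`, `split_phases`, `Phase_List` and the made-up stand-in data `demo_list` are illustrative and not from the original post:

```r
# Row ranges for each sequence, copied from the question
phases <- list(
  Sitting  = 1:12,
  Standing = 13:26,
  LigEx    = 27:35,
  VigEx    = 36:42,
  After    = 43:54
)

# Hypothetical stand-in for Data_List: two data frames with 54 rows each
make_demo <- function(offset) {
  data.frame(
    Time        = 1:54,
    Temperature = 36 + offset + (1:54) / 100,
    Temp_core   = 37 + offset + (1:54) / 100
  )
}
demo_list <- list(df_a = make_demo(0), df_b = make_demo(0.2))

# For one data frame, return a named list with one subset per sequence
split_phases <- function(df) {
  lapply(phases, function(rows) df[rows, , drop = FALSE])
}

# Nested list: Phase_List$<dataset>$<sequence>
Phase_List <- lapply(demo_list, split_phases)

# Number of rows per sequence in the second data set
sapply(Phase_List$df_b, nrow)
```

Each subset can then be reached as, for example, `Phase_List$df_a$Standing`, and the statistics and plots can be run per sequence with a second `lapply` over the inner list.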

CodePudding user response:

You can lapply the tests and plot code to the list members and return lists of tests results and plots. Something like the following.

library(ggplot2)
library(epiR)
library(BlandAltmanLeh)

Data_List <- lapply(Data_List, \(X){
  X[["Temperature"]] <- X[["Temperature"]] + 0.7
  X
})

cor_test_list <- lapply(Data_List, \(X) cor.test(formula = ~ Temperature + Temp_core, data = X))
lin_test_list <- lapply(Data_List, \(X){
  epi.ccc(
    X[["Temp_core"]], 
    X[["Temperature"]], 
    ci = "z-transform", 
    conf.level = 0.95, 
    rep.measure = FALSE
  )
})

gg_plot_list <- lapply(Data_List, \(X){
  mdata <- reshape2::melt(X, id = "Time")
  ggplot(data = mdata, aes(x = Time, y = value)) +
    geom_point(aes(group = variable, color = variable)) +
    geom_line(aes(group = variable, color = variable))
})

BlandAltman_List <- lapply(Data_List, \(X){
  BlandAltman_df <- bland.altman.plot(X$Temp_core, X$Temperature, graph.sys = "ggplot2")
  BlandAltman_df +
    theme(plot.title = element_text(hjust = 0.5))
})
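
The descriptive statistics from the question are not part of the lists above. One possible way to get them per tibble, using base R only instead of `describe()`, is sketched below. The helper `describe_cols`, the result name `desc_list` and the simulated `demo_list` are assumptions made for illustration:

```r
# Hypothetical stand-in data with the two temperature columns
set.seed(123)
demo_list <- list(
  df_a = data.frame(Temp_core   = rnorm(20, 37.0, 0.4),
                    Temperature = rnorm(20, 36.5, 0.5)),
  df_b = data.frame(Temp_core   = rnorm(20, 37.2, 0.4),
                    Temperature = rnorm(20, 36.8, 0.5))
)

# Basic descriptives for the chosen columns of one data frame;
# returns a matrix with one row per column and one column per statistic
describe_cols <- function(X, cols = c("Temperature", "Temp_core")) {
  t(sapply(X[cols], function(v) {
    c(n    = sum(!is.na(v)),
      mean = mean(v, na.rm = TRUE),
      sd   = sd(v, na.rm = TRUE),
      min  = min(v, na.rm = TRUE),
      max  = max(v, na.rm = TRUE))
  }))
}

# One descriptives table per list element
desc_list <- lapply(demo_list, describe_cols)
desc_list$df_a
```

Returning a small matrix per tibble keeps the results easy to print one after another in a Markdown report.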

The tests

To access the test results, use once again *apply loops together with extraction functions.

sapply(cor_test_list, "[[", "estimate")
# df_a.cor  df_b.cor  df_c.cor 
#0.7425467 0.5259107 0.4572278 

sapply(cor_test_list, "[[", "statistic")
#  df_a.t   df_b.t   df_c.t 
#7.680738 4.283887 3.561892 

sapply(cor_test_list, "[[", "p.value")
#        df_a         df_b         df_c 
#6.709843e-10 8.771860e-05 8.434625e-04 

sapply(lin_test_list, "[[", "rho.c")
sapply(lin_test_list, "[[", "sblalt")

The plots

The plots can be plotted one by one:

gg_plot_list[[1]]
BlandAltman_List[[1]]

or in a loop with print.

for(i in seq_along(gg_plot_list)) 
  print(gg_plot_list[[i]])

Or they can be written to a graphics device (a file on disk).

for(i in seq_along(gg_plot_list)) {
  filename <- sprintf("Rplot%03d.png", i)
  png(filename = filename)
  print(gg_plot_list[[i]])
  dev.off()
}

Test data

Data_List <- iris[1:2]
names(Data_List) <- c("Temp_core", "Temperature")
Data_List$Time <- rep(1:50, 3)
Data_List <- split(Data_List, iris$Species)
names(Data_List) <- paste("df", letters[1:3], sep = "_")
Data_List <- lapply(Data_List, \(x){row.names(x) <- NULL; x})