I have a large dataframe of connection strengths of between 116 nodes of a brain network for many individuals. The dataframe is stored in wide format, with 13456 columns of node connections (116*116) per subject. For example, the first few rows and columns of the dataframe look like below, where node1.2 is connection between nodes 1 and 2, and so on:
node1.2<-c(1, 4, 5, 2, 1, 2, 3, 6)
node1.3<-c(1, 4, 7, 6, 5, 3, 2, 1)
node1.4<-c(4, 3, 1, 5, 4, 3, 7, 6)
group<-c("a", "a", "a", "b", "b", "b", "c", "c")
subid<-c("001", "002", "003", "004", "005", "006", "007", "008")
df<-data.frame(subid, group, node1.2, node1.3, node1.4)
> df
subid group node1.2 node1.3 node1.4
1 001 a 1 1 4
2 002 a 4 4 3
3 003 a 5 7 1
4 004 b 2 6 5
5 005 b 1 5 4
6 006 b 2 3 3
7 007 c 3 2 7
8 008 c 6 1 6
I want to take each subject (i.e., each row), convert all their nodal-pair values into a matrix (i.e., 116 rows * 116 columns). I then want to save each matrix as an independent .csv file in my working directory.
How can I best do this in R?
CodePudding user response:
one approach:
library(dplyr)
## helper function: converts a dataframe to matrix of desired dimensions:
dataframe_to_matrix <- function(df){
matrix(unlist(df), 116, 116,
byrow = c(TRUE, FALSE) ## adapt as appropriate
)
}
## get list of rowwise matrices:
list_of_matrices <-
df %>%
rowwise %>%
mutate(matrices = list(dataframe_to_matrix(df = c_across(starts_with('node'))))) %>%
pull(matrices, subid) ## returns a named list, names taken from column "subid"
## pick list members by name and write them to disk:
names(list_of_matrices) %>%
sapply(function(n) {write.csv(list_of_matrices[[n]],
file = paste0("dataset_", n, ".csv")
)
})
CodePudding user response:
Another solution. I presume when you say the wide format of the df is 13456 columns you mean 13458 columns 116x116 subid group. If I'm wrong just change the number inside array(x[] ,...)
.
# Set your working directory
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
# Get every row as a list
data = split(df, seq(nrow(df)))
# iterate over
for(x in data){
name = x$subid
grp = x$group
write.csv(array(x[3:13458], dim = c(116,116)),file = paste0(name,'_',grp,".csv"))
}