Home > OS >  R: Convert each row of a large dataframe into a matrix and save output as a .csv file
R: Convert each row of a large dataframe into a matrix and save output as a .csv file

Time:05-10

I have a large dataframe of connection strengths of between 116 nodes of a brain network for many individuals. The dataframe is stored in wide format, with 13456 columns of node connections (116*116) per subject. For example, the first few rows and columns of the dataframe look like below, where node1.2 is connection between nodes 1 and 2, and so on:

node1.2<-c(1, 4, 5, 2, 1, 2, 3, 6)
node1.3<-c(1, 4, 7, 6, 5, 3, 2, 1)
node1.4<-c(4, 3, 1, 5, 4, 3, 7, 6)
group<-c("a", "a", "a", "b", "b", "b", "c", "c")
subid<-c("001", "002", "003", "004", "005", "006", "007", "008")

df<-data.frame(subid, group, node1.2, node1.3, node1.4)

> df
  subid group node1.2 node1.3 node1.4
1   001     a       1       1       4
2   002     a       4       4       3
3   003     a       5       7       1
4   004     b       2       6       5
5   005     b       1       5       4
6   006     b       2       3       3
7   007     c       3       2       7
8   008     c       6       1       6

I want to take each subject (i.e., each row), convert all their nodal-pair values into a matrix (i.e., 116 rows * 116 columns). I then want to save each matrix as an independent .csv file in my working directory.

How can I best do this in R?

CodePudding user response:

one approach:

library(dplyr)

## helper function: converts a dataframe to matrix of desired dimensions:
dataframe_to_matrix <- function(df){
   matrix(unlist(df), 116, 116,
          byrow = c(TRUE, FALSE) ## adapt as appropriate
          )
}

## get list of rowwise matrices:
list_of_matrices <- 
  df %>%
  rowwise %>%
  mutate(matrices = list(dataframe_to_matrix(df = c_across(starts_with('node'))))) %>%
  pull(matrices, subid) ## returns a named list, names taken from column "subid"

## pick list members by name and write them to disk:
names(list_of_matrices) %>%
sapply(function(n) {write.csv(list_of_matrices[[n]],
                             file = paste0("dataset_", n, ".csv")
                             )
})

CodePudding user response:

Another solution. I presume when you say the wide format of the df is 13456 columns you mean 13458 columns 116x116 subid group. If I'm wrong just change the number inside array(x[] ,...).

# Set your working directory
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
# Get every row as a list   
data = split(df, seq(nrow(df)))
# iterate over
for(x in data){
  name = x$subid
  grp = x$group
  write.csv(array(x[3:13458], dim = c(116,116)),file = paste0(name,'_',grp,".csv"))
}
  • Related