Writing a huge R dataframe into 4 separate files in R


I have a tibble/dataframe in R with about 206 million records and 5 columns. My system runs out of memory if I do any further analysis/computation on this data, so I want to write this tibble to disk as 4 separate csv files of ~50 million records each (the last one would be ~56 million) and run the further computation/analysis in 4 separate iterations. I searched a few threads on the web but could not find any that fit this use case.

How can I achieve this?

CodePudding user response:

Let us know if your machine has the memory for the below. It does what the OP asked for: split the original df into 4 roughly equal parts and save each one to its own csv file.

library(data.table)

# dummy data (for the OP's real tibble, convert it in place with setDT(df) instead)
df <- data.table(row_id = 1:123)

# parameters
x <- nrow(df)  # number of rows
y <- 4         # number of splits

# assign a batch number to each row
# (ceiling avoids wrapping back to batch 1 when x is not a multiple of y)
df[, batch := rep(1:y, each = ceiling(x / y), length.out = x)]

# split into a list of data.tables, one per batch
df <- split(df, by = 'batch')

# save each batch as a separate csv (1.csv, 2.csv, ...)
lapply(df, \(i) fwrite(i, file = paste0(i$batch[1], '.csv')))
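Once the chunks are on disk, the further analysis can be run one chunk at a time. A minimal sketch of that loop is below; the actual computation is left as a placeholder, and the file names 1.csv to 4.csv are the ones produced by the fwrite() call above.

library(data.table)

# process each ~50M-row chunk in its own iteration
for (f in paste0(1:4, '.csv')) {
  chunk <- fread(f)   # read one chunk back from disk
  # ... run the analysis/computation on `chunk` here ...
  rm(chunk)           # drop the chunk before the next iteration
  gc()                # and release the memory
}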

CodePudding user response:

Apologies if this solution misses the mark, but I believe the below should work:

library(dplyr)
library(readr)

df %>%                      # name of dataframe
  slice(1:5e7) %>%          # first 50M rows
  write_csv("file_a.csv")   # save as csv

and repeat for the remaining sets, changing only the row range in slice() and the file name in write_csv().
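If you would rather not repeat that block by hand, a short loop can do the same thing. This is a sketch assuming dplyr and readr are loaded and df can stay in memory long enough to be sliced; the file names file_a.csv to file_d.csv simply extend the example above.

n      <- nrow(df)                           # ~206 million rows
starts <- seq(1, by = 5e7, length.out = 4)   # 1, 50M+1, 100M+1, 150M+1
ends   <- c(starts[-1] - 1, n)               # last chunk keeps the remaining ~56M rows

for (k in 1:4) {
  df %>%
    slice(starts[k]:ends[k]) %>%
    write_csv(paste0("file_", letters[k], ".csv"))   # file_a.csv ... file_d.csv
}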

Tags: r