I have over 20 SAS (sas7bdat) files, all with the same columns, that I want to read into Python. I need an iterative process to read all the files and rbind (concatenate) them into one big DataFrame. This is what I have so far, but it throws an error saying there are no objects to concatenate.
import pandas as pd
import pyreadstat
import glob
import os

path = r'C:\\Users\myfolder'  # or unix / linux / mac path
all_files = glob.glob(os.path.join(path, "/*.sas7bdat"))

li = []
for filename in all_files:
    reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sas7bdat, filename, chunksize=10000, usecols=cols)
    for df, meta in reader:
        li.append(df)

frame = pd.concat(li, axis=0)
I found this answer for reading in CSV files helpful: Import multiple CSV files into pandas and concatenate into one DataFrame
CodePudding user response:
If the SAS data files are too big to read in one go and you plan to append all of them into one df, read each file in chunks:

# reading in chunks keeps RAM usage under control
li = []
for filename in all_files:
    reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sas7bdat, filename, chunksize=10000, usecols=cols)
    for df, meta in reader:
        li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
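Note that the "no objects to concatenate" error in the question most likely comes from the glob pattern, not from the reading itself: passing "/*.sas7bdat" with a leading slash to os.path.join makes it discard the folder part of path, so all_files ends up empty and pd.concat has nothing to work with. A minimal end-to-end sketch, assuming cols holds the column names you actually need (these names are placeholders; drop usecols to read every column):

import glob
import os

import pandas as pd
import pyreadstat

path = r'C:\Users\myfolder'                    # folder containing the sas7bdat files
pattern = os.path.join(path, "*.sas7bdat")     # no leading slash, so the folder is kept
all_files = glob.glob(pattern)

cols = ["col_a", "col_b"]  # hypothetical column names -- replace with the ones you need

li = []
for filename in all_files:
    # read_file_in_chunks yields (DataFrame, metadata) pairs of at most chunksize rows
    reader = pyreadstat.read_file_in_chunks(
        pyreadstat.read_sas7bdat, filename, chunksize=10000, usecols=cols
    )
    for df, meta in reader:
        li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

If each file comfortably fits in memory, you could also skip the chunking and read each file whole with df, meta = pyreadstat.read_sas7bdat(filename, usecols=cols) inside the loop.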