I have two CSV files in a folder and want to concatenate (or merge) them without reading them into memory. The data is huge, which causes loading problems and merge errors.
The files are: my folder/a.csv and my folder/b.csv
Goal: combine a.csv and b.csv into one file in code, without using pd.read_csv.
CodePudding user response:
Hey, take a look at the tf.data guide: https://www.tensorflow.org/guide/data
In particular, see https://www.tensorflow.org/guide/data#consuming_sets_of_files
Most dataset operations don't load all the data at once. Datasets behave like generators and load (or prefetch) data only as it is needed.
You can work with a subset of the dataset using ds.take(n), where n is the number of items. tf.data also lets you build a pipeline that includes mapping functions. Once the pipeline is defined, you can iterate over the dataset to get all the data.
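To make the "loads data as it needs it" idea concrete, here is a rough plain-Python analogy. It is not the tf.data API, only a sketch of the same lazy pattern using a generator. The helper name iter_rows and the throwaway demo files are made up for illustration, and it assumes each file starts with one header line. Taking the first few rows with islice plays the role that ds.take(n) plays in tf.data.

```python
import os
import tempfile
from itertools import islice


def iter_rows(paths):
    """Lazily yield data lines from each file in turn (hypothetical helper)."""
    for path in paths:
        with open(path, newline="") as f:
            next(f, None)  # skip the header line (assumed to be exactly one line)
            for line in f:
                # Only the current line is held in memory at any time.
                yield line.rstrip("\r\n")


# Demo on small throwaway files standing in for a.csv and b.csv.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.csv")
    b = os.path.join(tmp, "b.csv")
    with open(a, "w", newline="") as f:
        f.write("id,value\n1,x\n2,y\n")
    with open(b, "w", newline="") as f:
        f.write("id,value\n3,z\n4,w\n")

    rows = iter_rows([a, b])
    # Pull only three rows; the rest of b.csv is never read.
    first_three = list(islice(rows, 3))
    rows.close()  # release the file that is still open inside the generator

print(first_three)
```

Nothing is read until the generator is iterated, which is why a pipeline like this can be set up over files that are far larger than memory.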
CodePudding user response:
One option is to use dask:
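from dask.dataframe import read_csv, concat
# read_csv is lazy here: the files are split into partitions and not loaded in full
df1 = read_csv('my folder/a.csv')
df2 = read_csv('my folder/b.csv')
# concat stacks the two dataframes row-wise, still without computing anything
final_df = concat([df1, df2])
# Writing triggers the computation; single_file=True writes one file instead of one per partition
final_df.to_csv('combined_csv', index=False, single_file=True)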
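If the goal is only to stack the rows of the two files, a standard-library sketch can avoid parsing the CSV at all. This is a different technique from the dask approach, shown under several assumptions: both files have the same single-line header and the same column order, and the header contains no quoted line breaks. The function name concat_csv and the demo file contents are hypothetical.

```python
import os
import shutil
import tempfile


def concat_csv(first, second, out, chunk_size=1024 * 1024):
    """Stream `first` and then the data rows of `second` into `out` (hypothetical helper)."""
    with open(out, "wb") as dst:
        with open(first, "rb") as src:
            # Copy the first file, header included, in fixed-size chunks.
            shutil.copyfileobj(src, dst, chunk_size)
            # Add a newline if the first file does not end with one.
            if src.tell() > 0:
                src.seek(-1, os.SEEK_END)
                if src.read(1) != b"\n":
                    dst.write(b"\n")
        with open(second, "rb") as src:
            src.readline()  # skip the duplicate header (assumed to be one line)
            shutil.copyfileobj(src, dst, chunk_size)


# Demo on small throwaway files standing in for a.csv and b.csv.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.csv")
    b = os.path.join(tmp, "b.csv")
    out = os.path.join(tmp, "combined.csv")
    with open(a, "w", newline="") as f:
        f.write("id,value\n1,x\n2,y")  # no trailing newline, on purpose
    with open(b, "w", newline="") as f:
        f.write("id,value\n3,z\n")

    concat_csv(a, b, out)
    with open(out, newline="") as f:
        combined = f.read()

print(combined)
```

Memory use stays bounded by chunk_size regardless of file size. If the columns differ between the files, or a key-based merge is needed, this shortcut does not apply and something like the dask approach is the better fit.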