Reading CSV files into Dask DataFrames using usecols

I am reading a CSV file in Dask, and while reading it I want to use usecols, as we do in pandas.

Currently, in Dask, I am using: df = dd.read_csv('myfiles.csv')

I want to do the same as in pandas: df = pd.read_csv('myfiles.csv', usecols=["date", "loc", "x"])

CodePudding user response：

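Have you tried names? Note that it sets the column labels; it does not select a subset of columns the way usecols does: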

df = dd.read_csv('myfiles.csv',names=["date", "loc", "x"])

Here is the definition from the pandas.read_csv documentation:

names : array-like, optional

List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
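To make the difference between names and usecols concrete, here is a minimal pandas sketch. The CSV text and the replacement labels "d", "l", "v" are made up for illustration; they are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample data with a header row.
csv_text = "date,loc,x\n2024-01-01,A,1\n2024-01-02,B,2\n"

# names replaces the column labels; header=0 tells pandas to discard
# the header row in the file instead of treating it as data.
renamed = pd.read_csv(io.StringIO(csv_text), names=["d", "l", "v"], header=0)

# usecols keeps the labels from the file and reads only the listed columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["date", "x"])

print(list(renamed.columns))
print(list(subset.columns))
```

So names is for relabeling, while usecols is the option that limits which columns are read.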

dask.dataframe.read_csv forwards extra keyword arguments to pandas.read_csv(), so even

df = dd.read_csv('myfiles.csv',usecols=["date", "loc", "x"])

will work for you.