How can I read and concatenate many csv files into one big dataframe in pandas?


I have 100 csv files in one folder. I want to concatenate those csv files into a single dataframe.

I used the following code:

import os
import pandas as pd 

data_suntracker = [f for f in os.listdir(".") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, data_suntracker))

The output:

runfile('C:/Users/vasil/.spyder-py3/autosave/dokimastiko_sun4.py', wdir='C:/Users/vasil/.spyder-py3/autosave')
Traceback (most recent call last):

  File "C:\Program Files\Spyder\pkgs\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "c:\users\vasil\.spyder-py3\autosave\dokimastiko_sun4.py", line 5, in <module>
    df = pd.concat(map(pd.read_csv, data_suntracker))

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\core\reshape\concat.py", line 368, in concat
    op = _Concatenator(

  File "C:\Program Files\Spyder\pkgs\pandas\core\reshape\concat.py", line 422, in __init__
    objs = list(objs)

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 611, in _read
    return parser.read(nrows)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)

  File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory

  File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows

  File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows

  File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error

ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3

Also, because I am running this in the Spyder application, the Variable Explorer (top-right pane) shows that data_suntracker is just a list of the 100 csv file names (strings), not dataframes. How can I fix my code so that it builds one dataframe containing all the data from these files? All the files have the same columns.

CodePudding user response:

Try this code. It should work for you:

import pandas as pd
import glob
import os

path = 'your path to files'
# Note: the pattern must not start with a slash, otherwise os.path.join discards `path`
all_files = glob.glob(os.path.join(path, "*.csv"))

temp_list = []

# Read each csv into its own dataframe and collect them in a list
for filename in all_files:
    temp_df = pd.read_csv(filename, index_col=None, header=0)
    temp_list.append(temp_df)

# Stack all dataframes vertically into one, renumbering the index
single_df = pd.concat(temp_list, axis=0, ignore_index=True)
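Note that neither this approach nor the original map/concat one avoids the ParserError in your traceback ("Expected 1 fields in line 5, saw 3"): that error usually means at least one of the 100 files is malformed (a different delimiter, a stray title row, and so on). Below is a minimal sketch for locating the offending file(s), assuming the same folder layout as above; the helper name find_bad_csvs is only for illustration:

import glob
import os

import pandas as pd

def find_bad_csvs(path):
    """Try to parse every csv under `path` and report the ones that fail."""
    bad_files = []
    for filename in glob.glob(os.path.join(path, "*.csv")):
        try:
            pd.read_csv(filename)
        except pd.errors.ParserError as exc:
            # Record the file name and the parser message for inspection
            bad_files.append((filename, str(exc)))
    return bad_files

for name, message in find_bad_csvs("your path to files"):
    print(name, "->", message)

Once you know which file breaks, you can open it and decide whether it needs a different sep=, a skiprows= value, or should simply be excluded before concatenating.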

CodePudding user response:

Using pathlib

import pandas as pd
from pathlib import Path

path = "path/to/files/"
# Lazily read every csv in the folder and stack them into one dataframe
df = pd.concat((pd.read_csv(x) for x in Path(path).glob("*.csv")), ignore_index=True)
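If the bad rows are just junk lines that can safely be dropped, pandas 1.3+ also accepts on_bad_lines="skip" in read_csv; here is a sketch under that assumption (keep in mind that it silently discards the offending rows rather than fixing them):

import pandas as pd
from pathlib import Path

path = "path/to/files/"
# on_bad_lines="skip" drops any row whose field count does not match the header
df = pd.concat(
    (pd.read_csv(x, on_bad_lines="skip") for x in Path(path).glob("*.csv")),
    ignore_index=True,
)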