How can I read and concatenate many csv files into one big dataframe in pandas?


I have 100 csv files in one folder. I want to concatenate those csv files into a single dataframe.

I used the following code:

import os
import pandas as pd 

data_suntracker = [f for f in os.listdir(".") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, data_suntracker))

The output:

runfile('C:/Users/vasil/.spyder-py3/autosave/dokimastiko_sun4.py', wdir='C:/Users/vasil/.spyder-py3/autosave')
Traceback (most recent call last):

  File "C:\Program Files\Spyder\pkgs\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "c:\users\vasil\.spyder-py3\autosave\dokimastiko_sun4.py", line 5, in <module>
    df = pd.concat(map(pd.read_csv, data_suntracker))

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\core\reshape\concat.py", line 368, in concat
    op = _Concatenator(

  File "C:\Program Files\Spyder\pkgs\pandas\core\reshape\concat.py", line 422, in __init__
    objs = list(objs)

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 611, in _read
    return parser.read(nrows)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)

  File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory

  File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows

  File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows

  File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error

ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3

Also, because I am running this in the Spyder application, the Variable Explorer (top-right pane) shows that data_suntracker is just a list of the 100 csv file names (strings), not dataframes. How can I fix my code so that it builds one dataframe containing all the data from these files? All the files have the same columns.

CodePudding user response:

Try this code. It should work for you:

import pandas as pd
import glob
import os

path = 'your path to files'
# Note: the pattern must not start with a slash, otherwise os.path.join discards `path`
all_files = glob.glob(os.path.join(path, "*.csv"))

temp_list = []

# Read each csv into its own dataframe and collect them in a list
for filename in all_files:
    temp_df = pd.read_csv(filename, index_col=None, header=0)
    temp_list.append(temp_df)

# Stack all dataframes vertically into one, renumbering the index
single_df = pd.concat(temp_list, axis=0, ignore_index=True)
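Note that neither this approach nor the original map/concat one avoids the ParserError in your traceback ("Expected 1 fields in line 5, saw 3"): that error usually means at least one of the 100 files is malformed (a different delimiter, a stray title row, and so on). Below is a minimal sketch for locating the offending file(s), assuming the same folder layout as above; the helper name find_bad_csvs is only for illustration:

import glob
import os

import pandas as pd

def find_bad_csvs(path):
    """Try to parse every csv under `path` and report the ones that fail."""
    bad_files = []
    for filename in glob.glob(os.path.join(path, "*.csv")):
        try:
            pd.read_csv(filename)
        except pd.errors.ParserError as exc:
            # Record the file name and the parser message for inspection
            bad_files.append((filename, str(exc)))
    return bad_files

for name, message in find_bad_csvs("your path to files"):
    print(name, "->", message)

Once you know which file breaks, you can open it and decide whether it needs a different sep=, a skiprows= value, or should simply be excluded before concatenating.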

CodePudding user response:

Using pathlib

import pandas as pd
from pathlib import Path

path = "path/to/files/"
# Lazily read every csv in the folder and stack them into one dataframe
df = pd.concat((pd.read_csv(x) for x in Path(path).glob("*.csv")), ignore_index=True)
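If the bad rows are just junk lines that can safely be dropped, pandas 1.3+ also accepts on_bad_lines="skip" in read_csv; here is a sketch under that assumption (keep in mind that it silently discards the offending rows rather than fixing them):

import pandas as pd
from pathlib import Path

path = "path/to/files/"
# on_bad_lines="skip" drops any row whose field count does not match the header
df = pd.concat(
    (pd.read_csv(x, on_bad_lines="skip") for x in Path(path).glob("*.csv")),
    ignore_index=True,
)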