I am new here and I need some help.
I am getting OSError: [Errno 22] Invalid argument when I use pd.read_csv on two CSV files during dataset preprocessing.
I created two dummy datasets as below:
test_1.csv:
DATE,permno,datadate,gvkey, ....... (and a lot of features)
19260130,10006,19260130,3934, ........
19260130,10022,19260130,3942, ........
19260130,10030,19260130,3969, ........
19260130,10049,19260130,3976, ........
19260130,10057,19260130,3977, ........
19260130,10065,19260130,3984, ........
19260130,10073,19260130,3985, ........
test_2.csv:
DATE,permno,datadate,Q's ratio
19260130,10006,19260130,1.16541374714217
19260130,10022,19260130,1.01102923080989
19260130,10030,19260130,1.06549175520466
19260130,10049,19260130,1.54355923255147
19260130,10057,19260130,3.56608118773024
19260130,10065,19260130,2.6860629359338
19260130,10073,19260130,2.0303420958083
My code:
import pandas as pd
DATA_DIR = r'C:\Users\steve\Desktop\Data\test_1.csv'
df = pd.read_csv(DATA_DIR, parse_dates=['DATE', 'datadate'])
q = pd.read_csv(DATA_DIR + r'C:\Users\steve\Desktop\Data\test_2.csv', index_col=0, parse_dates=[1, 3])
I got this error:
Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2021.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2021.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)
  File "C:/Users/steve/PycharmProjects/Empirical Asset via Machine Learning/test.py", line 5, in <module>
    q = pd.read_csv(DATA_DIR + r'C:\Users\steve\Desktop\Data\test_2.csv', index_col=0, parse_dates=[1, 3])
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\io\parsers\base_parser.py", line 222, in _open_handles
    self.handles = get_handle(
  File "C:\Users\steve\PycharmProjects\Empirical Asset via Machine Learning\venv\lib\site-packages\pandas\io\common.py", line 702, in get_handle
    handle = open(
OSError: [Errno 22] Invalid argument: 'C:\Users\steve\Desktop\Data\test_1.csvC:\Users\steve\Desktop\Data\test_2.csv'
I have searched several similar topics on Stack Overflow and tried their suggestions, but it seems no one uses pd.read_csv('test_1.csv' + 'test_2.csv', .....) the way I do.
Please help, thanks.
CodePudding user response:
Invalid argument: 'C:\Users\steve\Desktop\Data\test_1.csvC:\Users\steve\Desktop\Data\test_2.csv'
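You are trying to read a CSV from an invalid path: the string above is both file paths joined into one. read_csv opens a single file per call, so you cannot read two CSV files at once this way.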
When you call this...
pd.read_csv(DATA_DIR + r'C:\Users\steve\Desktop\Data\test_2.csv', index_col=0, parse_dates=[1, 3])
...you are concatenating two absolute paths, because DATA_DIR already holds the full path to test_1.csv. Just pass the second path on its own:
pd.read_csv(r'C:\Users\steve\Desktop\Data\test_2.csv', index_col=0, parse_dates=[1, 3])
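To see why the original call fails, here is a minimal sketch of what `+` does to the two path strings from the question. It touches no files; the variable names `first`, `second`, and `combined` are made up for illustration.

```python
# The two full paths from the question.
first = r'C:\Users\steve\Desktop\Data\test_1.csv'
second = r'C:\Users\steve\Desktop\Data\test_2.csv'

# + on two str values joins them end to end; it does not build a list of files.
combined = first + second
print(combined)

# The joined string contains a second drive letter, and so a second colon,
# in the middle of what is supposed to be one file name.
print(combined.count(':'))
```

The printed string is the same one that appears in the OSError message, which suggests the problem is the path itself rather than anything inside the CSV files.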
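A perhaps better way to do this, keeping your DATA_DIR, would be to make it hold only the directory:
# A raw string cannot end with a single backslash, so the trailing separator is added separately.
DATA_DIR = r'C:\Users\steve\Desktop\Data' + '\\'
df = pd.read_csv(DATA_DIR + 'test_1.csv', parse_dates=['DATE', 'datadate'])
q = pd.read_csv(DATA_DIR + 'test_2.csv', index_col=0, parse_dates=[1, 3])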
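As an alternative to joining strings by hand, the standard-library pathlib module can build the two paths and take care of the separator. This is only a sketch: `path_1` and `path_2` are illustrative names, and `PureWindowsPath` is used so the snippet runs on any operating system without touching the disk.

```python
from pathlib import PureWindowsPath

# Directory from the question, with no trailing backslash needed.
DATA_DIR = PureWindowsPath(r'C:\Users\steve\Desktop\Data')

# The / operator joins path components and inserts the separator.
path_1 = DATA_DIR / 'test_1.csv'
path_2 = DATA_DIR / 'test_2.csv'

print(path_1)
print(path_2)
```

On the actual Windows machine you would probably use `pathlib.Path` instead and pass each path to its own call, for example `pd.read_csv(path_1, parse_dates=['DATE', 'datadate'])`, since read_csv accepts path objects as well as strings.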
CodePudding user response:
As we can see, you are trying to open a file at the path 'C:\Users\steve\Desktop\Data\test_1.csvC:\Users\steve\Desktop\Data\test_2.csv', which does not exist. Use two separate variables to read the two CSV files, and a third one to hold the result.
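A rough sketch of that idea follows. The question does not say how the two tables should be combined, so the left merge on the three shared columns is an assumption, as are the names `csv_1`, `csv_2`, and `merged`. In-memory text stands in for the real files, using two rows from the question and only the gvkey feature.

```python
import io

import pandas as pd

# Stand-ins for test_1.csv and test_2.csv (two rows each, taken from the question).
csv_1 = io.StringIO(
    "DATE,permno,datadate,gvkey\n"
    "19260130,10006,19260130,3934\n"
    "19260130,10022,19260130,3942\n"
)
csv_2 = io.StringIO(
    "DATE,permno,datadate,Q's ratio\n"
    "19260130,10006,19260130,1.16541374714217\n"
    "19260130,10022,19260130,1.01102923080989\n"
)

# One read_csv call, and one variable, per file.
df = pd.read_csv(csv_1)
q = pd.read_csv(csv_2)

# A third variable holds the combined result (assumed here: a left merge on shared keys).
merged = df.merge(q, on=["DATE", "permno", "datadate"], how="left")
print(merged)
```

With the real files, each `io.StringIO(...)` object would be replaced by the corresponding file path.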