Read CSV files with different date formats as index

Time:12-15

I have a series of files whose dates are in different formats, and one of those formats is not recognized properly. The dataframes will be concatenated into a single one, so the dates need to be in the same format. This is basically what I have for reading the files:

for i in range(len(stations)):
    arq1 = pd.read_csv('./' + database_folder + '/' + group + '/' + stations[i] + ".csv", index_col=0)
    arq1.index = pd.to_datetime(arq1.index, format='%Y-%m-%d')
    arq1.index = pd.to_datetime(arq1.index, format='%Y%m%d')

group and stations are just lists used to build the paths to the files.

I was thinking something like:

try:  
   arq1.index=pd.to_datetime(arq1.index, format='%Y-%m-%d')
except:
   arq1.index=pd.to_datetime(arq1.index, format='%Y%m%d')

but I don't know if it works that way. Completely open to suggestions.
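
For reference, here is a minimal sketch of that try/except idea. It assumes a pandas version in which a format mismatch raises ValueError; the helper name parse_index and the in-memory sample files are hypothetical and only there to make the snippet runnable on its own.

```python
import io
import pandas as pd

def parse_index(df):
    """Parse df's index as dates, trying the dashed format first."""
    # Cast to str first: an all-digit column such as 20120101 is read as integers.
    labels = df.index.astype(str)
    try:
        df.index = pd.to_datetime(labels, format='%Y-%m-%d')
    except ValueError:
        # Fall back to the compact format when the dashed one is rejected.
        df.index = pd.to_datetime(labels, format='%Y%m%d')
    return df

# Hypothetical stand-ins for two station files, using rows from the data example.
dashed = pd.read_csv(io.StringIO("Date,TA_F_MDS\n2004-04-01,1.4441\n2004-04-02,1.49329\n"), index_col=0)
compact = pd.read_csv(io.StringIO("TIMESTAMP,TA_F_MDS\n20120101,-9999\n20120102,-9999\n"), index_col=0)

dashed = parse_index(dashed)
compact = parse_index(compact)
print(dashed.index)
print(compact.index)
```

Catching ValueError rather than using a bare except keeps unrelated errors (a misspelled variable, for instance) from being silently swallowed.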

Data Example. The dataframes don't have the same date ranges:

Date,TA_F_MDS
2004-04-01,1.4441
2004-04-02,1.49329
2004-04-03,1.6798
2004-04-04,1.59727
2004-04-05,1.49279
2004-04-06,1.29197
2004-04-07,1.28385

TIMESTAMP,TA_F_MDS
20120101,-9999
20120102,-9999
20120103,-9999
20120104,-9999
20120105,-9999
20120106,-9999
20120107,-9999
20120108,-9999

CodePudding user response:

The following should work:

for i in range(len(stations)):
    arq1 = # ...
    arq1.index = arq1.index.map(str)
    fmt = '%Y-%m-%d' if '-' in arq1.index[0] else '%Y%m%d'
    arq1.index = pd.to_datetime(arq1.index, format=fmt)

    print(arq1.index) # just to check

Basically, force the index to be strings instead of numbers, and then check whether the values contain a dash (-) and choose the appropriate format.

Output:

DatetimeIndex(['2004-04-01', '2004-04-02', '2004-04-03', '2004-04-04',
               '2004-04-05', '2004-04-06', '2004-04-07'],
              dtype='datetime64[ns]', name='Date', freq=None)
DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03', '2012-01-04',
               '2012-01-05', '2012-01-06', '2012-01-07', '2012-01-08'],
              dtype='datetime64[ns]', name='TIMESTAMP', freq=None)
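
Since the goal is to end up with a single dataframe, the dash check above can be combined with pd.concat. The following is only a sketch: the files dictionary and its station names are hypothetical in-memory stand-ins for the real CSV files, and renaming every index to 'Date' is an assumption about the desired result.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two station files, using rows from the question.
files = {
    "station_a": "Date,TA_F_MDS\n2004-04-01,1.4441\n2004-04-02,1.49329\n",
    "station_b": "TIMESTAMP,TA_F_MDS\n20120101,-9999\n20120102,-9999\n",
}

frames = []
for name, text in files.items():
    df = pd.read_csv(io.StringIO(text), index_col=0)
    df.index = df.index.map(str)  # 20120101 is read as an integer
    fmt = '%Y-%m-%d' if '-' in df.index[0] else '%Y%m%d'
    df.index = pd.to_datetime(df.index, format=fmt)
    df.index.name = 'Date'  # give every frame the same index name
    frames.append(df)

# All frames now share a datetime index, so they stack cleanly.
combined = pd.concat(frames)
print(combined)
```

Note that the check only looks at the first index value, so it assumes each file uses one format consistently.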

CodePudding user response:

Because the date column has a different heading in each file (e.g. 'Date' and 'TIMESTAMP'),
you can use converters in the following way:

Code:

def date_converter(x):
    return pd.to_datetime(x, format='%Y-%m-%d')

def timestamp_converter(x):
    return pd.to_datetime(x, format='%Y%m%d')


for i in range(len(stations)):
    arq1 = pd.read_csv('./' + database_folder + '/' + group + '/' + stations[i] + ".csv", index_col=0,
                       converters={'Date': date_converter, 'TIMESTAMP': timestamp_converter})
  • converters : a dictionary that tells read_csv which converter function to apply to which column. Each entry has the form heading: converter. For example, 'Date': date_converter applies the date_converter function to the column whose header is Date.
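
To try the converters approach without the real files, something like the following sketch may help. The two sample strings are hypothetical in-memory stand-ins for station files, built from rows in the question; each file only contains one of the two headings, and the same dictionary is passed to both reads.

```python
import io
import pandas as pd

def date_converter(x):
    return pd.to_datetime(x, format='%Y-%m-%d')

def timestamp_converter(x):
    return pd.to_datetime(x, format='%Y%m%d')

# One dictionary covers both kinds of file; each read uses the entry
# whose heading is actually present.
converters = {'Date': date_converter, 'TIMESTAMP': timestamp_converter}

# Hypothetical stand-ins for two station files.
dashed_csv = "Date,TA_F_MDS\n2004-04-01,1.4441\n2004-04-02,1.49329\n"
compact_csv = "TIMESTAMP,TA_F_MDS\n20120101,-9999\n20120102,-9999\n"

dashed = pd.read_csv(io.StringIO(dashed_csv), index_col=0, converters=converters)
compact = pd.read_csv(io.StringIO(compact_csv), index_col=0, converters=converters)

print(dashed.index)
print(compact.index)
```

Converters receive each cell as a string, so the all-digit TIMESTAMP values do not need to be cast from integers first.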

Note:

If needed, you could use lambda functions instead:

for i in range(len(stations)):
    arq1 = pd.read_csv(stations[i] + ".csv", index_col=0,
                       converters={'Date': lambda x: pd.to_datetime(x, format='%Y-%m-%d'),
                                   'TIMESTAMP': lambda x: pd.to_datetime(x, format='%Y%m%d')})