Home > database >  Finding the 'Date' column in a dataframe
Finding the 'Date' column in a dataframe

Time:01-08

I'm programmatically trying to detect the column in a dataframe that contains dates & I'm converting the date values to the same format.

My logic is to find the column name that contains the word 'Date' either as a whole word or as a sub-word (using contains()) & then work on the dates in that column.

My code:

from dateutil.parser import parse
import re
from datetime import datetime
import calendar

import pandas as pd

def date_fun(filepath):

            lst_to_ser=pd.Series(filepath.columns.values)

            date_col_search= lst_to_ser.str.contains(pat = 'date')

            #print(date_col_search.columns.values)

            for i in date_col_search:

               if i is True:

                   formatted_dates=pd.to_datetime(date_col_search[i], errors='coerce')
                   print(formatted_dates)

main_path = pd.read_csv('C:/Data_Cleansing/random_dateset.csv')
fpath=main_path.copy()
    
    date_fun(fpath)

 

The retrieved column names are stored in an array & since contains() works only on 'Series' I converted the array to series.

This is what 'date_col_search' variable contains:

0    False
1     True
2    False
dtype: bool

I want to access the column corresponding to the 'True' value. But I'm getting the following error at the line formatted_dates=......:

Exception has occurred: KeyError True

How should I access the 'True' column?

My dataframe:

random  joiningdate branch
1       25.09.2019  rev
8       9/16/2015   pop
98      10.12.2017  switch
65      02.12.2014  high
45      08-Mar-18   aim
2       08-12-2016  docker
0       26.04.2016  grit
9       05-03-2016  trevor
56      24.12.2016  woll
4       10-Aug-19   qerty
78      abc         yak
54      05-06-2015  water
42      12-2012-18  rance
43      24-02-2010  stream
38      2008,13,02  verge
78      16-09-2015  atom

CodePudding user response:

If I understand you correctly, you want to search for the row with a boolean true within an array. If so, try doing this:

 for row in matrix:
    for item in row:
      if item == True:
        return True
  return False

CodePudding user response:

I would use:

def mixed_datetime(s):
    # this is just an example, adapt this function to your need
    return (pd.to_datetime(s, yearfirst=False, dayfirst=True, errors='coerce')
              .fillna(
              pd.to_datetime(s, yearfirst=True, dayfirst=False, errors='coerce')
              )
           )

cols = df.columns.str.contains('date', case=False)
df.loc[:, cols] = df.loc[:, cols].apply(mixed_datetime)

Updated DataFrame:

    random joiningdate  branch
0        1  2019-09-25     rev
1        8  2015-09-16     pop
2       98  2017-12-10  switch
3       65  2014-12-02    high
4       45  2018-03-08     aim
5        2  2016-12-08  docker
6        0  2016-04-26    grit
7        9  2016-03-05  trevor
8       56  2016-12-24    woll
9        4  2019-08-10   qerty
10      78         NaT     yak
11      54  2015-06-05   water
12      42         NaT   rance
13      43  2010-02-24  stream
14      38  2008-02-01   verge
15      78  2015-09-16    atom
  • Related