I'm programmatically trying to detect the column in a dataframe that contains dates & I'm converting the date values to the same format.
My logic is to find the column name that contains the word 'Date' either as a whole word or as a sub-word (using contains()) & then work on the dates in that column.
My code:
from dateutil.parser import parse
import re
from datetime import datetime
import calendar
import pandas as pd
def date_fun(filepath):
lst_to_ser=pd.Series(filepath.columns.values)
date_col_search= lst_to_ser.str.contains(pat = 'date')
#print(date_col_search.columns.values)
for i in date_col_search:
if i is True:
formatted_dates=pd.to_datetime(date_col_search[i], errors='coerce')
print(formatted_dates)
main_path = pd.read_csv('C:/Data_Cleansing/random_dateset.csv')
fpath=main_path.copy()
date_fun(fpath)
The retrieved column names are stored in an array & since contains() works only on 'Series' I converted the array to series.
This is what 'date_col_search' variable contains:
0 False
1 True
2 False
dtype: bool
I want to access the column corresponding to the 'True' value. But I'm getting the following error at the line formatted_dates=......:
Exception has occurred: KeyError True
How should I access the 'True' column?
My dataframe:
random joiningdate branch
1 25.09.2019 rev
8 9/16/2015 pop
98 10.12.2017 switch
65 02.12.2014 high
45 08-Mar-18 aim
2 08-12-2016 docker
0 26.04.2016 grit
9 05-03-2016 trevor
56 24.12.2016 woll
4 10-Aug-19 qerty
78 abc yak
54 05-06-2015 water
42 12-2012-18 rance
43 24-02-2010 stream
38 2008,13,02 verge
78 16-09-2015 atom
CodePudding user response:
If I understand you correctly, you want to search for the row with a boolean true within an array. If so, try doing this:
for row in matrix:
for item in row:
if item == True:
return True
return False
CodePudding user response:
I would use:
def mixed_datetime(s):
# this is just an example, adapt this function to your need
return (pd.to_datetime(s, yearfirst=False, dayfirst=True, errors='coerce')
.fillna(
pd.to_datetime(s, yearfirst=True, dayfirst=False, errors='coerce')
)
)
cols = df.columns.str.contains('date', case=False)
df.loc[:, cols] = df.loc[:, cols].apply(mixed_datetime)
Updated DataFrame:
random joiningdate branch
0 1 2019-09-25 rev
1 8 2015-09-16 pop
2 98 2017-12-10 switch
3 65 2014-12-02 high
4 45 2018-03-08 aim
5 2 2016-12-08 docker
6 0 2016-04-26 grit
7 9 2016-03-05 trevor
8 56 2016-12-24 woll
9 4 2019-08-10 qerty
10 78 NaT yak
11 54 2015-06-05 water
12 42 NaT rance
13 43 2010-02-24 stream
14 38 2008-02-01 verge
15 78 2015-09-16 atom