Verify which strings in a dataframe cannot be converted to datetime

Time:04-19

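I have a dataframe like this:

import pandas as pd
from datetime import datetime

df1 = pd.DataFrame({'date': ['2016-02-02', '2013', '2015-02-03', '2014-03-02']})
try:
    # parse the 'date' column; passing the whole frame makes pandas look for year/month/day columns
    pd.to_datetime(df1['date'], format='%Y-%m-%d')
except ValueError:
    print("second is wrong")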

We can see that the second value is not a valid date in the %Y-%m-%d format. What should I write, using a pandas method, to produce output like this:

true
false 
true 
true

rather than using

try:
    res = bool(datetime.strptime(test_str, format))
except ValueError:
    res = False

to test every value in the dataframe with a for loop?

EDIT: this is just a test dataframe; the real dataframe is much larger. I want to avoid a for loop to make it run faster (my understanding is that apply with a lambda runs faster than a for loop). A native pandas method would be even better.

CodePudding user response:

Using try/except is actually good practice. You can wrap it in a function and apply that function to the column:

import pandas as pd
from datetime import datetime


def datetime_if_possible(dt):
    try:
        return datetime.fromisoformat(dt)
    except ValueError:
        return None  # or any other default value


df1 = pd.DataFrame({'date':['2016-02-02','2013','2015-02-03','2014-03-02']})

df1['parsed'] = df1['date'].apply(datetime_if_possible)

         date     parsed
0  2016-02-02 2016-02-02
1        2013        NaT
2  2015-02-03 2015-02-03
3  2014-03-02 2014-03-02
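For a large frame, the same idea can be expressed without a Python-level function call per row. The following is only a sketch, assuming pandas 2.0 or later (older versions may parse a bare year such as '2013' leniently even when a format is given). The names parsed, is_valid and bad_rows are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2016-02-02', '2013', '2015-02-03', '2014-03-02']})

# errors='coerce' turns every string that does not fit the format into NaT
# instead of raising, so the whole column is parsed in a single call.
# Assumption: pandas >= 2.0, where the format is enforced strictly.
parsed = pd.to_datetime(df1['date'], format='%Y-%m-%d', errors='coerce')

# True where parsing succeeded, False where it produced NaT.
is_valid = parsed.notna()
print(is_valid)

# The strings that could not be converted.
bad_rows = df1.loc[~is_valid, 'date']
print(bad_rows)
```

The boolean mask is what the question asks for, and inverting it shows exactly which rows failed to convert.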

CodePudding user response:

The pandas "apply()" function is one of the simplest ways to avoid an explicit for loop while still checking against a predefined format. This answer extends the one provided by @Raphael, with the flexibility to choose the format and print the required boolean values.

from datetime import datetime
import pandas as pd

def check_date(date):
    # True if the string matches the %Y-%m-%d format, False otherwise
    try:
        datetime.strptime(date, "%Y-%m-%d")
        return True
    except ValueError:
        return False

df1 = pd.DataFrame({'date': ['2016-02-02', '2013', '2015-02-03', '2014-03-02']})
print(df1["date"].apply(check_date))

CodePudding user response:

Use str.match

m = df1['date'].str.match(r'\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])')
print(m)

0     True
1    False
2     True
3     True
Name: date, dtype: bool
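Two caveats apply to the regex approach: str.match only anchors the pattern at the start of the string, and a pattern cannot tell that a date such as 2016-02-31 does not exist. A minimal sketch of one way to cover both cases is shown below, assuming pandas 1.1 or later for str.fullmatch. The sample values and the names prefix_ok, shape_ok and calendar_ok are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: a valid date, a bare year, an impossible date,
# and a valid date followed by trailing junk.
dates = pd.Series(['2016-02-02', '2013', '2016-02-31', '2014-03-02x'])

pattern = r'\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])'

# str.match anchors at the start only, so trailing characters slip through.
prefix_ok = dates.str.match(pattern)

# str.fullmatch requires the entire string to match the pattern.
shape_ok = dates.str.fullmatch(pattern)

# The pattern cannot know that February has no 31st day, so the values are
# also parsed; errors='coerce' yields NaT for anything that is not a real date.
calendar_ok = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce').notna()

is_valid = shape_ok & calendar_ok
print(is_valid)
```

Here '2016-02-31' passes both regex checks and is rejected only by the parsing step, while '2014-03-02x' passes str.match but fails str.fullmatch.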