If a date column doesn't have a certain date, then do something

I am trying to read the Date column of a dataframe; if a certain date doesn't exist, I want to insert new data for that date.

I've tried googling how to tell whether a lookup returns nothing, but I couldn't find anything useful.

This is how I handle it for now: if the lookup raises a KeyError, I insert the data for that date. My concern is: what if the KeyError is raised for some reason other than the date being missing?

Or is there a better way than catching a KeyError like this?

My Code

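import pandas as pd

def checkdata(ticker, date):
    try:
        # idxengine is a database connection defined elsewhere (not shown)
        data = pd.read_sql(ticker, idxengine)
        data = data.set_index('Date')
        # .loc raises KeyError when the date is not in the index
        print(data.loc[date])
    except KeyError as ke:
        print("Data not found, insert new date data " + ticker + " " + str(date))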

Update to simplify what I want to see as the end result

Let's say I have dataA and dataB

dataA = [['2022-02-01', 123], ['2022-02-02', 120]]
dataB = [['2022-02-01', 123], ['2022-02-03', 125]]

I want to have

dataC = [['2022-02-01', 123], ['2022-02-02', 120], ['2022-02-03', 125]]
dfC = pd.DataFrame(dataC, columns = ['date', 'price'])
print(dfC)

Expected output

         date  price
0  2022-02-01    123
1  2022-02-02    120
2  2022-02-03    125

What should I do?

Answer:

You don't want to select rows through the index and rely on a KeyError to tell you a date is missing.

Make 'date' a proper column and write an explicit mask/condition, e.g.

relevant_data = data[data['date'] > pd.Timestamp.today()]

An explicit condition lets you control exactly what counts as a match. For example, if your timestamps have second precision but you want to match on the month/day only, neither == nor .loc on the index will work directly, because you first need to convert the column to the precision you care about (day, month, hour, ...).
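As a rough sketch of the explicit condition described above, the check for a missing date can be written without any try/except. The sample data and the helper names has_date and add_if_missing are made up for illustration, and the sketch assumes 'date' is a datetime column and that matching at day precision is what you want:

```python
import pandas as pd

# Hypothetical sample data with second-precision timestamps.
data = pd.DataFrame({
    "date": pd.to_datetime(["2022-02-01 09:30:15", "2022-02-02 16:00:00"]),
    "price": [123, 120],
})

def has_date(df, day):
    """Return True if any row of df falls on the given calendar day."""
    target = pd.Timestamp(day).normalize()
    # Truncate every timestamp to midnight before comparing.
    return bool((df["date"].dt.normalize() == target).any())

def add_if_missing(df, day, price):
    """Return df unchanged if the day exists, else df plus one new row."""
    if has_date(df, day):
        return df
    new_row = pd.DataFrame({"date": [pd.Timestamp(day)], "price": [price]})
    return pd.concat([df, new_row], ignore_index=True)

print(has_date(data, "2022-02-01"))
print(has_date(data, "2022-02-03"))
data = add_if_missing(data, "2022-02-03", 125)
print(data)
```

Because the mask only ever yields True or False, a missing date can no longer be confused with some unrelated KeyError.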

UPDATE regarding your edited question: you can combine both dataframes into one and then remove duplicates. This assumes the price information in both dataframes is the same for a given date; if not, you will need to define which price should take precedence.

dfa = pd.DataFrame(dataA, columns=['date', 'price'])
dfb = pd.DataFrame(dataB, columns=['date', 'price'])
pd.concat([dfa, dfb]).drop_duplicates()

with output

         date  price
0  2022-02-01    123
1  2022-02-02    120
1  2022-02-03    125
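Note that the result keeps the original index labels (0, 1, 1), and drop_duplicates() without arguments only drops rows that are identical in every column. A possible variant, assuming the price from the first dataframe should win when the same date appears in both with different prices, deduplicates on 'date' alone and rebuilds the index:

```python
import pandas as pd

dataA = [['2022-02-01', 123], ['2022-02-02', 120]]
dataB = [['2022-02-01', 123], ['2022-02-03', 125]]

dfa = pd.DataFrame(dataA, columns=['date', 'price'])
dfb = pd.DataFrame(dataB, columns=['date', 'price'])

dfc = (
    pd.concat([dfa, dfb])
    # Keep the first row seen for each date (rows from dfa come first).
    .drop_duplicates(subset='date', keep='first')
    # Order by date and replace the repeated labels with 0, 1, 2, ...
    .sort_values('date')
    .reset_index(drop=True)
)
print(dfc)
```

Use keep='last' instead if the second dataframe should take precedence.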