Pandas If record for specific date does not exist insert new record with remaining columns NA-CodePudding

I have a dataframe and I want to insert a record with the input date and leave the remaining columns NA if input date does not exist. If input date does exist do nothing

InputDate="5/1/2022"

if inputdate does not exist as a record insert new record

In

Invoice Date  ...       Check
1    2022-04-01  ...        1.30
2    2022-03-01  ...        1.19

Out
Invoice Date  ...       Check
0    2022-05-01  ...        NaN
1    2022-04-01  ...        1.30
2    2022-03-01  ...        1.19

CodePudding user response：

You can use:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Invoice Date': ['2022-04-01', '2022-03-01'], 'Check': [1.30, 1.19]})
df['Invoice Date'] = pd.to_datetime(df['Invoice Date']) 

input_date="5/1/2022"

if pd.to_datetime(input_date) not in df['Invoice Date']:
    row = {'Invoice Date': pd.to_datetime(input_date)}
    row.update({col: np.nan for col in df.columns if col not in row})
    df.loc[len(df)] = row

Output

  Invoice Date  Check
0   2022-04-01   1.30
1   2022-03-01   1.19
2   2022-05-01    NaN

If you want them sorted in reverse chronological order like shown above you can do:

df.sort_values(by='Invoice Date', ascending=False).reset_index(drop=True)

Output

  Invoice Date  Check
0   2022-05-01    NaN
1   2022-04-01   1.30
2   2022-03-01   1.19

CodePudding user response：

You could also set the date column as index and then use append method with verify_integrity=True.

If the index exists, it will throw an error, if not then add NaN values to the remaining columns.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Invoice Date': ['2022-04-01', '2022-03-01'], 'Check': [1.30, 1.19], 'Check2': [1.30, 1.19]})
df = df.set_index('Invoice Date')

new_row = pd.DataFrame(np.nan, columns=df.columns, index=['2023-04-01'])
df.append(new_row, verify_integrity=True)

Output:

            Check  Check2
2022-04-01  1.30    1.30
2022-03-01  1.19    1.19
2023-04-01  NaN     NaN

The following code will raise an error ValueError: Indexes have overlapping values: Index(['2022-03-01'], dtype='object'). So you can catch it!

new_row = pd.DataFrame(np.nan, columns=df.columns, index=['2022-03-01'])
df.append(new_row, verify_integrity=True)