Home > Software engineering >  Pandas If record for specific date does not exist insert new record with remaining columns NA
Pandas If record for specific date does not exist insert new record with remaining columns NA

Time:07-21

I have a dataframe and I want to insert a record with the input date and leave the remaining columns NA if input date does not exist. If input date does exist do nothing

InputDate="5/1/2022"

if inputdate does not exist as a record insert new record

In

Invoice Date  ...       Check
1    2022-04-01  ...        1.30
2    2022-03-01  ...        1.19

Out
Invoice Date  ...       Check
0    2022-05-01  ...        NaN
1    2022-04-01  ...        1.30
2    2022-03-01  ...        1.19

CodePudding user response:

You can use:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Invoice Date': ['2022-04-01', '2022-03-01'], 'Check': [1.30, 1.19]})
df['Invoice Date'] = pd.to_datetime(df['Invoice Date']) 

input_date="5/1/2022"

if pd.to_datetime(input_date) not in df['Invoice Date']:
    row = {'Invoice Date': pd.to_datetime(input_date)}
    row.update({col: np.nan for col in df.columns if col not in row})
    df.loc[len(df)] = row

Output

  Invoice Date  Check
0   2022-04-01   1.30
1   2022-03-01   1.19
2   2022-05-01    NaN

If you want them sorted in reverse chronological order like shown above you can do:

df.sort_values(by='Invoice Date', ascending=False).reset_index(drop=True)

Output

  Invoice Date  Check
0   2022-05-01    NaN
1   2022-04-01   1.30
2   2022-03-01   1.19

CodePudding user response:

You could also set the date column as index and then use append method with verify_integrity=True.

If the index exists, it will throw an error, if not then add NaN values to the remaining columns.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Invoice Date': ['2022-04-01', '2022-03-01'], 'Check': [1.30, 1.19], 'Check2': [1.30, 1.19]})
df = df.set_index('Invoice Date')

new_row = pd.DataFrame(np.nan, columns=df.columns, index=['2023-04-01'])
df.append(new_row, verify_integrity=True)

Output:

            Check  Check2
2022-04-01  1.30    1.30
2022-03-01  1.19    1.19
2023-04-01  NaN     NaN

The following code will raise an error ValueError: Indexes have overlapping values: Index(['2022-03-01'], dtype='object'). So you can catch it!

new_row = pd.DataFrame(np.nan, columns=df.columns, index=['2022-03-01'])
df.append(new_row, verify_integrity=True)
  • Related