Home > Enterprise >  How to add rows only if it doesn't exists in the df
How to add rows only if it doesn't exists in the df

Time:08-21

I have a list which looks like this:

[2,4,3,1,5]  # One to Five in a certain order

I have a df that looks like this

ID    Name    Address
1      A         ABC
3      B         DEF

I want to add which ever IDs are missing with the values being False for Name and Address (there are a lot of columns towards to right, I don't want to manually give false for it, so if there is a method that will fill False for rest of the columns then its better)

After adding it, I want the DF to be sorted based on the order of the list.

Expected result:

ID   Name    Address
2    False      False
4    False      False
3    B          DEF
1    A          ABC
5    False      False

CodePudding user response:

Getting your initial df:

import pandas as pd

df = pd.DataFrame({'ID': [1, 3],
                   'Name': ['A', 'B'],
                   'Address': ['ABC', 'DEF']})
print(df.head()

It's useful to note here that an integer index is made for you. The way you list your df doesn't show such an index. The required output suggests that 'ID' is the index but it's not totally clear how you want the output in that format.

The first thing to do is set the index as 'ID':

df = df.set_index('ID')

Then you can reindex() that to build a backbone of the resulting df, which will fill the remaining columns with NaN.

df = df.set_index('ID').reindex(index=lst)

Gives:

  Name Address
ID             
2   NaN     NaN
4   NaN     NaN
3     B     DEF
1     A     ABC
5   NaN     NaN

(Note that 'ID', being the index, is now on a different level to the other column names)

The final step is to .fillna() with False

Putting it together:

df = df.set_index('ID').reindex(index=lst).fillna(False).reset_index()

The last reset_index() is optional because, as I said, you don't appear to have an independent index in your expected output.

CodePudding user response:

One way to do this:

your_list = [2,4,3,1,5]
df1 = pd.DataFrame({'Name':[False]*len(your_list), 'Address':[False]*len(your_list)}, index=your_list)

df1.loc[df.index] = df

where df is the sample dataframe with two rows.

  • Related