I have a list which looks like this:
[2,4,3,1,5] # One to Five in a certain order
I have a df that looks like this
ID Name Address
1 A ABC
3 B DEF
I want to add which ever IDs are missing with the values being False for Name and Address (there are a lot of columns towards to right, I don't want to manually give false for it, so if there is a method that will fill False for rest of the columns then its better)
After adding it, I want the DF to be sorted based on the order of the list.
Expected result:
ID Name Address
2 False False
4 False False
3 B DEF
1 A ABC
5 False False
CodePudding user response:
Getting your initial df:
import pandas as pd
df = pd.DataFrame({'ID': [1, 3],
'Name': ['A', 'B'],
'Address': ['ABC', 'DEF']})
print(df.head()
It's useful to note here that an integer index is made for you. The way you list your df
doesn't show such an index. The required output suggests that 'ID'
is the index but it's not totally clear how you want the output in that format.
The first thing to do is set the index as 'ID'
:
df = df.set_index('ID')
Then you can reindex()
that to build a backbone of the resulting df, which will fill the remaining columns with NaN
.
df = df.set_index('ID').reindex(index=lst)
Gives:
Name Address
ID
2 NaN NaN
4 NaN NaN
3 B DEF
1 A ABC
5 NaN NaN
(Note that 'ID'
, being the index, is now on a different level to the other column names)
The final step is to .fillna()
with False
Putting it together:
df = df.set_index('ID').reindex(index=lst).fillna(False).reset_index()
The last reset_index()
is optional because, as I said, you don't appear to have an independent index in your expected output.
CodePudding user response:
One way to do this:
your_list = [2,4,3,1,5]
df1 = pd.DataFrame({'Name':[False]*len(your_list), 'Address':[False]*len(your_list)}, index=your_list)
df1.loc[df.index] = df
where df
is the sample dataframe with two rows.