Home > front end >  Pandas: Fill first spare rows with new updated data
Pandas: Fill first spare rows with new updated data

Time:03-25

Let say I have a base_df I wish to update with new data from new_df.

base_df could be a dataframe as the following. Where at some point FC and FF faults were removed, and that indexes were marked as Spare

base_df = pd.DataFrame({'Fault': ['FA','FB','Spare1','FD','FE','Spare2'], 
                   'Description' : ['Des for FA','Des for FB','Des for FC','Des for FD',\
                                    'Des for FE','Des for FF']})
base_df
    Fault   Description
0   FA      Des for FA
1   FB      Des for FB
2   Spare   Des for FC
3   FD      Des for FD
4   FE      Des for FE
5   Spare   Des for FF

In the new_df a fault could be missing (e.g. FB) and it could contain new rows (e.g.FG,FH,FI and FJ). However new_df will never contain nor NaN or Spare rows.

new_df = pd.DataFrame({'Fault': ['FA','FD','FE','FG','FH','FI'], 
                   'Description' : ['Des for FA','Des for FD','Des for FE',\
                                    'Des for FG','Des for FH','Des for FI']})
new_df
    Fault   Description
0   FA      Des for FA
1   FD      Des for FD
2   FE      Des for FE
3   FG      Des for FG
4   FH      Des for FH
5   FI      Des for FI
6   FJ      Des for FJ

To update base_df with new_df the following rules should be followed:

  1. If in new_df is missing a fault that appears in base_df, it is marked as Spare in updated_df.
  2. If in new_df are new faults, first it should fill Spare rows. If there are not spare rows, they can be added to the end.

With the previous example, the updated_df should finish as per below.

updated_df = pd.DataFrame({'Fault': ['FA','FG','FH','FD','FE','FI','FJ'], 
                   'Description' : ['Des for FA','Des for FG','Des for FH','Des for FD',\
                                    'Des for FE','Des for FI','Des for FJ']})
updated_df
    Fault   Description
0   FA      Des for FA
1   FG      Des for FG
2   FH      Des for FH
3   FD      Des for FD
4   FE      Des for FE
5   FI      Des for FI
6   FJ      Des for FJ

ADDITIONAL INFORMATION

  • Description for Spare rows can be removed, set as Nan or the easyest value to work with

  • In a second step I would like also to mark the new added rows in green and the deleted rows saved in a df apart

CodePudding user response:

The solution is:

fault1=list(base_df.Fault)
fault2=list(new_df.Fault)
i=0
updated_df=base_df
for k in fault1:
    i=i 1
    if k not in fault2:
        updated_df.loc[i-1,'Fault']="Spare"
        updated_df.loc[i-1,'Description']="Spare"



for k in fault2:
    if k not in list(updated_df.Fault):
        try:
            i=list(updated_df.Fault).index('Spare')
            updated_df.loc[i,'Fault']=k
            updated_df.loc[i,'Description']=new_df[new_df.Fault==k].Description.values[0]
        except:
            updated_df.loc[len(updated_df.index)]=[k,new_df[new_df.Fault==k].Description.values[0]]
            
            
updated_df

enter image description here

  • Related