Home > Software engineering >  Recognising Missing Values compared to previous days and creating new dataframe, Pandas
Recognising Missing Values compared to previous days and creating new dataframe, Pandas

Time:10-08

df = pd.DataFrame({'Date':   {0:'2020-04-01', 1:'2020-04-01', 2:'2020-04-01', 
                              3:'2020-04-02', 4:'2020-04-02', 
                              5:'2020-04-03', 6:'2020-04-03', 7:'2020-04-03'},
                    'Name':  {0:'AA', 1:'BB', 2:'CC', 
                              3:'AA', 4:'BB',  
                              5:'AA', 6:'BB', 7:'CC'},
                    'Weight':{0:1, 1:3, 2:6, 
                              3:1, 4:4, 
                              5:2, 6:5, 7:7}})
df=df.set_index(['Date'])

For each unique date in df.index - I would like to compare the Name columns to those present for the previous date, and if a name is missing, add a value of 0 to the weight column

For example..How would I recognise that 2020-04-02 was missing a row for 'CC' and insert a weight of 0 for this row.

Thereby achieving a DataFrame that looks like this...

Name Weight
2020-04-01 AA 1
2020-04-01 BB 3
2020-04-01 CC 6
2020-04-02 AA 1
2020-04-02 BB 4
2020-04-02 CC 0
2020-04-03 AA 2
2020-04-03 BB 5
2020-04-03 CC 7

CodePudding user response:

You can add Name to index, then reshape by DataFrame.unstack with add 0 with DataFrame.stack for original format of DataFrame:

df = df.set_index('Name', append=True).unstack(fill_value=0).stack().reset_index(level=1)
print (df)
           Name  Weight
Date                   
2020-04-01   AA       1
2020-04-01   BB       3
2020-04-01   CC       6
2020-04-02   AA       1
2020-04-02   BB       4
2020-04-02   CC       0
2020-04-03   AA       2
2020-04-03   BB       5
2020-04-03   CC       7
  • Related