Home > database >  how to map data with headers in pandas when there are no fixed headers
how to map data with headers in pandas when there are no fixed headers

Time:06-10

I have pre defined headers for my data frame . but data we receive might or might not include all the columns . however our output file should have all the fields from header. If there is no data in the input file, then keep it blank else populate the values.

input file 1:
ID  Name   Address    Shippment
1   john   address1   Y
2   Jessy  address2   N

input file2 
ID  Name   Address    Shippment   Delivered
1   john   address1   Y            Y
2   Jessy  address2   N            N

headers=['ID','Name','Address','Shippment','Delivered']

output file 1:
    ID  Name   Address    Shippment   Delivered
    1   john   address1   Y            
    2   Jessy  address2   N   

 output file 1:
ID  Name   Address    Shippment   Delivered
1   john   address1   Y            Y
2   Jessy  address2   N            N   

how do I map the headers to the soruce file when the headers are random?

I tried using zip and update but this looks at index and map. The Source columns can be random and it has to match the fields order from the header.

        for i,index in  zip(header,df):
                final.update({i: df[index].tolist()})
        df_final= pd.DataFrame(final)    

How can I acheive this ?

CodePudding user response:

You can use reindex() to ensure the pre-defined headers are all included. Any missing columns in the data will be populated with NaN:

headers = ['ID','Name','Address','Shippment','Delivered']
df = df.reindex(columns=headers)

Input

   ID   Name   Address Shippment
0   1   john  address1         Y
1   2  jessy  address2         N

Output

   ID   Name   Address Shippment  Delivered
0   1   john  address1         Y        NaN
1   2  jessy  address2         N        NaN

If the columns in the input are not in the pre-defined sequence, reindex() will fix that as well:

Input

   ID   Address   Name Shippment
0   1  address1   john         Y
1   2  address2  jessy         N

Output

   ID   Name   Address Shippment  Delivered
0   1   john  address1         Y        NaN
1   2  jessy  address2         N        NaN
  • Related