Home > OS >  Compare headers of dataframes and add the columns to the delta table
Compare headers of dataframes and add the columns to the delta table

Time:02-11

There are two dataframes

df1 ---> OrgID, location, Address, State -- which is delta table

df2 ---> OrgID, location, Address, State, Active, Seq_Number -- which is createOrReplaceTempViewtable

In df2 there are two additional columns Active, Seq_Number.

  1. How to get the datatypes of the new columns
  2. How to add the new columns to the existing Delta table and update the values

tried the below:

converted the dataframes to pandaDF and used this, which got new columns in the Index object.

df_new_columns = df1.columns.difference(df2.columns)
new = [ ]

if len(df_new_columns.tolist()) != 0:

for column in df_new_columns.tolist():
    column_name = column
    new.append(column)

CodePudding user response:

Try this:

delta_table = pd.concat([delta_table, createOrReplaceTempViewtable[['Active', 'Seq_Number']]])

CodePudding user response:

If you do not want to name any of the duplicate tables when join the missing one from your original table, you can do the following thing:

import pandas as pd

a = [{'A': 3, 'B': 5, 'C': 3, 'D': 2},{'A': 2,  'B': 4, 'C': 3, 'D': 9}]
df1 = pd.DataFrame(a)
b = [{'F': 0,  'M': 4,  'B': 2,  'C': 8 },{'F': 2,  'M': 4, 'B': 3, 'C': 9}]
df2 = pd.DataFrame(b)

df_delta = df2.columns.difference(df1.columns)

print(pd.concat([df1,df2]).T.drop_duplicates().T)

which gives:

 A    B    C    D    F    M
0  3.0  5.0  3.0  2.0  NaN  NaN
1  2.0  4.0  3.0  9.0  NaN  NaN
0  NaN  2.0  8.0  NaN  0.0  4.0
1  NaN  3.0  9.0  NaN  2.0  4.0

and

Index(['F', 'M'], dtype='object')
  • Related