There are two dataframes
df1
---> OrgID, location, Address, State
-- which is delta table
df2
---> OrgID, location, Address, State, Active, Seq_Number
-- which is createOrReplaceTempViewtable
In df2
there are two additional columns Active
, Seq_Number
.
- How to get the datatypes of the new columns
- How to add the new columns to the existing Delta table and update the values
tried the below:
converted the dataframes to pandaDF and used this, which got new columns in the Index
object.
df_new_columns = df1.columns.difference(df2.columns)
new = [ ]
if len(df_new_columns.tolist()) != 0:
for column in df_new_columns.tolist():
column_name = column
new.append(column)
CodePudding user response:
Try this:
delta_table = pd.concat([delta_table, createOrReplaceTempViewtable[['Active', 'Seq_Number']]])
CodePudding user response:
If you do not want to name any of the duplicate tables when join the missing one from your original table, you can do the following thing:
import pandas as pd
a = [{'A': 3, 'B': 5, 'C': 3, 'D': 2},{'A': 2, 'B': 4, 'C': 3, 'D': 9}]
df1 = pd.DataFrame(a)
b = [{'F': 0, 'M': 4, 'B': 2, 'C': 8 },{'F': 2, 'M': 4, 'B': 3, 'C': 9}]
df2 = pd.DataFrame(b)
df_delta = df2.columns.difference(df1.columns)
print(pd.concat([df1,df2]).T.drop_duplicates().T)
which gives:
A B C D F M
0 3.0 5.0 3.0 2.0 NaN NaN
1 2.0 4.0 3.0 9.0 NaN NaN
0 NaN 2.0 8.0 NaN 0.0 4.0
1 NaN 3.0 9.0 NaN 2.0 4.0
and
Index(['F', 'M'], dtype='object')