I have the following initial dataframe:
ID | City | State |
---|---|---|
1 | LA | CA |
Scenario: I have defined a fixed column structure for the initial dataframe above. I have now ingested a new dataset that comes in with an additional column.
I would like to compare the structure of the initial dataframe with that of the newly ingested dataset, which is as follows:
ID | City | State | Country |
---|---|---|---|
1 | LA | CA |
Outcome: I would like to identify the column(s) that are not part of the initial dataframe. In this example, the output should be: Country.
I am using the following code to get the column names of my dataframe:
```python
df.schema.names
```
I have tried comparing the output of the above code against the column structure of the initial dataframe, but without success.
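Because `df.schema.names` (like `df.columns`) returns a plain Python list of column-name strings, the comparison itself needs no Spark functionality once the names are extracted. A minimal sketch, assuming the fixed structure is kept as a hard-coded list; `EXPECTED_COLUMNS` and `find_extra_columns` are hypothetical names, and the lists stand in for what PySpark would return:

```python
from typing import List

# Hypothetical: the fixed column structure of the initial dataframe.
EXPECTED_COLUMNS: List[str] = ["ID", "City", "State"]


def find_extra_columns(expected: List[str], actual: List[str]) -> List[str]:
    """Return the names in `actual` that are not in `expected`, keeping their order."""
    expected_set = set(expected)  # set gives O(1) membership checks
    return [name for name in actual if name not in expected_set]


# In PySpark, `actual` would come from new_df.schema.names
# (new_df is assumed to hold the newly ingested dataset).
new_columns = ["ID", "City", "State", "Country"]
print(find_extra_columns(EXPECTED_COLUMNS, new_columns))  # ['Country']
```

A plain set difference (`set(actual) - set(expected)`) would also work, but it does not preserve the column order of the new dataset; the list comprehension does.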
Answer:
Typing on a mobile phone is inconvenient, so here is the code directly:
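```python
init_cols = df.columns      # column names of the initial dataframe
new_cols = new_df.columns   # column names of the newly ingested dataset
# Keep the new columns absent from the initial dataframe, joined with commas
result = ','.join([c for c in new_cols if c not in init_cols])
```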
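To check the answer's one-liner without a Spark session, it can be wrapped in a small function and run on the column lists from the question. A sketch only: `new_columns_csv` is a hypothetical name, and the lists stand in for `df.columns` and `new_df.columns`:

```python
from typing import List


def new_columns_csv(init_cols: List[str], new_cols: List[str]) -> str:
    """Comma-separated names of columns in new_cols that are absent from init_cols."""
    return ','.join([c for c in new_cols if c not in init_cols])


# Column lists from the question; in PySpark these would be
# df.columns and new_df.columns.
init_cols = ["ID", "City", "State"]
new_cols = ["ID", "City", "State", "Country"]

print(new_columns_csv(init_cols, new_cols))          # Country
print(repr(new_columns_csv(init_cols, init_cols)))   # ''
```

If no columns were added, the result is an empty string, so a check such as `if result:` can be used to decide whether the new dataset deviates from the fixed structure. The comparison is an exact, case-sensitive string match on column names.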