Identify field not in dataframe using Spark

Time:04-12

I have the following initial dataframe:

ID City State
1 LA CA

Scenario: I have defined a fixed structure for the columns of the initial dataframe above. I have since ingested a new dataset, which arrives with an additional column.

I would like to compare the initial dataframe's structure with that of the newly ingested dataset. The structure of the new dataset is as follows:

ID City State Country
1 LA CA

Outcome: I would like to identify the column(s) that are not part of the initial dataframe. In this case, my output should be: Country.

I am using the following code to list the fields in my dataframe:

df.schema.names

I have tried to compare the output of this code with the structure of the initial dataframe, but with no luck.

CodePudding user response:

Typing on a phone is inconvenient, so here is the code directly:

init_cols = df.columns
new_cols = new_df.columns
result = ','.join([c for c in new_cols if c not in init_cols])
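
If you want the same comparison as a reusable helper, here is a minimal sketch. The function name extra_columns and its case_sensitive flag are made up for illustration, and plain Python lists stand in for df.columns and new_df.columns so it runs without a Spark session. The case-insensitive default is an assumption based on Spark's default column resolution (spark.sql.caseSensitive is false unless changed).

```python
def extra_columns(initial_cols, new_cols, case_sensitive=False):
    """Return the names in new_cols that are absent from initial_cols, in order."""
    # Hypothetical helper: normalise names unless an exact match is requested.
    normalize = (lambda c: c) if case_sensitive else str.lower
    known = {normalize(c) for c in initial_cols}
    return [c for c in new_cols if normalize(c) not in known]


# Plain lists standing in for df.columns and new_df.columns
init_cols = ["ID", "City", "State"]
new_cols = ["ID", "City", "State", "Country"]

print(extra_columns(init_cols, new_cols))  # ['Country']
```

With real dataframes you would call extra_columns(df.columns, new_df.columns). It returns a list rather than a joined string, so you can check whether it is empty before deciding what to do with the new dataset.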