How does one check if a Spark row or row.schema 'contains' a field name?

Time:11-03

For example, I have a Spark Row:

Row row = ...

I can evaluate the following command in an interactive session with the debugger:

row.schema.fieldNamesSet.contains("title")
> true

However, I cannot write:

assertThat(row.schema.fieldNamesSet.contains("title"))
// or
assertThat(row.schema().fieldNamesSet.contains("title"))
// etc.

// fieldNamesSet is not available here: the compiler reports that it has "private access"

(General question, or Y) How do I assert that a fieldName is not present in the row?

(Specific question, or X) How do I perform an in-line check whether a schema contains a fieldName?

CodePudding user response:

The schema of a Row is an instance of the StructType class, so you can refer to the JavaDoc of this class to find out all the public fields and methods that you can use. Note that you can use all the methods defined in the StructType class plus all methods inherited from the superclasses and interfaces.

In particular, to check whether the schema contains a given field name, you have several options:

exists method

Pass a predicate to the exists method; it is evaluated against the schema's fields and returns true if at least one field matches the condition. This is also useful if you want to test conditions other than the name.

row.schema().exists(f -> "title".equals(f.name()));

getFieldIndex method

The StructType.getFieldIndex method returns a Scala Option that holds the field's index if the field is present, or None if it is not.

row.schema().getFieldIndex("title").isDefined();

You can also get the fields or fieldNames arrays through the fields() and fieldNames() methods and process them in whatever way best suits your use case.
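As a minimal sketch of the fieldNames() route, the check can be done with plain Java collections. To keep the example runnable without Spark on the classpath, a hard-coded String[] stands in for row.schema().fieldNames(); the class name SchemaFieldCheck, the helper hasField, and the sample field names are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper class; not part of Spark.
class SchemaFieldCheck {

    // Returns true if fieldNames contains the given name (exact, case-sensitive match).
    static boolean hasField(String[] fieldNames, String name) {
        return Arrays.asList(fieldNames).contains(name);
    }

    public static void main(String[] args) {
        // Stand-in for row.schema().fieldNames(); the values are invented for illustration.
        String[] fieldNames = {"title", "author"};

        System.out.println(hasField(fieldNames, "title"));   // prints "true"
        System.out.println(hasField(fieldNames, "summary")); // prints "false"
    }
}
```

With a real Row, the call would presumably read hasField(row.schema().fieldNames(), "title"), and negating the result gives the "field is not present" assertion asked about in the question.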
