Assuming I have a Dataset<Person> personList that contains a list of Person objects.
Person is defined as follows:
public class Person {
String name;
String gender;
}
Now I have personList as a Dataset, but I need to backfill another attribute into Person, let's say age. So I can update my Person to:
public class Person {
String name;
String gender;
int age;
}
How do I loop through the Dataset and update the age value?
I tried this approach, but it didn't update anything:
personList.foreach(person -> {
person.setAge(12);
});
I tried to give every Person in personList an age of 12, but when I read the dataset, the age value is still empty. Why?
CodePudding user response:
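You can add a column using .withColumn(colName, lit(colValue)). Pass the literal as an int so the column type matches the age field, and note that withColumn returns a new Dataset<Row> rather than changing personList in place:
Dataset<Row> withAge = personList.withColumn("age", functions.lit(12));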
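As for why the foreach attempt had no visible effect: a Dataset is immutable, and foreach is an action that returns nothing. Spark hands the lambda a Person object decoded from the stored data, so setAge changes only that temporary object, which is then discarded. The sketch below is a plain-Java analogy, not Spark code. ToyDataset and its methods are hypothetical stand-ins invented for illustration, and so are the sample rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

class ForeachVsMap {

    // Same fields as the updated Person in the question.
    static class Person {
        String name;
        String gender;
        int age;

        Person(String name, String gender, int age) {
            this.name = name;
            this.gender = gender;
            this.age = age;
        }
    }

    // Hypothetical toy stand-in for a Dataset<Person>: it keeps encoded rows
    // and decodes a fresh Person each time an element is handed to user code.
    static class ToyDataset {
        private final List<String[]> rows;

        ToyDataset(List<String[]> rows) {
            this.rows = rows;
        }

        private static Person decode(String[] row) {
            return new Person(row[0], row[1], Integer.parseInt(row[2]));
        }

        private static String[] encode(Person p) {
            return new String[] {p.name, p.gender, Integer.toString(p.age)};
        }

        // Like foreach: runs f on temporary copies and returns nothing.
        void foreach(Consumer<Person> f) {
            for (String[] row : rows) {
                f.accept(decode(row));
            }
        }

        // Like map: builds a new dataset from whatever f returns.
        ToyDataset map(UnaryOperator<Person> f) {
            List<String[]> out = new ArrayList<>();
            for (String[] row : rows) {
                out.add(encode(f.apply(decode(row))));
            }
            return new ToyDataset(out);
        }

        List<Integer> ages() {
            List<Integer> ages = new ArrayList<>();
            for (String[] row : rows) {
                ages.add(decode(row).age);
            }
            return ages;
        }
    }

    // Made-up sample rows with age 0, standing in for the unfilled values.
    static ToyDataset sample() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Ann", "F", "0"});
        rows.add(new String[] {"Bob", "M", "0"});
        return new ToyDataset(rows);
    }

    // Mutating inside foreach: the stored rows are untouched.
    static List<Integer> agesAfterForeach() {
        ToyDataset people = sample();
        people.foreach(p -> p.age = 12);
        return people.ages();
    }

    // Returning the modified element from map and keeping the result.
    static List<Integer> agesAfterMap() {
        ToyDataset people = sample();
        ToyDataset updated = people.map(p -> {
            p.age = 12;
            return p;
        });
        return updated.ages();
    }

    public static void main(String[] args) {
        System.out.println(agesAfterForeach()); // [0, 0]
        System.out.println(agesAfterMap());     // [12, 12]
    }
}
```

In Spark itself, the typed counterpart of the toy map above is Dataset.map with a MapFunction and an Encoder. Like withColumn, it returns a new Dataset that has to be assigned and used in place of the old one.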
CodePudding user response:
Either import the class as you do now and use it to access the method:
import org.apache.spark.sql.functions;
df.withColumn("foo", functions.lit(1));
or use a static import and call the method directly:
import static org.apache.spark.sql.functions.lit;
df.withColumn("foo", lit(1));