Alright, so this seems easy, but I couldn't find any solution or responses to it. I simply have a DataFrame with a column full of nulls, and I just want to fill it with "s" or "n" randomly.
I tried this:

```scala
df.foreach(f => {
  if (random)
    f.get(4) = "s"
  else
    f.get(4) = "n"
})
```

But it doesn't work, because I think `f` is just a `Row`, not the actual value. The pseudocode would be something like this:
```
for (i = 0; i < max_rows; i++)
  if (prob < 0.5)
    df[i]["column_field"] = "s"
  else
    df[i]["column_field"] = "n"
```
CodePudding user response:
Replace all integer and long columns:

```scala
df.na.fill(0)
  .show(false)
```

Replace specific columns:

```scala
df.na.fill(0, Array("population"))
  .show(false)
```

All string-type columns:

```scala
df.na.fill("")
  .show(false)
```

Specific columns:

```scala
df.na.fill("unknown", Array("city"))
  .na.fill("", Array("type"))
  .show(false)
```
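Putting the snippets above together, here is a minimal self-contained sketch; the sample data and the column names `city`, `type`, and `population` are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// sample data with nulls in the string columns (names are illustrative)
val df = Seq(
  ("Delhi", null: String, 10),
  (null: String, "metro", 20)
).toDF("city", "type", "population")

// fill nulls per named column: "city" gets "unknown", "type" gets ""
df.na.fill("unknown", Array("city"))
  .na.fill("", Array("type"))
  .show(false)
```

Note that `na.fill` only substitutes a single constant per column, so on its own it cannot produce a random "s"/"n" mix.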
CodePudding user response:
For your question, assigning a new value to every row:
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df1 = Seq((0.5f, "v1"), (0.2f, "v2"), (1f, "v3"), (4f, "v4"))
  .toDF("prob", "column_field")
df1.show(false)
/*
+----+------------+
|prob|column_field|
+----+------------+
|0.5 |v1          |
|0.2 |v2          |
|1.0 |v3          |
|4.0 |v4          |
+----+------------+
*/
```
```scala
val resDF = df1.withColumn(
  "column_field",
  when(col("prob") <= 0.5f, "s")
    .otherwise("n")
)
resDF.show(false)
/*
+----+------------+
|prob|column_field|
+----+------------+
|0.5 |s           |
|0.2 |s           |
|1.0 |n           |
|4.0 |n           |
+----+------------+
*/
```
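The snippet above keys the value on an existing `prob` column, while the question asks for a truly random fill. Spark's built-in `rand()` function returns a uniform value in [0, 1) per row, so the same `when`/`otherwise` pattern can draw "s" or "n" with equal probability. A sketch, assuming a DataFrame `df` whose null column is named `column_field`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// a column full of nulls, as in the question (schema is an assumption)
val df = Seq(("a", null: String), ("b", null: String), ("c", null: String))
  .toDF("id", "column_field")

// rand() < 0.5 is true for roughly half the rows; overwrite the column
val filled = df.withColumn(
  "column_field",
  when(rand() < 0.5, "s").otherwise("n")
)
filled.show(false)
```

Pass a seed (`rand(42)`) if you need the assignment to be reproducible across runs.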