Alright, so this seems easy, but I couldn't find any solution or responses to it. I simply have a DataFrame with a column full of nulls, and I just want to fill it with "s" or "n" randomly.
I tried this:

```scala
df.foreach(f => {
  if (random)
    f.get(4) = "s"
  else
    f.get(4) = "n"
})
```

But it doesn't work, because I think `f` is just a `Row`, not the actual value. The pseudocode would be something like this:
```
for (i = 0; i < max_rows; i++)
  if (prob < 0.5)
    df[i]["column_field"] = "s"
  else
    df[i]["column_field"] = "n"
```
CodePudding user response:
Replace all integer and long columns:

```scala
df.na.fill(0)
  .show(false)
```

Replace specific columns:

```scala
df.na.fill(0, Array("population"))
  .show(false)
```

All string-type columns:

```scala
df.na.fill("")
  .show(false)
```

Specific columns:

```scala
df.na.fill("unknown", Array("city"))
  .na.fill("", Array("type"))
  .show(false)
```
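Putting the snippets above together, here is a minimal self-contained sketch; the sample data and the column names `city`, `type`, and `population` are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// sample data with nulls in the string columns (names are illustrative)
val df = Seq(
  ("Delhi", null: String, 10),
  (null: String, "metro", 20)
).toDF("city", "type", "population")

// fill nulls per named column: "city" gets "unknown", "type" gets ""
df.na.fill("unknown", Array("city"))
  .na.fill("", Array("type"))
  .show(false)
```

Note that `na.fill` only substitutes a single constant per column, so on its own it cannot produce a random "s"/"n" mix.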
CodePudding user response:
For your question, assigning a new value to every row:
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df1 = Seq((0.5f, "v1"), (0.2f, "v2"), (1f, "v3"), (4f, "v4"))
  .toDF("prob", "column_field")
df1.show(false)
/*
+----+------------+
|prob|column_field|
+----+------------+
|0.5 |v1          |
|0.2 |v2          |
|1.0 |v3          |
|4.0 |v4          |
+----+------------+
*/
```
```scala
val resDF = df1.withColumn(
  "column_field",
  when(col("prob") <= 0.5f, "s")
    .otherwise("n")
)
resDF.show(false)
/*
+----+------------+
|prob|column_field|
+----+------------+
|0.5 |s           |
|0.2 |s           |
|1.0 |n           |
|4.0 |n           |
+----+------------+
*/
```
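The snippet above keys the value on an existing `prob` column, while the question asks for a truly random fill. Spark's built-in `rand()` function returns a uniform value in [0, 1) per row, so the same `when`/`otherwise` pattern can draw "s" or "n" with equal probability. A sketch, assuming a DataFrame `df` whose null column is named `column_field`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// a column full of nulls, as in the question (schema is an assumption)
val df = Seq(("a", null: String), ("b", null: String), ("c", null: String))
  .toDF("id", "column_field")

// rand() < 0.5 is true for roughly half the rows; overwrite the column
val filled = df.withColumn(
  "column_field",
  when(rand() < 0.5, "s").otherwise("n")
)
filled.show(false)
```

Pass a seed (`rand(42)`) if you need the assignment to be reproducible across runs.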