Accessing fields of all the rows of a DataFrame in Spark


Alright, this seems easy, but I couldn't find any solution or responses for it. I simply have a DataFrame with a column full of nulls, and I just want to fill it with "s" or "n" randomly.

I tried this:

df.foreach(f => {
  if (random)
    f.get(4) = "s"
  else
    f.get(4) = "n"
})

But it doesn't work, because I think f is just a Row, not the actual value. The pseudocode for what I want would be something like this:

for (i = 0; i < max_rows; i++)
  if (prob < 0.5)
    df[i]["column_field"] = "s"
  else
    df[i]["column_field"] = "n"
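To make the intent concrete, this is roughly what I would write over a plain Scala collection (just a sketch; max_rows stands in for the real row count), and what I can't figure out how to express over the DataFrame:

import scala.util.Random

val max_rows = 4 // stand-in row count, just for illustration
// For each row index, pick "s" or "n" with equal probability.
val values = (0 until max_rows).map(_ => if (Random.nextDouble() < 0.5) "s" else "n")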

CodePudding user response:

Replace nulls in all integer and long columns:

df.na.fill(0)
  .show(false)

Replace nulls in specific columns:

df.na.fill(0,Array("population"))
  .show(false)

Replace nulls in all string-typed columns:

df.na.fill("")
  .show(false)

Replace nulls in specific string columns:

df.na.fill("unknown",Array("city"))
  .na.fill("",Array("type"))
  .show(false)
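Putting these together as a runnable sketch (the sample DataFrame, its column names, and the local[*] master are assumptions for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up sample data with a null in each column.
val df = Seq(
  (Some("Oslo"), Some(700000L), Some("capital")),
  (None, None, None)
).toDF("city", "population", "type")

df.na.fill(0L, Array("population"))  // numeric column
  .na.fill("unknown", Array("city")) // string column
  .na.fill("", Array("type"))        // another string column
  .show(false)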

CodePudding user response:

See DataFrameNaFunctions.fill (the df.na API) in the DataFrame class docs.

For your question, to set a new value on every row:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df1 = Seq((0.5f, "v1"), (0.2f, "v2"), (1f, "v3"), (4f, "v4"))
  .toDF("prob", "column_field")
df1.show(false)
/*
+----+------------+
|prob|column_field|
+----+------------+
|0.5 |v1          |
|0.2 |v2          |
|1.0 |v3          |
|4.0 |v4          |
+----+------------+
*/
val resDF = df1.withColumn(
  "column_field",
  when(col("prob") <= 0.5f, "s")
    .otherwise("n")
)

resDF.show(false)
/*
+----+------------+
|prob|column_field|
+----+------------+
|0.5 |s           |
|0.2 |s           |
|1.0 |n           |
|4.0 |n           |
+----+------------+
*/
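The question asked for a random choice rather than one driven by an existing prob column; the same when/otherwise pattern works with Spark's built-in rand(), which returns a uniform value in [0, 1) per row. A small variation on the code above (a sketch, reusing df1):

val randomDF = df1.withColumn(
  "column_field",
  when(rand() < 0.5, "s").otherwise("n") // roughly 50/50 "s" vs "n"
)
randomDF.show(false)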