pyspark dataframe: fillna values of selected columns with different data types

Time:06-06

The following are legal:

df.fillna(0, subset=['a', 'b'])
or
df.fillna({'a': 0, 'b': 0})

Question: Is df.fillna({'a': 0, 'b': '2022-12-01'}) also allowed, where column a is of float type and column b is of date type?

CodePudding user response:

Yes. According to the documentation, the dict may hold replacement values of different data types, but each replacement value must be an int, float, boolean, or string.
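
If you would rather not depend on fillna turning a string into a date, one option is to fill the date columns explicitly. The following is only a minimal sketch: split_fills_by_type and fill_mixed are hypothetical helper names, not part of PySpark, and it assumes the date strings are in the default yyyy-MM-dd format.

```python
def split_fills_by_type(dtypes, fills):
    """Split a {column: value} fill map into (plain, dates).

    `dtypes` is a list of (column, type_name) pairs, as returned by df.dtypes.
    """
    type_of = dict(dtypes)
    plain, dates = {}, {}
    for column, value in fills.items():
        if type_of.get(column) == "date":
            dates[column] = value
        else:
            plain[column] = value
    return plain, dates


def fill_mixed(df, fills):
    """Fill nulls, converting date replacements explicitly (hypothetical helper)."""
    # Imported here so split_fills_by_type stays usable without PySpark installed.
    from pyspark.sql import functions as F

    plain, dates = split_fills_by_type(df.dtypes, fills)
    if plain:
        # Non-date columns go through the regular fillna call.
        df = df.fillna(plain)
    for column, value in dates.items():
        # coalesce keeps existing values and substitutes the parsed date for nulls.
        df = df.withColumn(column, F.coalesce(F.col(column), F.to_date(F.lit(value))))
    return df
```

With the columns from the question, the call would be fill_mixed(df, {'a': 0, 'b': '2022-12-01'}).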

CodePudding user response:

Works for me

input schema

root
 |-- A: double (nullable = true)
 |-- D: date (nullable = true)

input df

+----+----------+
|   A|         D|
+----+----------+
| 0.0|      null|
|null|2009-01-02|
| 2.0|2009-01-05|
| 3.0|2009-01-06|
| 4.0|2009-01-07|
| 0.0|      null|
|null|2009-01-02|
| 2.0|2009-01-05|
| 3.0|2009-01-06|
| 4.0|2009-01-07|
+----+----------+

fillna

# Fill nulls per column: a number for A, a date string for D
df = df.fillna({'A': 50, 'D': '2022-12-01'})
df.printSchema()

output schema

root
 |-- A: double (nullable = false)
 |-- D: date (nullable = true)

output df

+----+----------+
|   A|         D|
+----+----------+
| 0.0|2022-12-01|
|50.0|2009-01-02|
| 2.0|2009-01-05|
| 3.0|2009-01-06|
| 4.0|2009-01-07|
| 0.0|2022-12-01|
|50.0|2009-01-02|
| 2.0|2009-01-05|
| 3.0|2009-01-06|
| 4.0|2009-01-07|
+----+----------+