I've seen PySpark DataFrames with list values like [2,,3,,,4]. The entries between the commas look like nulls, but they aren't shown as 'null' in the list. Could someone explain how this kind of list is generated?
Thanks, J
CodePudding user response:
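They are empty strings, not nulls. Splitting a string such as '2,,3,,,4' on ',' produces an empty string wherever two commas are adjacent.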
import pyspark.sql.functions as F
# ...... (setup elided in the original; the code below assumes an existing SparkSession named `spark`)
data = [
('2,,3,,,4',)
]
df = spark.createDataFrame(data, ['col'])
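# Split the comma-delimited string into an array<string> column.
# Adjacent commas yield empty strings (''), not nulls.
df = df.withColumn('col', F.split('col', ','))
df.printSchema()
df.show(truncate=False)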
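
For anyone without a Spark session handy, the same behaviour can be sketched in plain Python. This is only an analogue: it assumes Python's `str.split` treats adjacent delimiters the same way `F.split` does for this input, and the variable names are made up for illustration.

```python
# Plain-Python analogue of the split above (no Spark required).
raw = "2,,3,,,4"

# Adjacent delimiters produce empty strings, not None.
parts = raw.split(",")
print(parts)

# Map empty strings to None if the gaps should be explicit nulls.
with_nulls = [p if p != "" else None for p in parts]
print(with_nulls)

# Or drop the empty entries entirely.
non_empty = [p for p in parts if p != ""]
print(non_empty)
```

In PySpark itself, something like `F.array_remove('col', '')` should drop the empty strings from the array column, though that is a suggestion beyond the original answer and worth checking against your Spark version.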