How to convert print output to pyspark dataframe (no pandas allowed)

Time:02-23

The usual code prints the row and column counts as a tuple:

print((sparkdf.count(), len(sparkdf.columns)))

I am working on a system that runs entirely on HDFS, so pandas is not allowed. The output I need is a DataFrame like this:

|-------|-------|
|row    |columns|
|-------|-------|
|1500   |    22 |
|-------|-------|

CodePudding user response:

Just use spark.createDataFrame and pass the values as a list containing one tuple:

spark.createDataFrame([(sparkdf.count(), len(sparkdf.columns))], schema=['rows', 'columns'])
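To reuse this for more than one DataFrame, the one-liner can be wrapped in a small helper. This is only a sketch: the names shape_row and shape_df are made up for illustration, and it assumes spark is an active SparkSession and df is a Spark DataFrame, as in the question.

```python
def shape_row(df):
    """Return (row_count, column_count) for a Spark DataFrame as a plain tuple.

    Hypothetical helper, not part of PySpark.
    """
    # count() triggers a Spark job; columns is just a Python list of names.
    return (df.count(), len(df.columns))


def shape_df(spark, df):
    """Build a one-row DataFrame holding the shape of df.

    Hypothetical helper; `spark` is assumed to be an active SparkSession.
    """
    # A list with a single tuple becomes a single row; the schema list
    # supplies the column names and the types are inferred.
    return spark.createDataFrame([shape_row(df)], schema=["rows", "columns"])
```

Calling shape_df(spark, sparkdf).show() should then print the result as a table in the console, with no pandas involved.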