How to convert print output to pyspark dataframe (no pandas allowed)

Time: 02-23

The usual code to print the shape of a dataframe is:

print((sparkdf.count(), len(sparkdf.columns)))

Since I am working on a system that runs entirely on HDFS, pandas is not allowed. The output I need is:

|-------|-------|
|row    |columns|
|-------|-------|
|1500   |    22 |
|-------|-------|

CodePudding user response:

Just use spark.createDataFrame and pass the values as a list of tuples:

spark.createDataFrame([(sparkdf.count(), len(sparkdf.columns))], schema=['rows', 'columns'])
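The same idea can be wrapped in a small reusable helper. This is only a sketch: the function names shape_rows and shape_dataframe are hypothetical, and it assumes spark is an active SparkSession and df is an existing Spark DataFrame such as sparkdf above.

```python
def shape_rows(df):
    """Return the shape of a Spark DataFrame as a one-element list of tuples."""
    # count() triggers a Spark job; columns is metadata already held by the driver.
    return [(df.count(), len(df.columns))]


def shape_dataframe(spark, df):
    """Build a one-row Spark DataFrame holding the row and column counts of df.

    Hypothetical helper: `spark` is assumed to be an active SparkSession.
    """
    # The column names 'rows' and 'columns' follow the answer above;
    # Spark infers both as long (bigint) from the Python ints.
    return spark.createDataFrame(shape_rows(df), schema=["rows", "columns"])
```

Calling shape_dataframe(spark, sparkdf).show() should then print a one-row table with the two counts, with no pandas involved at any point.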