I searched here but didn't find anything that worked for me. Basically, I have a single-row list (with several columns) that I need to write into a Parquet table. I need to convert that list into a DataFrame, but because it is only one row, I run into many problems!
from pyspark.sql import Window, Row
from pyspark.sql import functions as F
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
import datetime

tablename = 'table'
start_time = F.lit(datetime.datetime.now())
count_1 = 0
count_2 = 0
count_3 = 0

# a single "row" built from Column expressions
list = [F.lit(start_time),
        F.lit(tablename),
        F.lit(count_1),
        F.lit(count_2),
        F.lit(count_3),
        F.current_timestamp()]
columns = ['start_time', 'table', 'count_1', 'count_2', 'count_3', 'end_time']
When I try to use parallelize or .toDF on it, I get errors. Does anyone know how I can do this?
CodePudding user response:
If you modify your data to use plain Python values instead of the Spark F functions and types, it works:
from pyspark.sql import Window,Row
from pyspark.sql import functions as F
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
import datetime
tablename='table'
start_time = datetime.datetime.now()
count_1 = 0
count_2 = 0
count_3 = 0
# one row of plain Python values (no Column expressions)
list = [(start_time,
         tablename,
         count_1,
         count_2,
         count_3,
         datetime.datetime.now())]
columns = ['start_time', 'table', 'count_1', 'count_2', 'count_3', 'end_time']
# assumes an existing SparkSession named `spark` (e.g. in the pyspark shell)
df = spark.createDataFrame(list, columns)
df.show()
+--------------------+-----+-------+-------+-------+--------------------+
|          start_time|table|count_1|count_2|count_3|            end_time|
+--------------------+-----+-------+-------+-------+--------------------+
|2022-10-21 16:39:...|table|      0|      0|      0|2022-10-21 16:39:...|
+--------------------+-----+-------+-------+-------+--------------------+
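Since the goal in the question is to write this single row into a Parquet table, here is a minimal sketch of the write step (the target path below is only a placeholder; adjust it to your environment):

# append the one-row DataFrame to a Parquet location
# "/path/to/target" is a placeholder, not a path from the question
df.write.mode("append").parquet("/path/to/target/" + tablename)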
PS - You are shadowing the built-in list type with the variable name list. That did not cause this issue, but it may lead to other problems.
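For example, a minimal rename that avoids the shadowing:

# same row, stored in a variable that does not shadow the built-in list type
data = [(start_time, tablename, count_1, count_2, count_3, datetime.datetime.now())]
df = spark.createDataFrame(data, columns)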