I want to convert an integer-typed column to a list-typed column.
Given this DataFrame:
   a  b
0  9  2
1  9  3
I want to convert it to:
   a    b
0  9  [2]
1  9  [3]
Pandas solution:
import pandas as pd
df = pd.DataFrame({"a":[1,2],"b":[3,4]})
df["b"] = df["b"].apply(lambda row: [row])
How can I achieve the same in PySpark?
I tried a naive way:
from pyspark.sql.types import IntegerType, ArrayType
from pyspark.sql.functions import col
df_sp = spark.createDataFrame(df)
# EDIT: according to 过过招's answer
df_sp = df_sp.withColumn("b",col("b").cast(ArrayType(IntegerType())))
display(df_sp)
This gives the error:
AnalysisException: cannot resolve 'b' due to data type mismatch: cannot cast bigint to array<int>;
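As a sketch (not part of the original attempt): the error suggests a scalar column cannot be cast to an array type at all, so a common workaround is to wrap the value with pyspark.sql.functions.array and cast only the element, reusing the df_sp created from spark.createDataFrame(df) above:
from pyspark.sql import functions as F
# build a single-element array<int> from the scalar column instead of casting it
df_sp_arr = df_sp.withColumn("b", F.array(col("b").cast("int")))
df_sp_arr.printSchema()
df_sp_arr.show()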
CodePudding user response:
You need to specify the data type of the element in the array.
df_sp = df_sp.withColumn("b", col("b").cast(ArrayType(IntegerType())))
df_sp.show()