numerical_cols = ["temperature","timestamp"]
   ID  temperature  system_state  timestamp
0  B   12           inactive      1632733508
1  B   13           active        1632733508
2  A   4            NULL          1632733511
3  A   11           NULL          1632733512
4  D   20           450           1632733513
5  D   22           431           1632733515
6  C   25           20            1632733518
7  C   19           30            1632733521
I have a DataFrame with several columns and a list that contains the names of some of those columns. I want to check whether each column of the DataFrame is in the list and, if it is, cast it to double type. How can I do this?
CodePudding user response:
Here's an example of how to do that:
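from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType
spark = SparkSession.builder.getOrCreate()
data = [{"a": "12.1", "b": "23.2", "c": "33.2"}]
columns = ["a", "c"]  # names of the columns to cast
df = spark.createDataFrame(data)
# Cast the listed columns to double; pass all other columns through unchanged.
df = df.select(
    [F.col(c).cast(DoubleType()) if c in columns else F.col(c) for c in df.columns]
)
df.printSchema()
df.show(truncate=False)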
Result:
root
|-- a: double (nullable = true)
|-- b: string (nullable = true)
|-- c: double (nullable = true)
+----+----+----+
|a   |b   |c   |
+----+----+----+
|12.1|23.2|33.2|
+----+----+----+
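Applied to the DataFrame from the question, the same idea could look roughly like the sketch below. The helper names `columns_to_cast` and `cast_to_double` are made up for illustration and are not part of PySpark. The sketch assumes `df` is the question's DataFrame and that the list may contain names that do not exist in it; such names are simply ignored.

```python
def columns_to_cast(df_columns, wanted):
    """Return the names in `wanted` that exist in `df_columns`, in DataFrame order.

    Hypothetical helper: names in `wanted` that are not DataFrame columns are dropped.
    """
    wanted_set = set(wanted)
    return [c for c in df_columns if c in wanted_set]


def cast_to_double(df, wanted):
    """Cast every column of `df` that is listed in `wanted` to double.

    Hypothetical helper: all other columns are passed through unchanged.
    """
    # Imported inside the function so columns_to_cast can be used without PySpark.
    from pyspark.sql import functions as F

    targets = set(columns_to_cast(df.columns, wanted))
    return df.select(
        [F.col(c).cast("double") if c in targets else F.col(c) for c in df.columns]
    )
```

With `numerical_cols = ["temperature", "timestamp"]`, calling `cast_to_double(df, numerical_cols).printSchema()` should report `temperature` and `timestamp` as double and leave `ID` and `system_state` with their original types. Using a single `select` instead of a loop of `withColumn` calls keeps the column order and builds only one projection.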