Create column of array of differences between two adjacent numbers in another column's array py

Time:10-20

I have a column of arrays of numbers, e.g. [0,80,160,220], and would like to create a column of arrays of the differences between adjacent terms, e.g. [80,80,60].

Does anyone have an idea how to do this in Python/PySpark? My code is df=df.withcolumn('col_array_diffs', [df.col_array.getItem[i]-df.col_array.getItem[i-1] if i else None for i in range(1,F.size(df.col_array))]) but I am really struggling with the ArrayType. This produces AssertionError: col should be Column... Thanks!

CodePudding user response:

You can use a UDF to do this.

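import pandas as pd
import pyspark.sql.functions as F
import pyspark.sql.types as T
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def subtract_el(x):
    # Pair each element with its successor and subtract (later minus earlier)
    return [j - i for i, j in zip(x, x[1:])]

df = spark.createDataFrame(pd.DataFrame([[[0,80,160,220]]]))
df.select(F.udf(subtract_el, T.ArrayType(T.IntegerType()))("0").alias("diff")).show()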

This results in:

+------------+
|        diff|
+------------+
|[80, 80, 60]|
+------------+
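
The element-wise logic that the UDF wraps can be checked in plain Python before handing it to Spark. The sketch below is only an illustration: the name adjacent_diffs and the None handling are assumptions, not part of the answer above. It returns signed differences (later minus earlier), which matches the question's item[i] - item[i-1], and passes a null array through as None instead of raising.

```python
from typing import List, Optional


def adjacent_diffs(values: Optional[List[int]]) -> Optional[List[int]]:
    """Return later-minus-earlier differences for each adjacent pair.

    Hypothetical helper: a null array (None) is passed through unchanged,
    and arrays with fewer than two elements yield an empty list.
    """
    if values is None:
        return None
    # zip stops at the shorter sequence, so the last element has no pair
    return [b - a for a, b in zip(values, values[1:])]


print(adjacent_diffs([0, 80, 160, 220]))  # [80, 80, 60]
print(adjacent_diffs([220, 160, 0]))      # [-60, -160]
print(adjacent_diffs([5]))                # []
```

A function like this could then be wrapped the same way as in the answer, e.g. F.udf(adjacent_diffs, T.ArrayType(T.IntegerType())), assuming the array holds integers.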