Converting double datatype column to binary and returning sum of digits in new column PySpark Datafr

Time:07-12

I am new to PySpark. I am trying to convert a couple of columns of double datatype to binary, and I want to count the number of non-zero digits in each binary number (that is, get the sum of its binary digits).

My sample data looks as follows:

bit_1   bit_2   bit_3   bit_4   bit_5   bit_6
0       2       8       0       0       0
11      0       16      64      0       0
10      0       0       0       256     144
12      15      15      0       0       0
20      0       17      0       0       0
250     12      0       0       0       0
300     72      84      64      0       0
320     100     120     140     220     240


So far I have tried the following:

test_df = df.withColumn('bit_sum', sum(map(int,"{0:b}".format(F.col('bit_1')))))

The code above throws an error.

I also tried the following:

df_2 = (df
        
         .withColumn('bit_1_bi', F.lpad(F.bin(F.col('bit_1')),12,'0'))
         .withColumn('bit_2_bi', F.lpad(F.bin(F.col('bit_2')),12,'0'))
         .withColumn('bit_3_bi', F.lpad(F.bin(F.col('bit_3')),12,'0'))
         .withColumn('bit_4_bi', F.lpad(F.bin(F.col('bit_4')),12,'0'))
         .withColumn('bit_5_bi', F.lpad(F.bin(F.col('bit_5')),12,'0'))
         .withColumn('bit_6_bi', F.lpad(F.bin(F.col('bit_6')),12,'0'))
    
        )

CodePudding user response:

Use bin to convert the column values into their binary string representation, then replace the 0's with an empty string and take the length of the resulting string, which equals the number of 1's:

df.select(*[F.length(F.regexp_replace(F.bin(c), '0', '')).alias(c) for c in df.columns])

+-----+-----+-----+-----+-----+-----+
|bit_1|bit_2|bit_3|bit_4|bit_5|bit_6|
+-----+-----+-----+-----+-----+-----+
|    0|    1|    1|    0|    0|    0|
|    3|    0|    1|    1|    0|    0|
|    2|    0|    0|    0|    1|    2|
|    2|    4|    4|    0|    0|    0|
|    2|    0|    2|    0|    0|    0|
|    6|    2|    0|    0|    0|    0|
|    4|    2|    3|    1|    0|    0|
|    2|    3|    4|    3|    5|    4|
+-----+-----+-----+-----+-----+-----+
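
To sanity-check the "strip the zeros, count what is left" idea without a Spark session, here is a minimal plain-Python sketch run on the sample data from the question. The helper name count_ones and the per-row total bit_sum are illustrative names only, not part of any API, and the sketch assumes the values are non-negative whole numbers, as they are in the sample.

```python
# Plain-Python check of the counting logic (no Spark required).
# Sample rows copied from the question: bit_1 .. bit_6.
rows = [
    (0, 2, 8, 0, 0, 0),
    (11, 0, 16, 64, 0, 0),
    (10, 0, 0, 0, 256, 144),
    (12, 15, 15, 0, 0, 0),
    (20, 0, 17, 0, 0, 0),
    (250, 12, 0, 0, 0, 0),
    (300, 72, 84, 64, 0, 0),
    (320, 100, 120, 140, 220, 240),
]


def count_ones(value):
    """Hypothetical helper: count the 1 digits in the binary form of value.

    Follows the same steps as the answer: binary string, remove the 0's,
    take the length. Assumes value is a non-negative whole number
    (a double such as 11.0 is truncated to 11 first).
    """
    binary = format(int(value), "b")  # e.g. 11 -> "1011"
    return len(binary.replace("0", ""))


# Per-column counts, one list per input row.
per_column = [[count_ones(v) for v in row] for row in rows]

# One total per row, which is what a single 'bit_sum' column would hold.
bit_sum = [sum(counts) for counts in per_column]

for counts, total in zip(per_column, bit_sum):
    print(counts, "->", total)
```

The per-column counts match the table above. If a single total column is wanted in PySpark, the per-column expressions from the answer can presumably be added together with + and attached via withColumn; that variant is untested here.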