How to extract bit value from binary operation?

Time:06-08

Consider the following code:

>> from pyspark.sql import Row, functions
>> mask = 0b10
>> test = 0b1100010
>> df = spark.createDataFrame([Row(a=mask, b=test)])
>> df.withColumn("c", df.a.bitwiseAND(df.b)).select(functions.col("c")).collect()
   [Row(c=2)]

I would like to adapt this code to extract the value of the 2nd bit of the variable test. Here I would like to obtain 1, because the result of the AND is 2 in base 10 (10 in base 2).

If the variable test were instead equal to 0b11000, I would like to obtain 0, because the result of the AND is 0 in any base.

I tried casting the result to BinaryType to get a base-2 representation of the operation (for test = 0b1100010 it should be 10), converting that representation to a string, and extracting the first character. However, I got an exception while trying to cast to BinaryType.

EDIT :

I am using pyspark 2.3.0

SOLUTION :

from pyspark.sql import functions as F
from pyspark.sql.types import StringType, IntegerType
mask = 0b10
test = 0b1100010
df = spark.createDataFrame([(mask, test)], ["a", "b"])

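# AND the mask with the value, e.g. 0b10 & 0b1100010 = 2
df = df.withColumn("bitwise", df.a.bitwiseAND(df.b))
# conv operates on string input, so cast the number first
df = df.withColumn("bitwise_str", df.bitwise.cast(StringType()))
# Convert from base 10 to base 2: "2" becomes "10"
df = df.withColumn("binary", F.conv(df.bitwise_str, 10, 2))
# substring positions are 1-based; take the first character and cast it to an integer
df = df.withColumn("boolean_result", F.substring(df.binary, 1, 1).cast(IntegerType()))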

df.collect()

CodePudding user response:

Cast the c column to a string, then convert it to base 2 using the conv function.
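
What this suggestion does can be sketched in plain Python, without Spark: the masked value is rendered as a base-2 string and its leading character is read. The helper name leading_bit_via_string is made up for illustration, and format(..., "b") stands in for Spark's conv(..., 10, 2). This is only a sketch, and it matches the wanted 0/1 only when the mask has a single bit set.

```python
def leading_bit_via_string(value: int, mask: int) -> int:
    """Hypothetical helper: emulate 'AND, convert to base 2, read first char'."""
    masked = value & mask          # e.g. 0b1100010 & 0b10 == 0b10
    binary = format(masked, "b")   # base-2 text: "10", or "0" when the bit is clear
    return int(binary[0])          # leading character as an integer


print(leading_bit_via_string(0b1100010, 0b10))  # 1
print(leading_bit_via_string(0b11000, 0b10))    # 0
```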

CodePudding user response:

You can use Spark's bit_get (the index is 0-based, so the 2nd bit is referenced by index 1):

bit_get(a & b, 1)

Getting 2nd bit from binary operation a & b:

from pyspark.sql import functions as F
mask = 0b10
test = 0b1100010
df = spark.createDataFrame([(mask, test)], ["a", "b"])

df.select(F.expr("bit_get(a & b, 1)").alias("c")).collect()
# [Row(c=1)]

(expr is needed, because bit_get is not yet directly available in PySpark.)

It's even simpler to get the bit from an already existing column: bit_get(b, 1)
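
If bit_get is not available in your Spark version, the arithmetic behind it can be sketched in plain Python: shift the value right by the bit index and keep the lowest bit. The function name get_bit is an illustrative assumption, not a Spark API. On DataFrame columns the same shift-and-mask should be expressible with functions.shiftRight and Column.bitwiseAND, though that variant is untested here.

```python
def get_bit(value: int, index: int) -> int:
    """Hypothetical helper: return the bit of `value` at 0-based `index`."""
    # Shifting right by `index` moves the wanted bit to position 0;
    # AND with 1 discards every other bit.
    return (value >> index) & 1


print(get_bit(0b1100010, 1))  # 1
print(get_bit(0b11000, 1))    # 0
```

Unlike the string-based approach, this returns 0 or 1 directly and works for any bit index, not only when the masked bit happens to be the leading one.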
