Consider the following code:
>>> from pyspark.sql import Row, functions
>>> mask = 0b10
>>> test = 0b1100010
>>> df = spark.createDataFrame([Row(a=mask, b=test)])
>>> df.withColumn("c", df.a.bitwiseAND(df.b)).select(functions.col("c")).collect()
[Row(c=2)]
I would like to adapt this code to extract the value of the 2nd bit of the variable test. The result should be 1, because the bitwise AND gives 2 in base 10 (10 in base 2).
If the variable test were instead equal to 0b11000, I would like to obtain 0, because the result of the AND is 0 in any base.
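To make the expected values concrete, here is a minimal plain-Python sketch of the same logic, outside Spark. The helper name second_bit is hypothetical and only serves as an illustration:

```python
def second_bit(test, mask=0b10):
    """Return 1 if the bit selected by mask is set in test, else 0."""
    masked = test & mask          # 0b1100010 & 0b10 gives 0b10 (2 in base 10)
    return 1 if masked else 0     # any non-zero result means the bit is set


for value in (0b1100010, 0b11000):
    print(bin(value), "->", second_bit(value))
```

This should print 1 for 0b1100010 and 0 for 0b11000, which are the two results described above.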
I tried to cast the result to BinaryType to get a base-2 representation of the operation (for test = 0b1100010 it should be 10), convert that representation to a string, and extract the first character. But I got an exception when trying to cast to BinaryType.
EDIT:
I am using PySpark 2.3.0.
SOLUTION:
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, IntegerType
mask = 0b10
test = 0b1100010
df = spark.createDataFrame([(mask, test)], ["a", "b"])
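# AND the mask with the value: the result is 2 (0b10) when the 2nd bit is set, otherwise 0.
df = df.withColumn("bitwise", df.a.bitwiseAND(df.b))
# conv works on string input, so cast first, then convert from base 10 to base 2.
df = df.withColumn("bitwise_str", df.bitwise.cast(StringType()))
df = df.withColumn("binary", F.conv(df.bitwise_str, 10, 2))
# substring positions are 1-based; the first character is "1" if the bit is set, "0" otherwise.
df = df.withColumn("boolean_result", F.substring(df.binary.cast(StringType()), 1, 1).cast(IntegerType()))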
df.collect()
CodePudding user response:
Cast the c column to a string, then convert it to base 2 using the conv function.
CodePudding user response:
You can use Spark's bit_get (the index is 0-based, so the 2nd bit is referenced by index 1):
bit_get(a & b, 1)
Getting the 2nd bit from the binary operation a & b:
from pyspark.sql import functions as F
mask = 0b10
test = 0b1100010
df = spark.createDataFrame([(mask, test)], ["a", "b"])
df.select(F.expr("bit_get(a & b, 1)").alias("c")).collect()
# [Row(c=1)]
(expr is needed because bit_get is not yet directly available in PySpark.)
It's even simpler to get the bit from an already existing column: bit_get(b, 1)
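As far as I know, bit_get was only added to Spark SQL in version 3.2, so it would not be available on the PySpark 2.3.0 mentioned in the question. A possible alternative for older versions is to shift the value right and mask the lowest bit. The sketch below is an assumption on my part rather than something taken from the answers above: get_bit and add_bit_column are hypothetical helper names, and the Spark variant has not been checked against 2.3.0 specifically.

```python
def get_bit(value, index):
    """Plain-Python reference: bit at 0-based position index of value."""
    return (value >> index) & 1


def add_bit_column(df, source, index, target):
    """Add column target holding the bit at 0-based position index of column source.

    Hypothetical helper: uses shiftRight and bitwiseAND instead of bit_get.
    """
    # Imported lazily so the plain-Python helper above works without Spark installed.
    from pyspark.sql import functions as F

    shifted = F.shiftRight(F.col(source), index)   # move the wanted bit to position 0
    return df.withColumn(target, shifted.bitwiseAND(F.lit(1)))  # keep only that bit
```

With the DataFrame from above, add_bit_column(df, "b", 1, "c") should give the same value as bit_get(b, 1), without needing a mask column at all.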