The database stores user emails that the backend encrypts with the Java Jasypt library in its default configuration. As I understand it, that means PBEWithMD5AndDES with 1000 iterations to derive the key.
@Bean
StringEncryptor encryptor() {
    StandardPBEStringEncryptor spbe = new StandardPBEStringEncryptor();
    spbe.setPassword(symmetricKey);
    return spbe;
}
The encryptor is then used as follows:
@Before("insertAccount(account)")
public void beforeInsertAccount(Account account) {
    account.setFirstName(encryptor.encrypt(StringUtils.defaultString(account.getFirstName())));
    account.setPhone(encryptor.encrypt(StringUtils.defaultString(account.getPhone())));
    account.setEmail(encryptor.encrypt(StringUtils.defaultString(account.getEmail())));
}
In AWS Glue (an ETL tool) I need to decrypt the users' emails, or at least match encrypted emails between two different tables. I can define a custom transformation by writing a Python 3 script.
I wrote a script based on this gist: https://gist.github.com/jpralves/505e653fd1c7358ad2c540e25e1ee80a It uses the pycryptodome library. The data_to_decrypt variable is initialized with the Jasypt-encrypted value of '[email protected]', encrypted with the password 'test'.
def MyTransform (glueContext, dfc) -> DynamicFrameCollection:
    from Crypto.Hash import MD5
    from Crypto.Cipher import DES
    import base64
    import sys
    from pyspark.sql.functions import lit
    newdf = dfc.select(list(dfc.keys())[0]).toDF()
    data_to_decrypt = base64.b64decode("epncHsHYRZd8uIWncULit//8f0mhk8pn")
    password = "test"
    bs = 8
    _iterations = 1000
    salt = data_to_decrypt[:bs]
    data = data_to_decrypt[bs:]
    hasher = MD5.new()
    result = hasher.digest()
    hasher.update(bytearray(password.encode()))
    hasher.update(bytearray(salt))
    for i in range(1, _iterations):
        hasher = MD5.new()
        hasher.update(result)
        result = hasher.digest()
    encoder = DES.new(result[:bs], DES.MODE_CBC, result[bs:bs*2])
    decrypted = encoder.decrypt(bytes(data))
    length = len(decrypted)
    unpadding = int(decrypted[length-1])
    decryptedEmail = ''
    if length - unpadding > 0:
        decryptedEmail = decrypted[:(length - unpadding)].decode("latin")
    else:
        decryptedEmail = decrypted.decode("latin")
    newdf = newdf.withColumn('decryptedEmail', lit(decryptedEmail))
    dyf_filtered = DynamicFrame.fromDF(newdf, glueContext, "aaa")
    return(DynamicFrameCollection({"CustomTransform0": dyf_filtered}, glueContext))
The script outputs nonsense such as "’€ ‡>— ¦õ8Þûð›7e". When I tried to decode the decrypted bytes with any other encoding, it failed.
CodePudding user response:
Your actual problem is that your first hash is wrong: you need to call .digest() after the two .update() calls, not before them. (Your iterated hashes are correct.) In addition, your unpadding check is weak: PKCS5 padding never exceeds one block, which for DES is 8 bytes, so the last byte must be between 1 and 8. Even better would be to check all the padding bytes when there is more than one, but I didn't bother.
$ cat 71576901.py3
from Crypto.Hash import MD5
from Crypto.Cipher import DES
import base64
data_to_decrypt = base64.b64decode("epncHsHYRZd8uIWncULit//8f0mhk8pn")
password = "test"
bs = 8
_iterations = 1000
salt = data_to_decrypt[:bs]
data = data_to_decrypt[bs:]
hasher = MD5.new()
hasher.update(bytearray(password.encode()))
hasher.update(bytearray(salt))
result = hasher.digest() # moved down
for i in range(1, _iterations):
    hasher = MD5.new()
    hasher.update(result)
    result = hasher.digest()
encoder = DES.new(result[:bs], DES.MODE_CBC, result[bs:bs*2])
decrypted = encoder.decrypt(bytes(data))
length = len(decrypted)
unpadding = int(decrypted[length-1])
if unpadding > 0 and unpadding <= bs: # better check
    print (decrypted[:-unpadding].decode('latin1')) # or other decoding depending on what you encrypted
else:
    print ('bad') # might better raise, but TBD
$ python3 71576901.py3
[email protected]
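For reuse inside a Glue transform, the two fixes can be pulled out into small helpers. This is only a sketch: the names derive_key_iv and strip_pkcs5 are made up for illustration, the derivation uses the standard-library hashlib instead of Crypto.Hash.MD5, and strip_pkcs5 does the stricter check of every padding byte that the script above skips.

```python
import hashlib


def derive_key_iv(password, salt, iterations=1000, block_size=8):
    """Derive the DES key and CBC IV the same way the script above does.

    Hypothetical helper name; the logic mirrors the corrected loop.
    """
    # Round 1: hash password || salt, taking the digest only after both are fed in.
    result = hashlib.md5(password.encode() + salt).digest()
    # Rounds 2..iterations: re-hash the previous digest.
    for _ in range(1, iterations):
        result = hashlib.md5(result).digest()
    # MD5 gives 16 bytes: the first 8 are the key, the last 8 the IV.
    return result[:block_size], result[block_size:block_size * 2]


def strip_pkcs5(decrypted, block_size=8):
    """Remove PKCS5 padding, raising ValueError if any padding byte is wrong."""
    if not decrypted or len(decrypted) % block_size != 0:
        raise ValueError("decrypted length is not a multiple of the block size")
    pad = decrypted[-1]
    if not 1 <= pad <= block_size:
        raise ValueError("padding length out of range")
    # Every one of the last `pad` bytes must equal `pad`.
    if decrypted[-pad:] != bytes([pad]) * pad:
        raise ValueError("inconsistent padding bytes")
    return decrypted[:-pad]
```

With these, the decryption step stays the same as in the script above: key, iv = derive_key_iv(password, salt), then DES.new(key, DES.MODE_CBC, iv).decrypt(data), then strip_pkcs5 on the result. A ValueError from strip_pkcs5 usually means a wrong password or a damaged value.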
Also, be aware that this is very weak encryption and easily broken, but that is a security issue rather than a programming one, and off-topic here.