Decoding a column and creating a new dataframe with the decoded data

I have a dataframe with the following structure:

Timestamp         Payload                                                                                                     
2022-03-16 18:00  3c313034393536353534303b31332e32333534313b303b302e303637313b302e303138313b332e33353b36322e31353338323e
2022-03-16 18:10  3c313034393536363134303b31332e32313134383b303b302e303637323b302e3031383b342e3632353b37332e34353133313e
2022-03-16 18:20  3c313034393536363734303b31332e31393630313b303b302e303637323b302e3031383b342e303132353b36322e34343238353e     
2022-03-16 18:30  3c313034393537353734303b31332e31303137343b303b302e3036373b302e303137393b342e383132353b37352e343931393e     
2022-03-16 18:40  3c313034393538323934303b31332e30353934353b303b302e303636373b302e303137393b332e313337353b37352e37313830393e

I would like to create a new dataframe that keeps those two fields and also contains the decoded values of the Payload column. The difficulty is that decoding each hex string yields several values rather than one. For example, if we decode the following line,

3c313034393536353534303b31332e32333534313b303b302e303637313b302e303138313b332e33353b36322e31353338323e

we obtain,

<1049565540;13.23541;0;0.0671;0.0181;3.35;62.15382>

where the characters <, > and ; are separators. My goal is to obtain something like the following dataframe,

2022-03-16 18:00  1049565540  13.23541  0  0.0671  0.0181  3.3500  62.15382  3c313034393536353534303b31332e32333534313b303b302e303637313b302e303138313b332e33353b36322e31353338323e
2022-03-16 18:10  1049566140  13.21148  0  0.0672  0.0180  4.6250  73.45131  3c313034393536363134303b31332e32313134383b303b302e303637323b302e3031383b342e3632353b37332e34353133313e
2022-03-16 18:20  1049566740  13.19601  0  0.0672  0.0180  4.0125  62.44285  3c313034393536363734303b31332e31393630313b303b302e303637323b302e3031383b342e303132353b36322e34343238353e     
2022-03-16 18:30  1049575740  13.10174  0  0.0670  0.0179  4.8125  75.49190  3c313034393537353734303b31332e31303137343b303b302e3036373b302e303137393b342e383132353b37352e343931393e     
2022-03-16 18:40  1049582940  13.05945  0  0.0667  0.0179  3.1375  75.71809  3c313034393538323934303b31332e30353934353b303b302e303636373b302e303137393b332e313337353b37352e37313830393e
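
The decoding described above can be checked on a single payload before touching the dataframe. This is a minimal sketch, assuming the payload is always ASCII text encoded as hex and wrapped in < and >; the variable names are only illustrative.

```python
# First payload from the example dataframe.
payload = "3c313034393536353534303b31332e32333534313b303b302e303637313b302e303138313b332e33353b36322e31353338323e"

# Each pair of hex digits is one ASCII byte: 3c -> '<', 3b -> ';', 3e -> '>'.
text = bytes.fromhex(payload).decode("ascii")
print(text)  # <1049565540;13.23541;0;0.0671;0.0181;3.35;62.15382>

# Remove the surrounding '<' and '>' and split on ';' to get the fields.
fields = text.strip("<>").split(";")
print(fields)
```

Every field is still a string at this point, so a numeric conversion is needed afterwards if the values are going to be used in calculations.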

Currently I'm using the following code. It works, but I'm sure I'm not taking advantage of the potential of the pandas library.

import pandas as pd

def myFunc(pay):
    # Drop the leading "3c" ('<') and trailing "3e" ('>'), then decode the hex to text
    decode = bytearray.fromhex(pay[2:-2]).decode()
    return decode.split(";")

output2['datos'] = output2['Payload'].apply(myFunc)
payloads = output2['datos'].values
df = pd.DataFrame(payloads.tolist(), columns=None)

Can it be done in a more efficient way?

Thank you very much!

CodePudding user response：

If you want to apply a custom function, you can use .apply as in your solution, or a list comprehension:

L = [bytearray.fromhex(pay[2:-2]).decode().split(";") for pay in output2.pop('Payload')]
df = pd.DataFrame(L, index=output2.index)

Finally, use join:

df = output2.join(df)
print (df)
          Timestamp           0         1  2       3       4       5         6
0  2022-03-16 18:10  1049566140  13.21148  0  0.0672   0.018   4.625  73.45131
1  2022-03-16 18:20  1049566740  13.19601  0  0.0672   0.018  4.0125  62.44285
2  2022-03-16 18:30  1049575740  13.10174  0   0.067  0.0179  4.8125   75.4919
3  2022-03-16 18:40  1049582940  13.05945  0  0.0667  0.0179  3.1375  75.71809

If you also need the original column:

L = [bytearray.fromhex(pay[2:-2]).decode().split(";") for pay in output2['Payload']]
df = pd.DataFrame(L, index=output2.index)
df = output2.join(df)
print (df)
          Timestamp                                            Payload  \
0  2022-03-16 18:10  3c313034393536363134303b31332e32313134383b303b...   
1  2022-03-16 18:20  3c313034393536363734303b31332e31393630313b303b...   
2  2022-03-16 18:30  3c313034393537353734303b31332e31303137343b303b...   
3  2022-03-16 18:40  3c313034393538323934303b31332e30353934353b303b...   

            0         1  2       3       4       5         6  
0  1049566140  13.21148  0  0.0672   0.018   4.625  73.45131  
1  1049566740  13.19601  0  0.0672   0.018  4.0125  62.44285  
2  1049575740  13.10174  0   0.067  0.0179  4.8125   75.4919  
3  1049582940  13.05945  0  0.0667  0.0179  3.1375  75.71809
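
The decoded columns above are still strings. As a possible follow-up, the sketch below converts them to numeric dtypes with pd.to_numeric and gives them names. The field_0 ... field_6 names and the decode_payload helper are hypothetical, since the thread does not say what each field means, and the sample frame contains only the first two rows of the question.

```python
import pandas as pd

# Two rows from the question, enough to run the example on its own.
output2 = pd.DataFrame({
    "Timestamp": ["2022-03-16 18:00", "2022-03-16 18:10"],
    "Payload": [
        "3c313034393536353534303b31332e32333534313b303b302e303637313b302e303138313b332e33353b36322e31353338323e",
        "3c313034393536363134303b31332e32313134383b303b302e303637323b302e3031383b342e3632353b37332e34353133313e",
    ],
})

def decode_payload(pay):
    """Hex string -> list of field strings (assumes ASCII text wrapped in '<' and '>')."""
    return bytes.fromhex(pay).decode("ascii").strip("<>").split(";")

# Build the decoded frame on the same index, then convert each column to a number.
decoded = pd.DataFrame(
    [decode_payload(p) for p in output2["Payload"]],
    index=output2.index,
).apply(pd.to_numeric)

# Hypothetical column names; replace them with the real meaning of each field.
decoded.columns = [f"field_{i}" for i in decoded.columns]

result = output2.join(decoded)
print(result.dtypes)
```

Keeping the same index on the decoded frame is what lets join line the rows up with the original Timestamp and Payload columns.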

CodePudding user response：

Following your example, with the decoded values separated by blanks:

def myFunc(pay):
    arr = []
    for val in pay:
        # Decode into a local list instead of assigning back into the Series being iterated
        fields = bytearray.fromhex(val[2:-2]).decode().split(";")
        arr.append(" ".join(fields))
    return arr

df["Payload"] = myFunc(df["Payload"])
df

    Timestamp           Payload
0   2022-03-16 18:00    1049565540 13.23541 0 0.0671 0.0181 3.35 62.15382
1   2022-03-16 18:10    1049566140 13.21148 0 0.0672 0.018 4.625 73.45131
2   2022-03-16 18:20    1049566740 13.19601 0 0.0672 0.018 4.0125 62.44285
3   2022-03-16 18:30    1049575740 13.10174 0 0.067 0.0179 4.8125 75.4919
4   2022-03-16 18:40    1049582940 13.05945 0 0.0667 0.0179 3.1375 75.71809
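
If separate columns are wanted rather than one blank-separated string, the same decoding can feed pandas' string accessor. The sketch below is an assumption-laden variant, not part of the answer above: it decodes with Series.map and splits with Series.str.split(expand=True), using one row from the question as sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["2022-03-16 18:00"],
    "Payload": [
        "3c313034393536353534303b31332e32333534313b303b302e303637313b302e303138313b332e33353b36322e31353338323e",
    ],
})

# Decode each payload to a ';'-separated string without the '<' and '>' markers.
decoded = df["Payload"].map(lambda p: bytes.fromhex(p).decode("ascii").strip("<>"))

# expand=True returns a DataFrame with one column per field.
fields = decoded.str.split(";", expand=True)
print(fields)
```

The hex decoding itself still runs in a Python-level loop inside map, so this mainly tidies the splitting step; no speedup over the list comprehension is claimed here.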