Convert a string to a dictionary of type <K: date, V: int>

Time:11-07

I'm working on an ML model that requires RSSI values from a BLE device. For this, I've created a Mac application that stores a dictionary of type <K: Date, V: Int> in a text file. See the sample below.

string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'

Here, the negative values in the sample are the RSSI values. For example, the first two entries are:

Date                       RSSI
2021-10-17 06:52:00 +0000  -47
2021-10-17 06:52:04 +0000  -50

To perform any calculations, I need the data in the Python equivalent of <Date, Int>. How can I convert the above string into a pandas DataFrame? I hope this provides enough information. Thank you in advance.

CodePudding user response:

You can use re.findall and a small regex:

as dataframe
string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'

import re
import pandas as pd

# Capture "<date>: <rssi>" pairs; string[1:-1] drops the surrounding brackets.
df = (pd.DataFrame.from_records(re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1]),
                                columns=['Date', 'RSSI'])
        # pd.to_datetime handles the "+0000" offset; a bare 'datetime64' astype fails on recent pandas
        .assign(Date=lambda d: pd.to_datetime(d['Date']))
        .astype({'RSSI': 'int'})
      )

output:

                        Date RSSI
0  2021-10-17 06:52:00+00:00  -47
1  2021-10-17 06:52:04+00:00  -50
2  2021-10-17 06:52:03+00:00  -50
3  2021-10-17 06:52:02+00:00  -47
4  2021-10-17 06:52:08+00:00  -46
5  2021-10-17 06:51:57+00:00  -50
6  2021-10-17 06:52:09+00:00  -48
7  2021-10-17 06:52:05+00:00  -49
8  2021-10-17 06:52:01+00:00  -48
9  2021-10-17 06:51:58+00:00  -50
10 2021-10-17 06:51:59+00:00  -50
11 2021-10-17 06:52:06+00:00  -47
12 2021-10-17 06:52:07+00:00  -48

as dictionary

import re
dict(re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1]))

output:

{'2021-10-17 06:52:00 +0000': '-47',
 '2021-10-17 06:52:04 +0000': '-50',
 '2021-10-17 06:52:03 +0000': '-50',
 '2021-10-17 06:52:02 +0000': '-47',
 '2021-10-17 06:52:08 +0000': '-46',
 '2021-10-17 06:51:57 +0000': '-50',
 '2021-10-17 06:52:09 +0000': '-48',
 '2021-10-17 06:52:05 +0000': '-49',
 '2021-10-17 06:52:01 +0000': '-48',
 '2021-10-17 06:51:58 +0000': '-50',
 '2021-10-17 06:51:59 +0000': '-50',
 '2021-10-17 06:52:06 +0000': '-47',
 '2021-10-17 06:52:07 +0000': '-48'}

as dictionary with correct types:

import re
import pandas as pd
{pd.to_datetime(k): int(v)
 for k, v in re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1])}

output:

{Timestamp('2021-10-17 06:52:00+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:04+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:03+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:02+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:08+0000', tz='UTC'): -46,
 Timestamp('2021-10-17 06:51:57+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:09+0000', tz='UTC'): -48,
 Timestamp('2021-10-17 06:52:05+0000', tz='UTC'): -49,
 Timestamp('2021-10-17 06:52:01+0000', tz='UTC'): -48,
 Timestamp('2021-10-17 06:51:58+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:51:59+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:06+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:07+0000', tz='UTC'): -48}
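If pandas is not needed for this step, the same typed dictionary can be built with the standard library alone. The following is a minimal sketch, assuming every entry follows the "YYYY-MM-DD HH:MM:SS +0000: value" shape shown in the question; the function name parse_rssi_log and the two-entry sample are hypothetical and only for illustration.

```python
import re
from datetime import datetime


def parse_rssi_log(text):
    """Parse '[<date> <offset>: <rssi>, ...]' into {aware datetime: int}.

    Hypothetical helper: assumes the exact entry layout from the question.
    """
    # Each key is everything up to the last ": " before the integer value.
    pairs = re.findall(r'([^,\[\]]+): (-?\d+)', text)
    return {
        # %z parses the "+0000" UTC offset into a timezone-aware datetime
        datetime.strptime(key.strip(), '%Y-%m-%d %H:%M:%S %z'): int(value)
        for key, value in pairs
    }


# Shortened sample in the same format as the question's string
sample = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50]'
readings = parse_rssi_log(sample)
```

The keys are timezone-aware datetime objects, so they sort and subtract as expected.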

CodePudding user response:

You can do this:

In [98]: l = string[1:-1].split(', ')   # split on ', ' so keys carry no leading space
In [140]: d = dict(i.split(': ') for i in l)

In [131]: df = pd.DataFrame(d.items(), columns=['Date', 'RSSI'])
In [132]: df
Out[132]: 
                         Date RSSI
0   2021-10-17 06:52:00 +0000  -47
1   2021-10-17 06:52:04 +0000  -50
2   2021-10-17 06:52:03 +0000  -50
3   2021-10-17 06:52:02 +0000  -47
4   2021-10-17 06:52:08 +0000  -46
5   2021-10-17 06:51:57 +0000  -50
6   2021-10-17 06:52:09 +0000  -48
7   2021-10-17 06:52:05 +0000  -49
8   2021-10-17 06:52:01 +0000  -48
9   2021-10-17 06:51:58 +0000  -50
10  2021-10-17 06:51:59 +0000  -50
11  2021-10-17 06:52:06 +0000  -47
12  2021-10-17 06:52:07 +0000  -48
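Since the question asks about performing calculations, note that the sample entries are not in chronological order, so sorting by timestamp first is usually worthwhile. A minimal sketch, assuming the Date column has been converted with pd.to_datetime and RSSI to integers; the three-row frame below is a hypothetical subset of the sample data.

```python
import pandas as pd

# Hypothetical three-row subset of the sample, already converted to proper types.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-10-17 06:52:00 +0000',
                            '2021-10-17 06:52:04 +0000',
                            '2021-10-17 06:52:03 +0000']),
    'RSSI': [-47, -50, -50],
})

# Index by timestamp and sort chronologically (the stored order is arbitrary).
series = df.set_index('Date')['RSSI'].sort_index()

mean_rssi = series.mean()      # average signal strength over the window
strongest = series.idxmax()    # timestamp of the strongest (least negative) reading
```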