I'm working on an ML model that requires RSSI values from a BLE device. For this, I've created a Mac application that stores a dictionary of type <K: Date, V: Int>
in a text file. See the sample below.
string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'
Here, the negative values in the sample are the RSSI values. For example, the first two entries are:
Date | RSSI |
---|---|
2021-10-17 06:52:00 +0000 | -47 |
2021-10-17 06:52:04 +0000 | -50 |
To perform any calculations, I need the data in the Python equivalent of <Date, Int>. How can I convert the above string into a pandas DataFrame? I hope this provides enough information. Thank you in advance.
CodePudding user response:
You can use re.findall with a small regex.
As a DataFrame:
string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'
import re
import pandas as pd
# string[1:-1] drops the enclosing brackets; each match is a (date, rssi) tuple of strings
pairs = re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1])
df = pd.DataFrame.from_records(pairs, columns=['Date', 'RSSI'])
df['Date'] = pd.to_datetime(df['Date'], utc=True)  # timezone-aware timestamps
df['RSSI'] = df['RSSI'].astype(int)
output:
                        Date  RSSI
0  2021-10-17 06:52:00+00:00   -47
1  2021-10-17 06:52:04+00:00   -50
2  2021-10-17 06:52:03+00:00   -50
3  2021-10-17 06:52:02+00:00   -47
4  2021-10-17 06:52:08+00:00   -46
5  2021-10-17 06:51:57+00:00   -50
6  2021-10-17 06:52:09+00:00   -48
7  2021-10-17 06:52:05+00:00   -49
8  2021-10-17 06:52:01+00:00   -48
9  2021-10-17 06:51:58+00:00   -50
10 2021-10-17 06:51:59+00:00   -50
11 2021-10-17 06:52:06+00:00   -47
12 2021-10-17 06:52:07+00:00   -48
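Once the data is in a DataFrame, calculations are straightforward. The following is only a sketch under a few assumptions: it uses a shortened sample in the same format as the question, it sorts the rows because the dictionary order in the file is arbitrary, and the 3-sample rolling window and the names `series`, `mean_rssi`, and `rssi_smooth` are illustrative choices that do not come from the question.

```python
import re
import pandas as pd

# Shortened sample in the question's format (assumed to include the "+0000" offset)
sample = ('[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:51:58 +0000: -50, '
          '2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:01 +0000: -48]')

# Same parsing approach as above: tuples of (date string, rssi string)
pairs = re.findall(r'([^,]+): (-?\d+)(?:, )?', sample[1:-1])
df = pd.DataFrame.from_records(pairs, columns=['Date', 'RSSI'])
df['Date'] = pd.to_datetime(df['Date'], utc=True)
df['RSSI'] = df['RSSI'].astype(int)

# Dictionary order is arbitrary, so sort chronologically and index by time
series = df.sort_values('Date').set_index('Date')['RSSI']

# Example calculations: overall mean and a 3-sample moving average
mean_rssi = series.mean()
rssi_smooth = series.rolling(window=3, min_periods=1).mean()
```

Indexing by the timestamp also makes time-based operations such as `series.resample(...)` available if the readings need to be aggregated per interval.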
As a dictionary (keys and values remain strings):
import re
dict(re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1]))
output:
{'2021-10-17 06:52:00 +0000': '-47',
 '2021-10-17 06:52:04 +0000': '-50',
 '2021-10-17 06:52:03 +0000': '-50',
 '2021-10-17 06:52:02 +0000': '-47',
 '2021-10-17 06:52:08 +0000': '-46',
 '2021-10-17 06:51:57 +0000': '-50',
 '2021-10-17 06:52:09 +0000': '-48',
 '2021-10-17 06:52:05 +0000': '-49',
 '2021-10-17 06:52:01 +0000': '-48',
 '2021-10-17 06:51:58 +0000': '-50',
 '2021-10-17 06:51:59 +0000': '-50',
 '2021-10-17 06:52:06 +0000': '-47',
 '2021-10-17 06:52:07 +0000': '-48'}
As a dictionary with correct types:
import re
import pandas as pd
# Convert each key to a Timestamp and each value to an int
{pd.to_datetime(k): int(v)
 for k, v in re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1])}
output:
{Timestamp('2021-10-17 06:52:00+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:04+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:03+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:02+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:08+0000', tz='UTC'): -46,
 Timestamp('2021-10-17 06:51:57+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:09+0000', tz='UTC'): -48,
 Timestamp('2021-10-17 06:52:05+0000', tz='UTC'): -49,
 Timestamp('2021-10-17 06:52:01+0000', tz='UTC'): -48,
 Timestamp('2021-10-17 06:51:58+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:51:59+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:06+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:07+0000', tz='UTC'): -48}
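If pandas is not needed for the dictionary itself, the same result can be sketched with the standard library only. This is a different technique from the `pd.to_datetime` call above: it uses `datetime.strptime` with an explicit format, assuming every entry looks like `YYYY-MM-DD HH:MM:SS +0000`. The function name `parse_rssi_log` and the two-entry sample are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime, timezone

# Two-entry sample in the question's format (assumed to include the "+0000" offset)
sample = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50]'

def parse_rssi_log(text):
    """Parse '[<date> <offset>: <rssi>, ...]' into a {datetime: int} dict."""
    # strip() guards against a trailing newline from the text file
    pairs = re.findall(r'([^,]+): (-?\d+)(?:, )?', text.strip()[1:-1])
    # %z parses the numeric UTC offset, so the keys are timezone-aware datetimes
    return {datetime.strptime(k, '%Y-%m-%d %H:%M:%S %z'): int(v)
            for k, v in pairs}

readings = parse_rssi_log(sample)
```

An explicit format fails loudly with a `ValueError` if an entry does not match, which can be preferable to silent guessing when the file is machine-generated.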
CodePudding user response:
You can do this (note that both columns are still strings):
In [98]: l = string[1:-1].split(', ')  # split on ', ' so keys have no leading space
In [140]: d = dict(i.split(': ') for i in l)
In [131]: df = pd.DataFrame(d.items(), columns=['Date', 'RSSI'])
In [132]: df
Out[132]:
                         Date RSSI
0   2021-10-17 06:52:00 +0000  -47
1   2021-10-17 06:52:04 +0000  -50
2   2021-10-17 06:52:03 +0000  -50
3   2021-10-17 06:52:02 +0000  -47
4   2021-10-17 06:52:08 +0000  -46
5   2021-10-17 06:51:57 +0000  -50
6   2021-10-17 06:52:09 +0000  -48
7   2021-10-17 06:52:05 +0000  -49
8   2021-10-17 06:52:01 +0000  -48
9   2021-10-17 06:51:58 +0000  -50
10  2021-10-17 06:51:59 +0000  -50
11  2021-10-17 06:52:06 +0000  -47
12  2021-10-17 06:52:07 +0000  -48
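The split-based approach leaves both columns as strings, so numeric calculations need a conversion step first. A minimal sketch, assuming a two-entry sample in the question's format; `rsplit` on the last `': '` is used here instead of `split` as a small precaution, and `pd.to_numeric` is one of several ways to get integers.

```python
import pandas as pd

# Two-entry sample in the question's format (assumed to include the "+0000" offset)
sample = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50]'

# Split entries on ', ' and each entry on its last ': ' into [date, rssi]
rows = [item.rsplit(': ', 1) for item in sample[1:-1].split(', ')]
df = pd.DataFrame(rows, columns=['Date', 'RSSI'])

# Convert the string columns to timezone-aware timestamps and integers
df['Date'] = pd.to_datetime(df['Date'], utc=True)
df['RSSI'] = pd.to_numeric(df['RSSI'])
```

After this step `df.dtypes` should report a datetime column and an integer column, and expressions such as `df['RSSI'].mean()` work as expected.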