I have huge Sensor log data which is in form of [key=value] pair I need to parse the data column wise i found this code for my problem
import pandas as pd
lines = []
with open('/path/to/test.txt', 'r') as infile:
for line in infile:
if "," not in line:
continue
else:
lines.append(line.strip().split(","))
row_names = []
column_data = {}
max_length = max(*[len(line) for line in lines])
for line in lines:
while(len(line) < max_length):
line.append(f'{len(line)-1}=NaN')
for line in lines:
row_names.append(" ".join(line[:2]))
for info in line[2:]:
(k,v) = info.split("=")
if k in column_data:
column_data[k].append(v)
else:
column_data[k] = [v]
df = pd.DataFrame(column_data)
df.index = row_names
print(df)
df.to_csv('/path/to/test.csv')
the above code is suitable when the data is in form of "Priority=0, X=776517049" but my data is something like this [Priority=0][X=776517049] and there is no separator in between two columns how can i do it in python and i am sharing the link of sample data here raw data and bilow that expected parsed data which i done manually https://docs.google.com/spreadsheets/d/1EVTVL8RAkrSHhZO48xV1uEGqOzChQVf4xt7mHkTcqzs/edit?usp=sharing kindly check this link
CodePudding user response:
I've downloaded as csv.
Since your file has multiple tables on one sheet, I've limited to 100 rows, you can remove that parameter.
raw = pd.read_csv(
"logdata - Sheet1.csv", # filename
skiprows=1, # skip the first row
nrows=100, # use 100 rows, remove in your example
usecols=[0], # only use the first column
header=None # your dataset has no column names
)
Then you can use a regex to extract the values:
df = raw[0].str.extract(r"\[Priority=(\d*)\] \[GPS element=\[X=(\d*)\] \[Y=(\d*)\] \[Speed=(\d*)\]")
and set column names:
df.columns = ["Priority", "X", "Y", "Speed"]
result:
Priority X Y Speed
0 0 776517049 128887449 4
1 0 776516816 128887733 0
2 0 776516816 128887733 0
3 0 776516833 128887166 0
4 0 776517200 128886133 0
5 0 776516883 128885933 8
.....................................
99 0 776494483 128908783 0