How to split data present in a row (separated by space) into different column in the same excel shee-CodePudding

I have a file containing the CAN data in format:

(000.000000)  can0  1FF   [7]  20 08 87 0E 1D 98 0E

This data is present in a single cell. How can I split this data into separate columns (eg: 000.000000 in one column, can0 in one column, 1FF in separate column etc.) using a python script.

CodePudding user response：

You need to split your lines of data by some number of space characters. The input you shared in your original post is separated by spaces (I don't see tabs, or other whitespace chars).

We can visualize the spaces in your data by replacing them with bullet points:

with open("input.txt") as f_in:
    for line in f_in:
        line = line.strip()
        print(line.replace(" ", "•"))

and now we can clearly see how many spaces are between your cells of data (I also mocked-up another line for clarity):

(000.000000)••can0••1FF•••[7]••20•08•87•0E•1D•98•0E
(000.000000)••can1••2DF•••[7]••90•0A•DE•3E•AA•B8•99

If you want every bit of data in its own cell/column; you can use this regular expression, r" {1,}", to split on one or more (any number) of space characters:

re.split(r" {1,}", "foo  bar   baz baker")  # ["foo", "bar", "baz", "baker"]

Open your input file, and iterate over the lines, splitting each line and storing the resulant row in a list of rows:

rows = []
with open("input.txt") as f_in:
    for line in f_in:
        line = line.strip()  # get rid of trailing line break character(s)
        row = re.split(r" {1,}", line)
        rows.append(row)

and rows looks like this:

[
 ['(000.000000)', 'can0', '1FF', '[7]', '20', '08', '87', '0E', '1D', '98', '0E'],
 ['(000.000000)', 'can1', '2DF', '[7]', '90', '0A', 'DE', '3E', 'AA', 'B8', '99']
]

If you want the last set of hex values to remain as one cell of data, then change the regular expression to r" {2,}" to split on two or more space characters. This will preserve the single space between the indivdual hex values, leaving all the hex values in one column:

rows = []
with open("input.txt") as f_in:
    for line in f_in:
        line = line.strip()
        row = re.split(r" {2,}", line)
        rows.append(row)

and rows looks like this:

[
 ['(000.000000)', 'can0', '1FF', '[7]', '20 08 87 0E 1D 98 0E'],
 ['(000.000000)', 'can1', '2DF', '[7]', '90 0A DE 3E AA B8 99']
]

However you need to split up the line, you now have a set of rows that you can write back out to a CSV:

import csv

with open("output.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    writer.writerow(["Col1", "Col2", "Col3", "Col4", "Col5"])  # write a single row (a list of strings)
    writer.writerows(rows)                                     # write multiple rows (list of list of strings)

And you'll get a CSV that looks something like this:

Col1	Col2	Col3	Col4	Col5
(000.000000)	can0	1FF	[7]	20 08 87 0E 1D 98 0E
(000.000000)	can1	2DF	[7]	90 0A DE 3E AA B8 99

CodePudding user response：

you can split the data by space and use pandas to get columns.

import pandas as pd

text = '(000.000000)  can0  1FF   [7]  20 08 87 0E 1D 98 0E'
data = text.split(' ')
res = {k: [''] for k in data if k}
df = pd.DataFrame(res)
print(df)
>>>>   (000.000000) can0 1FF [7] 20 08 87 0E 1D 98
     0