The input is given as:
rec = [b'1674278797,14.33681', b'1674278798,6.03617', b'1674278799,12.78418']
I want to get a DataFrame like:
df
timestamp val
0 1674278797 14.33681
1 1674278798 6.03617
2 1674278799 12.78418
What is the most efficient way? Thanks!
If I could convert rec to a list of lists like
[[1674278797,14.33681], [1674278798,6.03617], [1674278799,12.78418]]
it would be easy to build the DataFrame by calling
df = pd.DataFrame(rec, columns=['timestamp','val'])
But I don't know how to do the conversion quickly.
By the way, I got rec from a Redis list. I can modify the format of each element (for example, b'1674278797,14.33681' is one element) if necessary.
CodePudding user response:
If you can't directly handle the original input, you can use:
(pd.Series([x.decode('utf-8') for x in rec])
   .str.split(',', expand=True)
   .set_axis(['timestamp', 'val'], axis=1)
   # str.split yields strings, so cast to numeric dtypes explicitly
   .astype({'timestamp': 'int64', 'val': 'float64'})
)
Or:
import io
pd.read_csv(io.StringIO('\n'.join([x.decode('utf-8') for x in rec])),
header=None, names=['timestamp', 'val'])
Output:
timestamp val
0 1674278797 14.33681
1 1674278798 6.03617
2 1674278799 12.78418
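Since rec already holds bytes, a variation on the read_csv approach is to join the elements as bytes and pass them through io.BytesIO, which skips the per-element decode. This is only a minimal sketch, assuming every element is a well-formed "timestamp,value" pair encoded as UTF-8 (the default encoding read_csv expects):

```python
import io

import pandas as pd

# Sample input copied from the question
rec = [b'1674278797,14.33681', b'1674278798,6.03617', b'1674278799,12.78418']

# Join the raw bytes with newlines and let read_csv parse them directly;
# read_csv infers numeric dtypes for both columns.
df = pd.read_csv(io.BytesIO(b'\n'.join(rec)),
                 header=None, names=['timestamp', 'val'])

print(df.dtypes)
```

Because read_csv does the parsing, the columns come back as numeric without a separate cast.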
CodePudding user response:
You can do this in one line:
pd.DataFrame([x.decode().split(",") for x in rec], columns=["timestamp","val"])
Returns
timestamp val
0 1674278797 14.33681
1 1674278798 6.03617
2 1674278799 12.78418
Note that both columns hold strings at this point. To convert them to numeric dtypes, append .astype({"timestamp": "int64", "val": "float64"})
to the end of the line.
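As an alternative to casting afterwards, the values can be converted while the rows are built. A minimal sketch, where records_to_frame is a hypothetical helper name (not part of pandas or of the answers above) and each element is assumed to contain exactly one comma:

```python
import pandas as pd

# Sample input copied from the question
rec = [b'1674278797,14.33681', b'1674278798,6.03617', b'1674278799,12.78418']


def records_to_frame(records):
    """Hypothetical helper: turn b'timestamp,value' elements into a DataFrame."""
    rows = []
    for raw in records:
        # Decode the bytes, split on the comma, and convert each field
        ts, val = raw.decode().split(",")
        rows.append((int(ts), float(val)))
    return pd.DataFrame(rows, columns=["timestamp", "val"])


df = records_to_frame(rec)
print(df.dtypes)
```

Whether this is faster than the one-liner plus .astype depends on the size of rec, so it is worth timing both on real data.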