How do I convert a .txt file with the following form of data to a pandas dataframe?
For example, this is the txt file with the structure (x1, y1, z1), (x2, y2, z2), ... (xn, yn, zn),
(108.222994365147, 16.077177357808345, 17.5), (108.22299074891866, 16.07718225312858, 17.5), (108.2229869226013, 16.077186986051835, 17.5), (108.22298289347849, 16.077191547568788, 17.5),
After converting, I want it to look like this:
   x                   y                   z
1  108.222994365147    16.077177357808345  17.5
2  108.22299074891866  16.07718225312858   17.5
3  108.2229869226013   16.077186986051835  17.5
4  108.22298289347849  16.077191547568788  17.5
CodePudding user response:
This approach should solve your problem:
import pandas as pd
import re
with open({YOUR_FILE_LOCATION}, "r") as f:
    s = f.read()
# Match three comma-separated numbers inside parentheses
pattern = re.compile(r"\(([\d\.]+),[ ]*([\d\.]+),[ ]*([\d\.]+)\)")
pd.DataFrame(pattern.findall(s), columns=["x","y","z"]).astype(float)
OUTPUT
            x          y     z
0  108.222994  16.077177  17.5
1  108.222991  16.077182  17.5
2  108.222987  16.077187  17.5
3  108.222983  16.077192  17.5
Once the file has been read, every occurrence of the pattern (three comma-separated floats between parentheses) is matched, and findall passes the matches to the DataFrame constructor as a list of tuples of strings. Everything is then cast to float.
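As a self-contained check of the regex approach, the sketch below runs a similar pattern on an inline string instead of a file. The `sample` string and the `parse_points` helper are hypothetical names for illustration, and the optional leading minus sign (`-?`) is an addition that the original answer does not have; the original pattern would skip tuples containing negative values.

```python
import re

import pandas as pd

# Hypothetical sample text in the same format as the question's file
sample = (
    "(108.222994365147, 16.077177357808345, 17.5), "
    "(108.22299074891866, 16.07718225312858, 17.5),"
)

# Three comma-separated numbers inside parentheses; "-?" also allows negatives
POINT = re.compile(r"\((-?[\d.]+),\s*(-?[\d.]+),\s*(-?[\d.]+)\)")


def parse_points(text):
    """Return a float DataFrame with one row per (x, y, z) tuple in text."""
    rows = POINT.findall(text)  # list of 3-tuples of strings
    return pd.DataFrame(rows, columns=["x", "y", "z"]).astype(float)


df = parse_points(sample)
print(df.shape)
```

Note that this pattern still does not cover scientific notation such as 1e-5; that would need a wider character class.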
CodePudding user response:
data = pd.read_csv('file1.txt', sep=" ", header=None)
data.columns = ["x", "y", "z"]
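Try this. Note that sep=" " assumes the file already holds one space-separated "x y z" row per line; it will not parse the parenthesized tuple format from the question as-is.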
CodePudding user response:
An alternative solution using io.StringIO and pd.read_csv:
import pandas as pd
from io import StringIO
with open({YOUR_FILE_LOCATION}, "r") as file:
    data = file.read()
data = data.replace('(', '').replace('),', '\n')[:-1]
df = pd.read_csv(StringIO(data), header=None)
df.columns = ["x", "y", "z"]
Output:
            x          y     z
0  108.222994  16.077177  17.5
1  108.222991  16.077182  17.5
2  108.222987  16.077187  17.5
3  108.222983  16.077192  17.5
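The desired output in the question numbers its rows from 1, whereas pandas labels them from 0 by default. A minimal sketch of shifting the index, assuming a small hypothetical inline string (`raw`) in place of the file and a slightly different cleanup (`strip()` rather than slicing off the last character):

```python
from io import StringIO

import pandas as pd

# Hypothetical inline data standing in for the file contents
raw = (
    "(108.222994365147, 16.077177357808345, 17.5), "
    "(108.22299074891866, 16.07718225312858, 17.5),"
)

# Drop every "(" and turn each ")," into a newline, then trim the ends
cleaned = raw.replace("(", "").replace("),", "\n").strip()

df = pd.read_csv(
    StringIO(cleaned),
    header=None,
    names=["x", "y", "z"],
    skipinitialspace=True,  # ignore the space that follows each comma
)
df.index = df.index + 1  # rows are now labelled 1..n instead of 0..n-1
print(df.index.tolist())
```

Reassigning the index only changes the row labels; positional access with `df.iloc` still starts at 0.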