Split data from a file into arrays by column in Python - best practices

Time:11-09

What is the best way to read data from a txt/csv file and split the values by column into arrays (no matter how many columns there are)? And how can I skip, for example, the first row when the file starts with a header line?

[Screenshot of the file; image not available]

I am open to using existing Python libraries.

So far, I've done it this way:

x_pareto, y_pareto = [], []  # one list per column
with open("Pareto Front.txt") as pareto_front_file:
    for pareto_front_row in pareto_front_file:
        values = pareto_front_row.split('  ')  # columns are separated by two spaces
        x_pareto.append(float(values[0]))
        y_pareto.append(float(values[1]))

but as I build more complicated things, I can see that this approach is not very effective.

Answer:

Use the "Pandas" library (or something similar)

For tabular data, Pandas is one of the most popular Python libraries. It lets you read the data easily and provides methods for most kinds of data transformation, filtering, and visualization. Although it may seem daunting at first, it is usually a lot easier than reinventing the wheel yourself. If you are familiar with the R language, pandas covers much of the tidyverse functionality, and its DataFrame is similar to R's data.frame.
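As one instance of the "something similar" option, NumPy's loadtxt can also split a purely numeric file into one array per column. The snippet below is a minimal sketch: the in-memory sample is a hypothetical stand-in for the real file, and the column names x, y, z are assumptions made for illustration.

```python
import io

import numpy as np

# Hypothetical sample standing in for a whitespace-separated text file
# with one header line; replace it with a real path such as "Pareto Front.txt".
sample = io.StringIO(
    "x  y  z\n"
    "0.0  1.0  2.0\n"
    "0.5  0.5  3.0\n"
)

# skiprows=1 drops the header line.
# unpack=True transposes the result, so each row of `columns` is one file column.
columns = np.loadtxt(sample, skiprows=1, unpack=True)

# Works for any number of columns; here the first one is picked out by index.
first_column = columns[0]
print(columns.shape)
print(first_column)
```

This only suits files in which every value is numeric; for mixed or labelled data, a DataFrame is usually the more convenient choice.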

Transfer your data into a Python object

Pandas offers a read_csv function that lets you specify a custom delimiter if you need one. Your data is returned as a pandas "DataFrame", a data type designed specifically for the kind of data you describe. Among many other things, it uses the first row as column names by default, so a header line such as the one in your data is recognized automatically instead of being read as values.

Example for csv:

import pandas as pd

df = pd.read_csv('file.csv', delimiter=',')  # choose the delimiter you need
print(df.head())  # show the first 5 rows
print(df.describe())  # summary statistics for the numeric columns
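To get from the DataFrame back to the separate arrays the question asks for, each column can be converted individually. The following is a rough sketch under stated assumptions: the sample text and the column names x and y are hypothetical, and the regular-expression separator assumes the columns are divided by runs of whitespace, as in the question's split on two spaces.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of "Pareto Front.txt":
# a header row followed by whitespace-separated numeric columns.
sample = io.StringIO(
    "x  y\n"
    "0.0  1.0\n"
    "0.5  0.5\n"
    "1.0  0.0\n"
)

# sep=r"\s+" treats any run of whitespace as a single delimiter.
# The first row becomes the column names, so it is not parsed as data.
df = pd.read_csv(sample, sep=r"\s+")

# One NumPy array per column, however many columns the file has.
columns = {name: df[name].to_numpy() for name in df.columns}

x_pareto = columns["x"]
y_pareto = columns["y"]
print(x_pareto, y_pareto)
```

If the file has no header line, passing header=None to read_csv keeps the first row as data and numbers the columns from 0 instead.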