I am trying to count the number of columns in external files. Here is an example of a file, data.dat. Please note that it is not a CSV file. The columns are separated by spaces, and each file may have a different number of spaces between the columns.
Data Z-2 C 2
m_[a/b] -155555.0 -133333.0
n_[a/b] -188800.0 -133333.0
o_[a/b*Y] -13.5 -17.95
p1_[cal/(a*c)] -0.01947 0.27
p2_[a/b] -700.2 -200.44
p3_(a*Y)/(b*c) 5.2966 6.0000
p4_[(a*Y)/b] -22222.0 -99999.0
q1_[b/(b*Y)] 9.0 -6.3206
q2_[c] -25220.0 -171917.0
r_[a/b] 1760.0 559140
s 4.0 -4.0
I experimented with split(" ") but could not figure out how to get it to treat a run of several spaces as a single separator; it counted each space as a separate column.
This seems promising but my attempt only counts the first column. It may seem silly to attempt a CSV method to deal with a non-CSV file. Maybe this is where my problems are coming from. However, I have used CSV methods before to deal with text files.
For example, I import my data:
import csv

with open(data) as csvfile:
    reader = csv.DictReader(csvfile)
    n_cols = len(reader.fieldnames)
When I use this, only the first column is recognized. The code is too long to post, but I know this is happening because when I manually enter n_cols = 3, I do get the results I expect.
It does work if I use commas to delimit the columns, but I can't do that (I need to use whitespace).
Does anyone know an alternative method that deals with arbitrary whitespace and non-CSV files? Thank you for any advice.
CodePudding user response:
Yes, you can do this with the pandas library:
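import pandas as pd
# sep=r'\s+' splits on any run of whitespace (replaces the deprecated delim_whitespace=True)
df = pd.read_csv('data.dat', sep=r'\s+')
n_cols = len(df.columns)  # column count as pandas parsed it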
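If you would rather not depend on pandas, here is a minimal sketch using only the standard library. It relies on the fact that str.split() with no argument splits on any run of whitespace, unlike split(" "). The function name count_columns is made up for this example. Note that the header line in the sample, as displayed in the question, splits into four tokens while the data rows have three, so this sketch assumes the most common field count across lines is the one you want.

```python
from collections import Counter


def count_columns(path):
    """Return the most common number of whitespace-separated fields per line.

    Hypothetical helper for illustration; returns 0 for an empty file.
    """
    counts = Counter()
    with open(path) as f:
        for line in f:
            fields = line.split()  # no argument: split on any run of whitespace
            if fields:  # skip blank lines
                counts[len(fields)] += 1
    # most_common(1) returns [(field_count, number_of_lines)]
    return counts.most_common(1)[0][0] if counts else 0
```

You would then call it as n_cols = count_columns("data.dat"). If your header is known to be well formed, counting only the first line with len(first_line.split()) would be simpler.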