I have a data file with 4 visible columns that I'm trying to parse with pandas, but I'm getting this error: ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8
This is my data:
0.001155672 259,439 branch-instructions
0.001155672 1,266,239 instructions # 1.10 insn per cycle
0.001155672 24,148 cache-references
0.001155672 11,586 cache-misses # 47.979 % of all cache refs
0.001155672 1,150,999 cpu-cycles
0.001155672 8,888 branch-misses # 3.43% of all branches
0.002370509 381,074 branch-instructions
0.002370509 1,908,560 instructions # 1.12 insn per cycle
0.002370509 29,034 cache-references
0.002370509 15,362 cache-misses # 52.910 % of all cache refs
I've tried using data = pd.read_table('repg.txt', sep='\s+', header=None, thousands=',')
and I've also tried delim_whitespace=True instead of sep.
CodePudding user response:
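Just add comment='#' as a parameter of pd.read_table. The trailing "# ..." annotations on some lines are what produce the extra fields; with comment='#' the parser ignores everything from the # to the end of the line: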
data = pd.read_table('repg.txt', sep=r'\s+', header=None, thousands=',', comment='#')
print(data)
# Output
          0        1                    2
0  0.001156   259439  branch-instructions
1  0.001156  1266239         instructions
2  0.001156    24148     cache-references
3  0.001156    11586         cache-misses
4  0.001156  1150999           cpu-cycles
5  0.001156     8888        branch-misses
6  0.002371   381074  branch-instructions
7  0.002371  1908560         instructions
8  0.002371    29034     cache-references
9  0.002371    15362         cache-misses
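
For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same call. It reads the first few sample lines from an in-memory string instead of repg.txt, and the column names time, count and event are my own assumption (the question does not name the columns); drop names= to get the default 0, 1, 2 labels shown above.

```python
import io

import pandas as pd

# First four lines of the sample data from the question, held in memory
# so the example runs without repg.txt.
SAMPLE = """\
0.001155672 259,439 branch-instructions
0.001155672 1,266,239 instructions # 1.10 insn per cycle
0.001155672 24,148 cache-references
0.001155672 11,586 cache-misses # 47.979 % of all cache refs
"""

data = pd.read_table(
    io.StringIO(SAMPLE),
    sep=r"\s+",                        # split on runs of whitespace
    header=None,                       # the file has no header row
    names=["time", "count", "event"],  # hypothetical column names
    thousands=",",                     # parse "1,266,239" as 1266239
    comment="#",                       # ignore "# ..." to the end of the line
)

print(data)
print(data.dtypes)
```

Note that thousands=',' is what turns the second column into integers; without it the counts would be read as strings.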