Pandas read_csv Multiple spaces delimiter-CodePudding

I have a file with 7 aligned columns, with empty cells.Example:

SN 1995ap 0.230  40.44 0.46  0.00 silver 
SN 1995ao 0.300  40.76 0.60  0.00 silver 
SN 1995ae 0.067  37.54 0.34  0.00 silver 
SN 1995az 0.450  42.13 0.21       gold   
SN 1995ay 0.480  42.37 0.20       gold   
SN 1995ax 0.615  42.85 0.23       gold

I want to read it using pandas.read_csv(), but I have some trouble. The separator can be either 1 or 2 spaces. If I use sep='\s ' it works, but it ignores empty cells, therefore I get cells shifted to the left and empty cells in the last columns. I tried to use regex separator sep=\s{1,2}, but i get the following error:

pandas.errors.ParserError: Expected 7 fields in line 63, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

My code:

import pandas as pd

riess_2004b=pd.read_csv('Riess_2004b.txt', skiprows=22, header=None, sep='\s{1,2}', engine='python')

What I am not getting right?

CodePudding user response：

If there is no extra spaces in your field value and no continuous empty values in one row, you can try delim_whitespace argument and then shift the NAN part to left by one column.

df = pd.read_csv('xx', delim_whitespace=True)


def shift(col):
    m = col.isna().shift(-1, fill_value=False)
    col = col.fillna(method='ffill')
    col[m] = pd.NA
    return col

df = df.T.apply(shift, axis=0).T

print(df)

   SN  1995ap  0.230  40.44  0.46  0.00  silver
0  SN  1995ao    0.3  40.76   0.6  0.00  silver
1  SN  1995ae  0.067  37.54  0.34  0.00  silver
2  SN  1995az   0.45  42.13  0.21  <NA>    gold
3  SN  1995ay   0.48  42.37   0.2  <NA>    gold
4  SN  1995ax  0.615  42.85  0.23  <NA>    gold

CodePudding user response：

Fix-width file (read_fwf) seems like a better fit for your case:

df = pd.read_fwf("Riess_2004b.txt", colspecs="infer", header=None)