I have a file with 7 aligned columns, with empty cells.Example:
SN 1995ap 0.230 40.44 0.46 0.00 silver
SN 1995ao 0.300 40.76 0.60 0.00 silver
SN 1995ae 0.067 37.54 0.34 0.00 silver
SN 1995az 0.450 42.13 0.21 gold
SN 1995ay 0.480 42.37 0.20 gold
SN 1995ax 0.615 42.85 0.23 gold
I want to read it using pandas.read_csv()
, but I have some trouble. The separator can be either 1 or 2 spaces. If I use sep='\s '
it works, but it ignores empty cells, therefore I get cells shifted to the left and empty cells in the last columns. I tried to use regex separator sep=\s{1,2}
, but i get the following error:
pandas.errors.ParserError: Expected 7 fields in line 63, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
My code:
import pandas as pd
riess_2004b=pd.read_csv('Riess_2004b.txt', skiprows=22, header=None, sep='\s{1,2}', engine='python')
What I am not getting right?
CodePudding user response:
If there is no extra spaces in your field value and no continuous empty values in one row, you can try delim_whitespace
argument and then shift the NAN part to left by one column.
df = pd.read_csv('xx', delim_whitespace=True)
def shift(col):
m = col.isna().shift(-1, fill_value=False)
col = col.fillna(method='ffill')
col[m] = pd.NA
return col
df = df.T.apply(shift, axis=0).T
print(df)
SN 1995ap 0.230 40.44 0.46 0.00 silver
0 SN 1995ao 0.3 40.76 0.6 0.00 silver
1 SN 1995ae 0.067 37.54 0.34 0.00 silver
2 SN 1995az 0.45 42.13 0.21 <NA> gold
3 SN 1995ay 0.48 42.37 0.2 <NA> gold
4 SN 1995ax 0.615 42.85 0.23 <NA> gold
CodePudding user response:
Fix-width file (read_fwf
) seems like a better fit for your case:
df = pd.read_fwf("Riess_2004b.txt", colspecs="infer", header=None)