How can I open a text file with multiple delimiters in pandas?

Time:10-10

I have a text file that I am trying to load into a pandas DataFrame, but I am not sure how to parse it so that every row is split into the correct columns.

The file is here: http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/tablef3.dat

The ReadMe file that explains the data is here: http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/ReadMe

At the moment I am using:

pd.read_csv('http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/tablef3.dat')

Once the data is in pandas, I also want to add custom column headings.

Any help would be greatly appreciated.

Answer:

The file is laid out in fixed-width columns rather than separated by delimiters, so you can use pd.read_fwf. Take the column positions and names from your ReadMe file:

import pandas as pd

# (start, end) character offsets of each column: 0-indexed, end exclusive
colspecs = [(0, 12), (12, 21), (21, 30), (30, 31), (31, 41), (41, 50), (50, 60),
            (60, 69), (69, 79), (79, 88), (88, 98), (98, 108), (108, 121),
            (121, 130), (130, 140), (140, 150), (150, 160), (160, 161), (161, 172),
            (172, 183), (183, 193)]

# One heading per entry in colspecs
colnames = ['Name','zcmb', 'hel', 'e_z', 'mb', 'e_mb', 'x1', 'e_x1', 'c', 'e_c',
            'logMst', 'e_logMst', 'tmax', 'e_tmax', 'cov(mbs)', 'cov(mb,c)',
            'cov(s,c)', 'set', 'RAdeg', 'DEdeg', 'bias']

df = pd.read_fwf('http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/tablef3.dat',
                 colspecs=colspecs, names=colnames)

Output:

>>> df
         Name      zcmb       hel  e_z         mb      e_mb        x1      e_x1         c       e_c  ...  e_logMst          tmax    e_tmax  cov(mbs)  cov(mb,c)  cov(s,c)  set       RAdeg      DEdeg      bias
0      03D1au  0.503084  0.504300    0  23.001698  0.088031  1.273191  0.150058 -0.012353  0.030011  ...  0.110500  52909.745220  0.214332  0.000790   0.000440 -0.000030    1   36.043210  -4.037469  0.001697
1      03D1aw  0.580724  0.582000    0  23.573937  0.090132  0.974346  0.273823 -0.025076  0.036691  ...  0.088000  52902.898002  0.352732  0.002823   0.000415  0.001574    1   36.061634  -4.517158  0.000843
2      03D1ax  0.494795  0.496000    0  22.960139  0.088110 -0.728837  0.102069 -0.099683  0.030305  ...  0.112500  52915.923670  0.111634  0.000542   0.000475 -0.000024    1   36.097287  -4.720774  0.001692
3      03D1bp  0.345928  0.347000    0  22.398137  0.087263 -1.155110  0.112834 -0.040581  0.026679  ...  0.123500  52920.249015  0.102828  0.001114   0.000616  0.000295    1   36.657235  -4.838779 -0.000270
4      03D1co  0.677662  0.679000    0  24.078115  0.098356  0.618820  0.404295 -0.039380  0.067403  ...  0.284000  52954.458342  0.454715  0.011857   0.000780  0.005898    1   36.567748  -4.935050 -0.002855
..        ...       ...       ...  ...        ...       ...       ...       ...       ...       ...  ...       ...           ...       ...       ...        ...       ...  ...         ...        ...       ...
735  sn2007co  0.027064  0.026962    0  16.504006  0.141685 -0.137806  0.061153  0.105288  0.020382  ...  0.280891  54265.212054  0.056635  0.000095   0.000377  0.000007    3  275.765000  29.897050 -0.009803
736  sn2007cq  0.025468  0.025918    0  15.797848  0.143429 -0.657941  0.115645 -0.060805  0.025820  ...  0.280891  54281.025669  0.070944  0.000392   0.000639  0.000075    3  333.668430   5.080160 -0.009575
737   sn2007f  0.023810  0.023590    0  15.895501  0.144315  0.618766  0.041400 -0.055411  0.026006  ...  0.118500  54124.058397  0.045234 -0.000055   0.000645 -0.000180    3  195.812750  50.618760 -0.009361
738  sn2007qe  0.023867  0.024000    0  16.068268  0.144350  0.760605  0.045650  0.052186  0.026200  ...  5.000000  54429.852171  0.037486  0.000101   0.000654 -0.000076    3  358.553990  27.409170 -0.009368
739  sn2008bf  0.022068  0.021275    0  15.718540  0.144685  0.430639  0.068523 -0.038367  0.021262  ...  0.156500  54555.109466  0.090470  0.000136   0.000409 -0.000104    3  181.011990  20.245080 -0.009159

[740 rows x 21 columns]
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 740 entries, 0 to 739
Data columns (total 21 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Name       740 non-null    object 
 1   zcmb       740 non-null    float64
 2   hel        740 non-null    float64
 3   e_z        740 non-null    int64  
 4   mb         740 non-null    float64
 5   e_mb       740 non-null    float64
 6   x1         740 non-null    float64
 7   e_x1       740 non-null    float64
 8   c          740 non-null    float64
 9   e_c        740 non-null    float64
 10  logMst     740 non-null    float64
 11  e_logMst   740 non-null    float64
 12  tmax       740 non-null    float64
 13  e_tmax     740 non-null    float64
 14  cov(mbs)   740 non-null    float64
 15  cov(mb,c)  740 non-null    float64
 16  cov(s,c)   740 non-null    float64
 17  set        740 non-null    int64  
 18  RAdeg      740 non-null    float64
 19  DEdeg      740 non-null    float64
 20  bias       740 non-null    float64
dtypes: float64(18), int64(2), object(1)
memory usage: 121.5+ KB
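
If you want to build the column specs yourself, here is a minimal sketch of the conversion, assuming the ReadMe lists each column as a 1-indexed, inclusive byte range (check this against the actual file). The byte ranges, the three sample rows, and the helper name to_colspecs are made up for illustration and are not taken from tablef3.dat. The sketch also shows two ways to get custom headings: pass names= when reading, or rename afterwards.

```python
import io

import pandas as pd

# Hypothetical byte ranges in the style of a ReadMe "Bytes" column:
# 1-indexed and inclusive on both ends (an assumption, not the real table).
byte_ranges = [(1, 6), (8, 15), (17, 17)]


def to_colspecs(ranges):
    """Convert 1-indexed inclusive ranges to the 0-indexed,
    end-exclusive (start, end) pairs that pd.read_fwf expects."""
    return [(start - 1, end) for start, end in ranges]


colspecs = to_colspecs(byte_ranges)

# Made-up fixed-width rows standing in for the real data file.
sample = (
    "03D1au 0.503084 1\n"
    "03D1aw 0.580724 1\n"
    "sn2007 0.023810 3\n"
)

# header=None states explicitly that the file has no heading row;
# names= supplies the custom headings at read time.
df = pd.read_fwf(
    io.StringIO(sample),
    colspecs=colspecs,
    names=["Name", "zcmb", "set"],
    header=None,
)

# Headings can also be changed after loading.
renamed = df.rename(columns={"zcmb": "redshift_cmb", "set": "sample_set"})
```

The same pattern applies to the real file: replace the io.StringIO object with the URL or a local path, and the hypothetical ranges with the ones from the ReadMe.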