I have a text file that I am trying to load into a pandas DataFrame, but I am not sure how to parse it so that each row is split into all of its values.
The file is here: http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/tablef3.dat
The ReadMe file that explains the data is here: http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/ReadMe
So far I am using:
pd.read_csv("http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/tablef3.dat")
Once the data is in pandas, I also want to add custom column headings.
Any help would be greatly appreciated.
CodePudding user response:
You can use pd.read_fwf, which reads fixed-width files. Extract the column positions and names from your ReadMe file:
import pandas as pd

# Half-open (start, end) character positions of each fixed-width column
colspecs = [(0, 12), (12, 21), (21, 30), (30, 31), (31, 41), (41, 50), (50, 60),
            (60, 69), (69, 79), (79, 88), (88, 98), (98, 108), (108, 121),
            (121, 130), (130, 140), (140, 150), (150, 160), (160, 161), (161, 172),
            (172, 183), (183, 193)]
# Custom headings, one per entry in colspecs
colnames = ['Name', 'zcmb', 'hel', 'e_z', 'mb', 'e_mb', 'x1', 'e_x1', 'c', 'e_c',
            'logMst', 'e_logMst', 'tmax', 'e_tmax', 'cov(mbs)', 'cov(mb,c)',
            'cov(s,c)', 'set', 'RAdeg', 'DEdeg', 'bias']
df = pd.read_fwf('http://cdsarc.u-strasbg.fr/ftp/J/A+A/568/A22/tablef3.dat',
                 colspecs=colspecs, names=colnames)
Output:
>>> df
Name zcmb hel e_z mb e_mb x1 e_x1 c e_c ... e_logMst tmax e_tmax cov(mbs) cov(mb,c) cov(s,c) set RAdeg DEdeg bias
0 03D1au 0.503084 0.504300 0 23.001698 0.088031 1.273191 0.150058 -0.012353 0.030011 ... 0.110500 52909.745220 0.214332 0.000790 0.000440 -0.000030 1 36.043210 -4.037469 0.001697
1 03D1aw 0.580724 0.582000 0 23.573937 0.090132 0.974346 0.273823 -0.025076 0.036691 ... 0.088000 52902.898002 0.352732 0.002823 0.000415 0.001574 1 36.061634 -4.517158 0.000843
2 03D1ax 0.494795 0.496000 0 22.960139 0.088110 -0.728837 0.102069 -0.099683 0.030305 ... 0.112500 52915.923670 0.111634 0.000542 0.000475 -0.000024 1 36.097287 -4.720774 0.001692
3 03D1bp 0.345928 0.347000 0 22.398137 0.087263 -1.155110 0.112834 -0.040581 0.026679 ... 0.123500 52920.249015 0.102828 0.001114 0.000616 0.000295 1 36.657235 -4.838779 -0.000270
4 03D1co 0.677662 0.679000 0 24.078115 0.098356 0.618820 0.404295 -0.039380 0.067403 ... 0.284000 52954.458342 0.454715 0.011857 0.000780 0.005898 1 36.567748 -4.935050 -0.002855
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
735 sn2007co 0.027064 0.026962 0 16.504006 0.141685 -0.137806 0.061153 0.105288 0.020382 ... 0.280891 54265.212054 0.056635 0.000095 0.000377 0.000007 3 275.765000 29.897050 -0.009803
736 sn2007cq 0.025468 0.025918 0 15.797848 0.143429 -0.657941 0.115645 -0.060805 0.025820 ... 0.280891 54281.025669 0.070944 0.000392 0.000639 0.000075 3 333.668430 5.080160 -0.009575
737 sn2007f 0.023810 0.023590 0 15.895501 0.144315 0.618766 0.041400 -0.055411 0.026006 ... 0.118500 54124.058397 0.045234 -0.000055 0.000645 -0.000180 3 195.812750 50.618760 -0.009361
738 sn2007qe 0.023867 0.024000 0 16.068268 0.144350 0.760605 0.045650 0.052186 0.026200 ... 5.000000 54429.852171 0.037486 0.000101 0.000654 -0.000076 3 358.553990 27.409170 -0.009368
739 sn2008bf 0.022068 0.021275 0 15.718540 0.144685 0.430639 0.068523 -0.038367 0.021262 ... 0.156500 54555.109466 0.090470 0.000136 0.000409 -0.000104 3 181.011990 20.245080 -0.009159
[740 rows x 21 columns]
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 740 entries, 0 to 739
Data columns (total 21 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Name       740 non-null    object
 1   zcmb       740 non-null    float64
 2   hel        740 non-null    float64
 3   e_z        740 non-null    int64
 4   mb         740 non-null    float64
 5   e_mb       740 non-null    float64
 6   x1         740 non-null    float64
 7   e_x1       740 non-null    float64
 8   c          740 non-null    float64
 9   e_c        740 non-null    float64
 10  logMst     740 non-null    float64
 11  e_logMst   740 non-null    float64
 12  tmax       740 non-null    float64
 13  e_tmax     740 non-null    float64
 14  cov(mbs)   740 non-null    float64
 15  cov(mb,c)  740 non-null    float64
 16  cov(s,c)   740 non-null    float64
 17  set        740 non-null    int64
 18  RAdeg      740 non-null    float64
 19  DEdeg      740 non-null    float64
 20  bias       740 non-null    float64
dtypes: float64(18), int64(2), object(1)
memory usage: 121.5 KB
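
If you would rather not work out the colspecs by hand, here is a minimal sketch of how they could be derived. It assumes the ReadMe gives each column as a 1-based, inclusive byte range, which is the usual layout of a byte-by-byte description. pd.read_fwf expects 0-based, half-open (start, end) pairs, so only the start needs shifting. The helper name byte_ranges_to_colspecs, the three-column layout, and the sample rows are all made up for illustration and are not taken from the real ReadMe:

```python
import io

import pandas as pd


def byte_ranges_to_colspecs(byte_ranges):
    """Turn 1-based inclusive (first, last) byte ranges into
    0-based half-open (start, end) pairs for pd.read_fwf."""
    return [(first - 1, last) for first, last in byte_ranges]


# Hypothetical layout: bytes 1-6 name, bytes 8-15 redshift, byte 17 flag.
byte_ranges = [(1, 6), (8, 15), (17, 17)]
colnames = ["name", "z", "flag"]

# Two made-up rows in that layout, standing in for the real file.
sample = (
    "03D1au 0.503084 1\n"
    "03D1aw 0.580724 0\n"
)

colspecs = byte_ranges_to_colspecs(byte_ranges)

# Passing names= tells pandas the file has no header row of its own.
df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs, names=colnames)
print(df)
```

To use this on the real table, replace the sample with the URL of the file and fill byte_ranges and colnames from the ReadMe.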