Read an online Excel file with a specific sheet and only selected columns

Time:03-10

I need to read the CTG.xls file with pandas from the following location: https://archive.ics.uci.edu/ml/machine-learning-databases/00193/.

From this file I need the sheet Data, and from that sheet only columns K through AT, so that the final dataset has these columns:

["LB","AC","FM","UC","DL","DS","DP","ASTV","MSTV","ALTV" ,"MLTV" ,"Width","Min","Max" ,"Nmax","Nzeros","Mode","Mean" ,"Median" ,"Variance" ,"Tendency" ,"CLASS","NSP"]

How can I do this using the read function in pandas?

CodePudding user response:

Use:

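import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'

# Read the 'Data' sheet, skipping the last 3 rows of the sheet
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
# Drop the columns whose header cell is empty (pandas names them 'Unnamed: N')
df = df.drop(columns=df.filter(like='Unnamed').columns)
# The real column names sit in the first data row, so promote it to the header
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)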
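
Since the question asks for columns K to AT specifically, here is a possible alternative sketch that restricts the read itself with `usecols` and then keeps only the named columns. It is untested against the real file and rests on one assumption: that the second row of the Data sheet holds the column names (which is what promoting the first data row above suggests), hence `header=1`. The names `load_ctg`, `URL` and `WANTED` are made up for this example. Reading a `.xls` file also requires the `xlrd` package.

```python
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls"

# The 23 columns listed in the question
WANTED = [
    "LB", "AC", "FM", "UC", "DL", "DS", "DP", "ASTV", "MSTV", "ALTV",
    "MLTV", "Width", "Min", "Max", "Nmax", "Nzeros", "Mode", "Mean",
    "Median", "Variance", "Tendency", "CLASS", "NSP",
]


def load_ctg(source=URL, wanted=WANTED):
    """Read sheet 'Data', Excel columns K to AT, and keep the wanted columns."""
    df = pd.read_excel(
        source,
        sheet_name="Data",
        header=1,         # assumption: the column names are in the second row
        usecols="K:AT",   # Excel-style column range
        skipfooter=3,     # same footer skip as in the snippet above
    )
    # K:AT spans more columns than the 23 listed, so select them by name
    return df[list(wanted)]
```

Calling `df = load_ctg()` downloads the file and should return the same 23 columns if the header assumption holds. The output shown next comes from the original snippet, not from this sketch.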

Output

       LB        AC        FM        UC        DL DS DP ASTV MSTV ALTV  MLTV Width  Min  Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0     120         0         0         0         0  0  0   73  0.5   43   2.4    64   62  126    2      0  120  137    121       73        1     9   2
1     132   0.00638         0   0.00638   0.00319  0  0   17  2.1    0  10.4   130   68  198    6      1  141  136    140       12        0     6   1
2     133  0.003322         0  0.008306  0.003322  0  0   16  2.1    0  13.4   130   68  198    5      1  141  135    138       13        0     6   1
3     134  0.002561         0  0.007682  0.002561  0  0   16  2.4    0    23   117   53  170   11      0  137  134    137       13        1     6   1
4     132  0.006515         0  0.008143         0  0  0   16  2.4    0  19.9   117   53  170    9      0  137  136    138       11        1     2   1
...   ...       ...       ...       ...       ... .. ..  ...  ...  ...   ...   ...  ...  ...  ...    ...  ...  ...    ...      ...      ...   ...  ..
2121  140         0         0  0.007426         0  0  0   79  0.2   25   7.2    40  137  177    4      0  153  150    152        2        0     5   2
2122  140  0.000775         0  0.006971         0  0  0   78  0.4   22   7.1    66  103  169    6      0  152  148    151        3        1     5   2
2123  140   0.00098         0  0.006863         0  0  0   79  0.4   20   6.1    67  103  170    5      0  153  148    152        4        1     5   2
2124  140  0.000679         0   0.00611         0  0  0   78  0.4   27     7    66  103  169    6      0  152  147    151        4        1     5   2
2125  142  0.001616  0.001616  0.008078         0  0  0   74  0.4   36     5    42  117  159    2      1  145  143    145        1        0     1   1

[2126 rows x 23 columns]
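
One thing to watch: because each column originally held a text label in its first row, the columns in the result above are probably still `object` dtype rather than numeric. Below is a small sketch of the same steps on a made-up in-memory frame (`raw` and `promote_first_row` are invented names, not part of the answer), with a final `pd.to_numeric` conversion added.

```python
import pandas as pd

# Toy frame shaped like a sheet read with the default header: the real column
# names sit in the first data row and one column has an 'Unnamed' header.
raw = pd.DataFrame(
    {
        "Unnamed: 0": [None, None, None],
        "h1": ["LB", 120, 132],
        "h2": ["AC", 0, 0.00638],
    }
)


def promote_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Drop 'Unnamed' columns, use row 0 as the header, convert to numbers."""
    df = df.drop(columns=df.filter(like="Unnamed").columns)
    df.columns = df.iloc[0].to_list()
    df = df[1:].reset_index(drop=True)
    # Each column is object dtype here because it contained a string label
    return df.apply(pd.to_numeric)


clean = promote_first_row(raw)
print(clean.dtypes)
```

On the real data the same last step would be `df = df.apply(pd.to_numeric)`, assuming every remaining cell is numeric.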