I have to read the CTG.xls file with pandas from the following path: https://archive.ics.uci.edu/ml/machine-learning-databases/00193/.
From this file I have to select the sheet Data and, within it, columns K through AT, so that I end up with a dataset with these columns:
["LB", "AC", "FM", "UC", "DL", "DS", "DP", "ASTV", "MSTV", "ALTV", "MLTV", "Width", "Min", "Max", "Nmax", "Nzeros", "Mode", "Mean", "Median", "Variance", "Tendency", "CLASS", "NSP"]
How can I do this using the read function in pandas?
CodePudding user response:
Use:
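import pandas as pd  # reading .xls files also requires the xlrd package
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
# Read the 'Data' sheet, ignoring the last 3 rows of the sheet
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
# Drop columns whose header cell was empty (pandas labels them 'Unnamed: N')
df = df.drop(columns=df.filter(like='Unnamed').columns)
# The first remaining row holds the real column names: promote it to the header
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)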
Output
LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV Width Min Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0 120 0 0 0 0 0 0 73 0.5 43 2.4 64 62 126 2 0 120 137 121 73 1 9 2
1 132 0.00638 0 0.00638 0.00319 0 0 17 2.1 0 10.4 130 68 198 6 1 141 136 140 12 0 6 1
2 133 0.003322 0 0.008306 0.003322 0 0 16 2.1 0 13.4 130 68 198 5 1 141 135 138 13 0 6 1
3 134 0.002561 0 0.007682 0.002561 0 0 16 2.4 0 23 117 53 170 11 0 137 134 137 13 1 6 1
4 132 0.006515 0 0.008143 0 0 0 16 2.4 0 19.9 117 53 170 9 0 137 136 138 11 1 2 1
... ... ... ... ... ... .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..
2121 140 0 0 0.007426 0 0 0 79 0.2 25 7.2 40 137 177 4 0 153 150 152 2 0 5 2
2122 140 0.000775 0 0.006971 0 0 0 78 0.4 22 7.1 66 103 169 6 0 152 148 151 3 1 5 2
2123 140 0.00098 0 0.006863 0 0 0 79 0.4 20 6.1 67 103 170 5 0 153 148 152 4 1 5 2
2124 140 0.000679 0 0.00611 0 0 0 78 0.4 27 7 66 103 169 6 0 152 147 151 4 1 5 2
2125 142 0.001616 0.001616 0.008078 0 0 0 74 0.4 36 5 42 117 159 2 1 145 143 145 1 0 1 1
[2126 rows x 23 columns]
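The header-cleanup steps from the answer can be tried offline on a small in-memory frame. This is only a sketch: the toy data, the placeholder labels 'b' and 'e', and the helper name promote_first_row are made up for illustration and do not come from CTG.xls. The final pd.to_numeric step is an addition that is not part of the answer; it is included on the assumption that numeric dtypes are wanted, since promoting a data row to the header leaves the remaining columns with object dtype.

```python
import pandas as pd

# Toy frame imitating a sheet whose first row has placeholder or empty
# header cells and whose second row holds the real column names.
# All values here are invented for illustration.
raw = pd.DataFrame({
    "Unnamed: 0": [None, 1, 2],
    "b": ["LB", 120, 132],
    "e": ["AC", 0, 4],
    "Unnamed: 3": [None, None, None],
})


def promote_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop 'Unnamed' columns, use row 0 as the header."""
    # Remove columns whose label contains 'Unnamed'
    df = df.drop(columns=df.filter(like="Unnamed").columns)
    # Use the first row as column names, then discard that row
    df.columns = df.iloc[0].to_list()
    df = df[1:].reset_index(drop=True)
    # Extra step (not in the answer): convert object columns to numbers
    return df.apply(pd.to_numeric)


tidy = promote_first_row(raw)
print(tidy)
print(tidy.dtypes)
```

If any column contains non-numeric text, pd.to_numeric raises an error by default, so the conversion would need to be restricted to the numeric columns in that case.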