How to list all available versions of an Azure ML Dataset and get the one before the latest

Time:10-14

Is there a way to list all the available versions of an Azure ML Dataset? Not via the UI, but by using the SDK. Also, how can we get the version just before the latest one of that Azure ML Dataset?

The main goal here is to identify changes in the data trends.

Answer:

Create a resource group and an Azure Machine Learning workspace. Upload the dataset several times under the same name; each upload is registered as a new version of that dataset.


Use the code block below to get every version of the uploaded dataset, along with information about each version.

Code block 1

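from azureml.core import Dataset, Workspace
ws = Workspace.from_config()  # assumption: a config.json for your workspace is available
# get_all returns the latest registered version of each dataset, keyed by name
Diabetes1234 = Dataset.get_all(workspace = ws)
counts = Diabetes1234['Diabetes123'].version
# fetch every version from 1 up to and including the latest
versions = [Dataset.get_by_name(workspace = ws, name = 'Diabetes123', version = v) for v in range(1, counts + 1)]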
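Code block 1 reads every registered dataset in the workspace just to find one version number. As a rough alternative, the sketch below takes a lookup function and builds the same list from it. The helper name collect_versions and its fetch parameter are hypothetical, and it assumes versions are numbered 1 through the latest with no gaps, as in the output shown in this answer.

```python
from typing import Any, Callable, List, Optional


def collect_versions(fetch: Callable[[Optional[int]], Any]) -> List[Any]:
    """Return every version of one dataset, oldest first.

    fetch(None) must return the latest version of the dataset;
    fetch(v) must return version number v.
    Assumes versions are numbered 1..latest without gaps.
    """
    latest = fetch(None)
    count = latest.version
    # Reuse the object already fetched for the latest version.
    return [fetch(v) for v in range(1, count)] + [latest]


# Possible wiring with the SDK (not executed here; assumes `ws` is your Workspace):
# versions = collect_versions(
#     lambda v: Dataset.get_by_name(ws, 'Diabetes123', version=v or 'latest')
# )
```

The lookup is passed in as a function so the helper can be tried with stand-in objects before pointing it at a real workspace.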

Code block 2

versions

Output

[{
   "source": [
     "('workspaceblobstore', 'UI/2022-10-14_055538_UTC/')"
   ],
   "definition": [
     "GetDatastoreFiles",
     "ParseDelimited",
     "DropColumns",
     "SetColumnTypes"
   ],
   "registration": {
     "id": "Your ID",
     "name": "Diabetes123",
     "version": 1,
     "workspace": "Workspace.create(name='cancerset', subscription_id=your subscription ID', resource_group='your resource group')"
   }
 },
 {
   "source": [
     "('workspaceblobstore', 'UI/2022-10-14_055914_UTC/')"
   ],
   "definition": [
     "GetDatastoreFiles",
     "ParseDelimited",
     "DropColumns",
     "SetColumnTypes"
   ],
   "registration": {
     "id": " Your ID ",
     "name": "Diabetes123",
     "version": 2,
     "workspace": "Workspace.create(name='cancerset', subscription_id=your subscription ID', resource_group='your resource group')"
   }
 },
 {
   "source": [
     "('workspaceblobstore', 'UI/2022-10-14_060011_UTC/')"
   ],
   "definition": [
     "GetDatastoreFiles",
     "ParseDelimited",
     "DropColumns",
     "SetColumnTypes"
   ],
   "registration": {
     "id": " Your ID ",
     "name": "Diabetes123",
     "version": 3,
     "workspace": "Workspace.create(name='cancerset', subscription_id=your subscription ID', resource_group='your resource group')"
   }
 },
 {
   "source": [
     "('workspaceblobstore', 'UI/2022-10-14_070300_UTC/')"
   ],
   "definition": [
     "GetDatastoreFiles",
     "ParseDelimited",
     "DropColumns",
     "SetColumnTypes"
   ],
   "registration": {
     "id": " Your ID ",
     "name": "Diabetes123",
     "version": 4,
     "workspace": "Workspace.create(name='cancerset', subscription_id=your subscription ID', resource_group='your resource group')"
   }
 },
 {
   "source": [
     "('workspaceblobstore', 'UI/2022-10-14_093655_UTC/')"
   ],
   "definition": [
     "GetDatastoreFiles",
     "ParseDelimited",
     "DropColumns",
     "SetColumnTypes"
   ],
   "registration": {
     "id": " Your ID ",
     "name": "Diabetes123",
     "version": 5,
     "workspace": "Workspace.create(name='cancerset', subscription_id=your subscription ID', resource_group='your resource group')"
   }
 }]

To get the version just before the latest, use the code block below.

Code Block:

versions[-2]

Output

{
  "source": [
    "('workspaceblobstore', 'UI/2022-10-14_070300_UTC/')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes"
  ],
  "registration": {
    "id": "your ID",
    "name": "Diabetes123",
    "version": 4,
    "workspace": "Workspace.create(name='cancerset', subscription_id=your subscription ID', resource_group='your resource group')"
  }
}
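Note that versions[-2] raises an IndexError when the dataset has only one version, and it relies on the list being in version order. A small defensive sketch follows; the helper name previous_version is hypothetical, and it assumes each item exposes a numeric version attribute, as the dataset objects above do.

```python
def previous_version(versions):
    """Return the item with the second-highest version number.

    Returns None when fewer than two versions exist.
    """
    if len(versions) < 2:
        return None
    # Sort by version number so the result does not depend on list order.
    ordered = sorted(versions, key=lambda d: d.version)
    return ordered[-2]
```

Calling previous_version(versions) on the list built earlier should give the same result as versions[-2].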