How can I use wildcards in my gcp bucket objects path?

Time:03-16

My main problem is that I want to check whether an object exists in a GCP bucket. Here is what I tried:

from google.cloud import storage
client = storage.Client()
path_exists = False
for blob in client.list_blobs('models', prefix='trainedModels/mddeep256_sarim'):
    path_exists = True
    break

This worked fine for me. The problem now is that I don't know the first part of the model name (mddeep256); I only know the last part, _sarim.

So I want to use something like:

for blob in client.list_blobs('models', prefix='trainedModels/*_sarim'):

I want to use the * wildcard. How can I do that?

CodePudding user response:

In short: you can't!

You can only filter on the prefix. If you want to filter on the suffix, as you do here, start by filtering on the longest prefix that the API lets you use, then iterate over the results in your code and keep the object names that match your pattern.

There is no built-in solution for that.
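
A minimal sketch of that approach, assuming shell-style wildcards are acceptable and using Python's standard fnmatch module for the client-side matching. The helper names (longest_literal_prefix, filter_names, blob_exists) are made up for this illustration and are not part of the google-cloud-storage library; only Client() and list_blobs(..., prefix=...) come from the question.

```python
import fnmatch


def longest_literal_prefix(pattern):
    """Return the part of a wildcard pattern before its first wildcard character."""
    for index, char in enumerate(pattern):
        if char in "*?[":
            return pattern[:index]
    return pattern


def filter_names(names, pattern):
    """Keep only the names that match the wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def blob_exists(bucket_name, pattern):
    """Return True if at least one object name in the bucket matches the pattern."""
    # Imported here so the pure helpers above can be used without the GCS library.
    from google.cloud import storage

    client = storage.Client()
    # Let the API narrow the listing by prefix, then match the rest locally.
    blobs = client.list_blobs(bucket_name, prefix=longest_literal_prefix(pattern))
    return any(fnmatch.fnmatchcase(blob.name, pattern) for blob in blobs)
```

For the case in the question, a call might look like blob_exists('models', 'trainedModels/*_sarim*'). The trailing * keeps the behaviour of the original prefix check, which also accepted longer object names that merely start with the model path. Note that fnmatch's * also matches the / character, so the pattern can span several "folder" levels.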

CodePudding user response:

list_blobs doesn't support regex in the prefix. You need to filter the results yourself, as mentioned by Guilaume.

The following should work.

import re

from google.cloud import storage


def is_object_exist(bucket_name, object_pattern):
    """Return True if any object name in the bucket matches the regex."""
    client = storage.Client()
    regex = re.compile(object_pattern)
    # list_blobs cannot filter by regex, so every name is checked client-side.
    # regex.match anchors at the start of the name; any() stops at the first hit.
    return any(regex.match(blob.name) for blob in client.list_blobs(bucket_name))
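
One caveat: is_object_exist expects a regular expression, not a shell-style wildcard. Read as a regex, trainedModels/*_sarim means "trainedModels, then zero or more slashes, then _sarim", so it would not match the model from the question. The sketch below shows one possible way to convert a wildcard into a regex with the standard library's fnmatch.translate; the helper name wildcard_to_regex and the sample object names are hypothetical.

```python
import fnmatch
import re


def wildcard_to_regex(pattern):
    """Translate a shell-style wildcard pattern into an anchored regex string."""
    return fnmatch.translate(pattern)


# Sample object names, invented for this illustration.
names = [
    "trainedModels/mddeep256_sarim",
    "trainedModels/mddeep256_other",
]

regex = re.compile(wildcard_to_regex("trainedModels/*_sarim"))
matches = [name for name in names if regex.match(name)]

# The raw wildcard string, used directly as a regex, finds nothing here.
raw_matches = [name for name in names if re.match("trainedModels/*_sarim", name)]
```

The translated string could then be passed as object_pattern, for example is_object_exist('models', wildcard_to_regex('trainedModels/*_sarim')).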