Reading files from a path with a wildcard does not work - Databricks JSON

Time:01-18

I am trying to read a JSON file in Databricks with the following code:

import json

with open('/dbfs/mnt/bronze/categories/20221006/data_10.json') as f:
    d = json.load(f)

This works perfectly, but I would like to use wildcards, since there are multiple folders and files. Preferably, I want to make the code below work:

with open('/dbfs/mnt/bronze/categories/**/*.json') as f:
    d = json.load(f)

When I read the JSON files using Spark, wildcards work perfectly, but I would prefer the open() approach above:

df = spark.read.json('/mnt/bronze/AKENEO/categories/**/*.json')

CodePudding user response:

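Python's built-in open() takes a single literal path and does not expand wildcards, so a pattern like **/*.json is treated as a file name that does not exist. Instead, you can write a quick script that goes through the folders using os.walk and opens each .json file it finds.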

This lets you avoid wildcards altogether, at the cost of a little more code.
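
A minimal sketch of the os.walk approach, assuming every .json file under the root folder should be loaded. The helper name load_json_files and the dict it returns are illustrative choices, not something from the question. The root path in the usage lines is the one from the question and may need adjusting to your mount.

```python
import json
import os


def load_json_files(root_dir):
    """Walk root_dir recursively and parse every .json file found.

    Returns a dict mapping each file path to its parsed content.
    (Hypothetical helper, named here for illustration only.)
    """
    loaded = {}
    # os.walk yields (directory, subdirectories, files) for each folder
    # below root_dir, so no wildcard expansion is needed.
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if name.endswith(".json"):
                path = os.path.join(dirpath, name)
                with open(path) as f:
                    loaded[path] = json.load(f)
    return loaded


if __name__ == "__main__":
    # Root folder taken from the question; the /dbfs prefix is what
    # makes the mount visible to plain Python file APIs.
    data = load_json_files("/dbfs/mnt/bronze/categories")
    print(f"Loaded {len(data)} JSON files")
```

Keying the result by path keeps track of which folder each document came from. If you only need the parsed objects, list(data.values()) gives them as a plain list.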
