I'm VERY new to Azure Data Factory, so pardon me if this is a stupid or obvious question.
I want to schedule a daily copy of files from a GCS bucket to Azure Blob Storage. So far, I have managed to copy files (both manually and with a scheduled pipeline trigger) from the GCS bucket, to which I'm currently uploading the files by hand.
In the near future, the upload will happen automatically once a day at a given time, presumably during the night. My goal is to schedule the copy of just the most recently added file, and avoid copying all the files every time and overwriting the existing ones.
Does this require writing a Python script? Is there some parameter to set?
Thank you all in advance for your replies.
CodePudding user response:
There is no need for any explicit coding. ADF supports a simple Copy activity to move data from GCS to Blob Storage, where GCS acts as the source and Blob Storage acts as the sink of the Copy activity.
https://docs.microsoft.com/en-us/azure/data-factory/connector-google-cloud-storage?tabs=data-factory
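For illustration, here is a minimal sketch of what such a pipeline's JSON could look like, using Binary datasets on both sides. The dataset names (GcsSourceDataset, BlobSinkDataset) are hypothetical placeholders; you would create your own datasets against your GCS and Blob Storage linked services:

```json
{
    "name": "CopyGcsToBlob",
    "properties": {
        "activities": [
            {
                "name": "CopyFromGcs",
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "GoogleCloudStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    }
                },
                "inputs": [
                    { "referenceName": "GcsSourceDataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "BlobSinkDataset", "type": "DatasetReference" }
                ]
            }
        ]
    }
}
```

Attach a daily schedule trigger to this pipeline to run it once a night, as you are already doing.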
And to get the latest file, you can use a Get Metadata activity to get the list of files, then filter for the latest one.
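As a sketch, assuming the uploaded files carry a date stamp in their names (e.g. export-2022-06-01.csv; GcsFolderDataset is again a placeholder dataset pointing at the bucket folder), the Get Metadata + Filter pair could look like this:

```json
[
    {
        "name": "GetFileList",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": { "referenceName": "GcsFolderDataset", "type": "DatasetReference" },
            "fieldList": [ "childItems" ]
        }
    },
    {
        "name": "FilterTodaysFile",
        "type": "Filter",
        "dependsOn": [
            { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('GetFileList').output.childItems",
                "type": "Expression"
            },
            "condition": {
                "value": "@contains(item().name, formatDateTime(utcNow(), 'yyyy-MM-dd'))",
                "type": "Expression"
            }
        }
    }
]
```

Note that childItems only returns each file's name and type, not its timestamp. If the filenames carry no date, you could loop over the list with a ForEach and call Get Metadata per file for its lastModified field, or skip the listing entirely and set the modifiedDatetimeStart/modifiedDatetimeEnd filters in the Copy activity's source settings (described on the connector page linked above) so that each nightly run only picks up files added since the previous one.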