I'm saving data to a CSV file in Google Cloud Storage using pandas. The problem is that the file gets overwritten every time I write new data:
url = "gs://mybucket/my.csv"
df.to_csv(url,mode="a", index=False, header=False)
However, I have set the write mode to "a" so that new rows are appended to the file instead of overwriting it.
Thanks a lot for your help :)
Answer:
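Google Cloud Storage objects are immutable, which means you cannot modify an object once it has been created, so mode="a" cannot append to it. Instead, you must implement read-modify-write: read the existing object, add the new data, and upload the result to replace the existing object.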
Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime. An object's storage lifetime is the time between successful object creation, such as uploading, and successful object deletion. In practice, this means that you cannot make incremental changes to objects, such as append operations or truncate operations. However, it is possible to replace objects that are stored in Cloud Storage, and doing so happens atomically: until the new upload completes, the old version of the object is served to readers, and after the upload completes the new version of the object is served to readers. So a single replacement operation simply marks the end of one immutable object's lifetime and the beginning of a new immutable object's lifetime.
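A minimal read-modify-write sketch with pandas follows. It assumes the CSV has a header row and that gcsfs is installed so pandas can read and write gs:// URLs; the function name append_rows_to_csv is hypothetical. Note that two writers running it at the same time could overwrite each other's rows.

```python
import pandas as pd


def append_rows_to_csv(new_rows: pd.DataFrame, path: str) -> pd.DataFrame:
    """Emulate an append by reading, concatenating, and rewriting the whole file.

    `path` may be a local path or a gs:// URL (gs:// assumes gcsfs is installed).
    Hypothetical helper for illustration; assumes the CSV has a header row.
    """
    try:
        # Read: download the current contents of the object.
        existing = pd.read_csv(path)
    except FileNotFoundError:
        # No object yet, so the new rows become the whole file.
        existing = None
    # Modify: place the new rows after the existing ones.
    if existing is None:
        combined = new_rows
    else:
        combined = pd.concat([existing, new_rows], ignore_index=True)
    # Write: upload a complete new object that replaces the old one.
    combined.to_csv(path, index=False)
    return combined
```

For example, append_rows_to_csv(df, "gs://mybucket/my.csv") replaces the object with its old rows followed by the rows of df. Because the whole file is downloaded and re-uploaded on each call, this gets slower as the file grows.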
Cloud Storage also supports a compose operation, which concatenates two or more existing objects (up to 32 per request) into a new Cloud Storage object.
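With compose, you could upload the data to append as a temporary object, compose the original object and the temporary object into a new object under the original name, and then delete the temporary object. This emulates appending to a file without downloading the existing data.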
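The compose approach could be sketched as follows. This assumes `bucket` is a google.cloud.storage.Bucket from the google-cloud-storage client library, that the existing object is a CSV ending in a newline (as pandas writes it), and that the ".append-tmp" suffix is a hypothetical naming scheme for the temporary object, which would collide if several writers appended at once.

```python
import pandas as pd


def append_via_compose(bucket, object_name: str, new_rows: pd.DataFrame) -> None:
    """Append CSV rows to an existing object by composing it with a temporary object.

    `bucket` is assumed to be a google.cloud.storage.Bucket; the temporary
    object name is a hypothetical convention for illustration.
    """
    # Serialize only the new rows, with no header, so they follow the existing data.
    chunk = new_rows.to_csv(index=False, header=False)
    tmp_blob = bucket.blob(object_name + ".append-tmp")  # hypothetical name
    tmp_blob.upload_from_string(chunk, content_type="text/csv")

    target = bucket.blob(object_name)
    target.content_type = "text/csv"
    # Concatenate original + chunk into a new object that replaces object_name.
    target.compose([target, tmp_blob])

    # Remove the temporary object once its data is part of the composed object.
    tmp_blob.delete()
```

Only the new rows are uploaded, so this avoids re-sending the whole file; the trade-off is the extra temporary object and the limit on how many sources one compose request accepts.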