I have a setup where a python script (let's call it test1.py) is spawning a subprocess which executes test2.py. In test2.py, I have some pandas operations which ultimately build a dataframe test. The final step in test2.py is saving the dataframe to csv (test.to_csv('my_path')). On completion of test2.py, test1.py continues execution, and the next step required is to load the same csv file just created (i.e., test = pd.read_csv('my_path')).
Now, the issue is that Python is not flushing the buffer to disk, and therefore, when test1.py goes to read the csv file, I get a FileNotFoundError. Of course, if I stop the script, the file is saved to disk. Is there a way to force pandas to flush to disk immediately? I've read about using file.flush() and os.fsync(fd), but these don't seem to apply to my case since I'm not dealing with any file descriptors.
EDIT: Added a (significantly) simplified example
test1.py looks something like:
import subprocess

import pandas as pd

def main():
    cmd = ['python3', 'test2.py']
    output_bytes = subprocess.check_output(cmd, stderr=subprocess.STDOUT, timeout=900)
    output = output_bytes.decode('utf-8')
    # test2.py finished, so I want to read the csv
    df = pd.read_csv('my_path')

if __name__ == '__main__':
    main()
test2.py looks something like:
import pandas as pd
import numpy as np

def main():
    df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
    df.to_csv('my_path')

if __name__ == '__main__':
    main()
CodePudding user response:
but these don't seem to apply to my case since I'm not dealing with any file descriptors.
You do not have to use a filename as the first argument for .to_csv; as the pandas.DataFrame.to_csv docs say, you may use a "file-like object implementing a write() function". Therefore you can do something like this:
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})
f = open("file.csv", "w", newline="")
df.to_csv(f)
f.flush()  # push Python's internal buffer out to the OS
f.close()
Observe that if you open the file in text (non-binary) mode, you need to pass newline="" to disable universal newline translation; otherwise you can end up with an extra blank line between rows on Windows.
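If you want an even stronger guarantee that the bytes actually reach the disk (the os.fsync(fd) you mentioned in the question), you can combine flush() with os.fsync() on the file object's descriptor. Here is a sketch of the same idea, using a context manager so the file is also closed reliably; the filename "file.csv" is just for illustration:

```python
import os

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Writing through an explicit file object gives us access to the
# underlying file descriptor via f.fileno().
with open("file.csv", "w", newline="") as f:
    df.to_csv(f)
    f.flush()             # move Python's internal buffer to the OS
    os.fsync(f.fileno())  # ask the OS to commit the data to disk
```

Note that flush() alone only hands the data to the operating system; os.fsync() is what forces it through the OS cache to the physical device, which matters if another process must see the file immediately or if you need durability across a crash.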