I'm testing some code that manipulates data using pandas, and I want to avoid writing out the data during tests.
Let's say my code in a file named module.py is this:
import pandas as pd
import dask.dataframe as dd
def do_stuff() -> None:
    df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
    another_df = df.pivot_table(values='a', index='b')
    yet_another_df = another_df.groupby('b').sum()
    another_df.to_csv('data.csv')
    yet_another_df.to_csv('more_data.csv')
I want to intercept all of these individual "to_csv" calls, so that the tests don't write out any data when they run.
My first thought was to try something like this with pytest:
import pandas as pd

import module

class NonWritingDataFrame(pd.DataFrame):
    def to_csv(self, *args, **kwargs):
        pass

def test_do_stuff_returns_nothing(monkeypatch):
    # replace the DataFrame class that module sees with the no-op subclass
    monkeypatch.setattr(module.pd, 'DataFrame', NonWritingDataFrame)
    actual = module.do_stuff()
    assert actual is None
But sadly this doesn't work: it might for the first df variable (I'm not actually sure if it does), but another_df and yet_another_df are returned by other pandas methods rather than constructed in module, so they are normal pandas DataFrames and not my special NonWritingDataFrame objects.
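To see this concretely, here's a quick check (a sketch; it assumes pandas' default subclassing behaviour, where operations return plain DataFrames unless _constructor is overridden on the subclass):

import pandas as pd

class NonWritingDataFrame(pd.DataFrame):
    def to_csv(self, *args, **kwargs):
        pass

df = NonWritingDataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
print(type(df))                                      # NonWritingDataFrame
print(type(df.pivot_table(values='a', index='b')))   # plain pandas DataFrame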
My question is, is there a neat way to replace all pandas DataFrame "to_csv" calls, regardless of the method used to define the DataFrame?
CodePudding user response:
If you want to replace all references to to_csv(), one option is to take advantage of Python's import mechanism. A module is only imported once; subsequent imports reuse the cached reference. So modifying to_csv() before you import module will give you the desired result --
import pandas

# patch the class itself so every DataFrame instance gets the no-op version
pandas.DataFrame.to_csv = lambda *args, **kwargs: print("monkeypatched")

import module
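If you'd rather keep the patch scoped to individual tests, the same idea works with pytest's monkeypatch fixture: patch to_csv on the DataFrame class and it is restored automatically after each test. A minimal sketch (the test name and no-op replacement are just illustrative):

import pandas as pd

import module

def test_do_stuff_writes_nothing(monkeypatch):
    # every DataFrame, however it was produced, now hits the no-op to_csv
    monkeypatch.setattr(pd.DataFrame, 'to_csv', lambda *args, **kwargs: None)
    assert module.do_stuff() is None

Because the patch is applied to the class attribute and the lookup happens at call time, every DataFrame created inside do_stuff picks it up, regardless of where it was constructed.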