Is building a Python module that depends on a certain input structure (Pandas DataFrame) a bad practice?

Time:11-09

I'm working on developing a Python library related to financial modelling.

The thing is, a lot of the functions I am writing take tabular inputs, which I'm treating as Pandas DataFrames. A typical function takes a DataFrame as an argument, manipulates its columns, and returns a result. Here is what I mean:

import pandas as pd

def example_function(input: pd.DataFrame) -> float:
    # Adds a derived column to the caller's DataFrame, then returns its total.
    input['created_column'] = input['col_1'] + input['col_2']
    return input['created_column'].sum()

The functions I have are a lot more complex than this but you get the idea.

The thing that rubs me the wrong way about this is that the function only works if the DataFrame being fed in has exactly the same structure every time. If the user feeds in a slightly different DataFrame, everything breaks.

I'm having a hard time figuring out a solution to this problem without adding a ton of complexity, since the functions I'm actually developing are much more complex than this and depend on several different pieces of information contained in those tables.

Is this way of developing actually a bad practice? If so, how should I go about developing these functions?

What I have developed so far are functions like the example above, and I'm having a hard time figuring out whether this is actually the best approach I could take.

CodePudding user response:

It is not a bad practice to build a Python module that depends on a certain input structure, such as a Pandas DataFrame. However, it is important to be aware that this dependency can make your module less portable and more difficult to use in other contexts.
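One way to make that dependency less fragile is to check for the expected columns up front and fail with a clear message, instead of letting a `KeyError` surface from deep inside the calculation. The following is only a minimal sketch: it assumes the two columns in the question are added together, and `REQUIRED_COLUMNS` and `validate_columns` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical: the column names come from the question's example.
REQUIRED_COLUMNS = ("col_1", "col_2")


def validate_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> None:
    """Raise a descriptive error if any required column is missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")


def example_function(df: pd.DataFrame) -> float:
    validate_columns(df)
    # Assumption: the columns are summed. The result is computed on a
    # temporary Series, so the caller's DataFrame is left unchanged.
    return float((df["col_1"] + df["col_2"]).sum())
```

With a check like this, a DataFrame with a slightly different structure produces an error that names the missing columns, which is usually easier to act on than a bare `KeyError`.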

CodePudding user response:

Just as you parameterize the function on the data frame to operate on, you can parameterize it on which columns of that data frame to use.

import pandas as pd

def example_function(input: pd.DataFrame, inp1: str, inp2: str, out: str) -> float:
    # The caller chooses the input columns and the name of the output column.
    input[out] = input[inp1] + input[inp2]
    return input[out].sum()
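A possible variation, sketched here under the assumption that the columns are added together, gives the column parameters default values so existing callers need not change, and avoids writing a new column into the caller's data frame. The name `sum_of_columns` and the `trades` data are hypothetical and exist only for this illustration.

```python
import pandas as pd


def sum_of_columns(df: pd.DataFrame, first: str = "col_1", second: str = "col_2") -> float:
    """Return the sum of df[first] + df[second] without adding a column to df."""
    return float((df[first] + df[second]).sum())


# Hypothetical data with column names that differ from the defaults.
trades = pd.DataFrame({"price": [10.0, 20.0], "fee": [0.5, 1.0]})
total = sum_of_columns(trades, first="price", second="fee")
```

Whether to mutate the input or not is a design choice; leaving it untouched tends to make the function easier to reuse on data frames the caller still needs in their original form.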

There's no hard and fast rule on how specific or how general any given function must be.

At the same time, consider whether the module you are defining is suitable as a library, or is just a set of helper functions for the particular application that produces the data frames you are operating on.
