Why do we need to redefine pandas DataFrame after changing columns?

I just wondered why pandas DataFrame methods do not modify the instance they are called on. For example, if I use pd.DataFrame.rename() or dropna(), I need to update the instance by reassigning the result to it. With a list, however, you can delete an element with the pop() method without reassigning anything; the method changes the instance itself.

Is there a reason why pandas or numpy use this kind of style? Can you explain why this style is better, or what its advantages are?

CodePudding user response：

pandas does make this option available to users. The functions you mentioned accept an 'inplace' parameter: if you set it to True, the operation is performed on the original DataFrame instead of returning a new one. Here are some useful links about it.

https://towardsdatascience.com/learn-how-to-use-pandas-inplace-parameter-once-and-for-all-5a29bb8bf338

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
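As a rough illustration of the two styles, here is a minimal sketch using rename(). The column names and values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default behaviour: rename returns a new DataFrame and leaves df untouched.
renamed = df.rename(columns={"a": "x"})
print(list(df.columns))       # ['a', 'b']
print(list(renamed.columns))  # ['x', 'b']

# With inplace=True the method changes df itself, so no reassignment is needed.
df.rename(columns={"b": "y"}, inplace=True)
print(list(df.columns))       # ['a', 'y']
```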

Best regards

CodePudding user response：

The reason is to give you the choice between overwriting the dataframe object you are working on and leaving it unchanged by creating a copy and assigning it to a different variable. This choice is valuable because, depending on the circumstances, you may or may not want to modify the original data directly.

The inplace parameter is one way in which you have the power to choose between the two options.
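A small sketch of the "leave it unchanged" option, with invented data: because each method returns a new DataFrame, the original and the cleaned version can live side by side, and calls can be chained.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({"score": [1.0, np.nan, 3.0]})

# Each call returns a new DataFrame, so the steps can be chained
# and the result bound to a different name.
cleaned = (
    raw.dropna()
       .rename(columns={"score": "points"})
       .reset_index(drop=True)
)

print(len(raw), len(cleaned))   # 3 2
print(list(raw.columns))        # ['score']  (raw is unchanged)
print(list(cleaned.columns))    # ['points']
```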

CodePudding user response：

Each class defines which changes can be made in-place and which instead create a new object. The reasons are varied and can't be reduced to a few simple rules.

The underlying data structure of a list is designed for growth and shrinkage. Even so, some changes are cheaper than others: append and pop at the end require fewer changes to the data than adding or removing items at the beginning or in the middle. And some actions, like blist = alist[1:], still produce a new list.
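The list behaviour described above can be checked with a short sketch; alist and blist follow the names used in the text, and the values are arbitrary.

```python
alist = [10, 20, 30, 40]

# pop() mutates the list in place and returns the removed element.
last = alist.pop()
print(last, alist)      # 40 [10, 20, 30]

# Slicing builds a new list object; alist itself is not modified.
blist = alist[1:]
print(blist)            # [20, 30]
print(blist is alist)   # False

# Changing the new list does not affect the original.
blist.append(99)
print(alist)            # [10, 20, 30]
```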

tuple is a variation on list that is immutable, and is widely used in base Python for function arguments and for packing/unpacking results.
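A brief sketch of both points; min_max is a hypothetical helper written only for this example.

```python
point = (1, 2)

# Tuples do not support item assignment; the attempt raises TypeError.
try:
    point[0] = 5
    raised = False
except TypeError:
    raised = True
print(raised, point)   # True (1, 2)


def min_max(values):
    # The two results are packed into a single tuple on return.
    return min(values), max(values)


# The caller unpacks the tuple into separate names.
lo, hi = min_max([3, 1, 2])
print(lo, hi)          # 1 3
```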

A numpy array has a fixed size. Like lists, individual values can be changed in-place, but growth requires making a new array (except for a limited use of resize). numpy also has a view mechanism that makes a new array, but which shares underlying data. This can be efficient, but has pitfalls for the unwary.
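The three numpy behaviours mentioned here (in-place element change, growth via a new array, and views) can be sketched as follows, with arbitrary values.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Individual values can be changed in place.
arr[0] = 100

# Growing the array creates a new one; arr keeps its size.
grown = np.append(arr, 5)
print(arr.size, grown.size)          # 4 5

# A basic slice is a view: a new array object sharing the same data.
view = arr[1:3]
view[0] = -1
print(arr.tolist())                  # [100, -1, 3, 4]
print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, grown))  # False
```

The write through view is the kind of pitfall meant above: arr changes even though it is never assigned to directly.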

pandas is built on numpy, with indices and values stored in arrays. As other answers show, it often has an in-place option, but I suspect that doesn't actually reduce the work or run time. We'd have to know a lot more about the change(s) and the dataframe structure.

Ultimately we, SO posters, can't answer "why" questions authoritatively. We can only give opinions based on knowledge and experience. Most of us are not developers, and certainly not original developers.