I was wondering why pandas DataFrame methods do not modify the instance they are called on. For example, if I use pd.DataFrame.rename() or dropna(), I need to update the instance by reassigning the result. With a list, however, you can delete an element with the pop() method without reassigning it; the method changes the instance itself.
Is there a reason why pandas or numpy use this style? Can you explain why it is better, or what its advantages are?
CodePudding user response:
pandas makes this option available to users. The inplace parameter of the functions you mentioned does exactly this: if you set inplace=True, the operation is performed on the original DataFrame. Here is a useful link about it:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
Best regards
CodePudding user response:
The reason is to give you the choice between overwriting the DataFrame object you are working on and leaving it unchanged, with the result returned as a new object that you can assign to a different variable. This choice is valuable because, depending on the circumstances, you may or may not want to modify the original data directly.
The inplace parameter is one way to choose between the two options.
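As a rough illustration of the two options, here is a minimal sketch. The DataFrame contents and the variable names (df, cleaned, renamed) are made up for this example; only dropna, rename, and the inplace parameter come from the discussion above.

```python
import pandas as pd

# Hypothetical sample data with one missing value in column "a"
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 5.0, 6.0]})

# Default behaviour: a new DataFrame is returned and df is left untouched
cleaned = df.dropna()
renamed = df.rename(columns={"a": "x"})
rows_before = len(df)          # still 3 rows
cols_before = list(df.columns)  # still ["a", "b"]

# inplace=True: df itself is modified, so the result is not assigned
df.dropna(inplace=True)
df.rename(columns={"a": "x"}, inplace=True)
rows_after = len(df)
cols_after = list(df.columns)
```

With the default, you keep both the original and the result; with inplace=True, the original state is gone once the call returns.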
CodePudding user response:
Each class defines which changes can be done in-place and which can't, creating a new object instead. The reasons are varied and can't be reduced to a few simple rules.
The underlying data structure of a list is designed for growth and shrinkage. Even so, some changes are cheaper than others: append and pop at the end require fewer changes to the data than adding or removing items at the beginning or in the middle. And actions like blist = alist[1:] still produce a new list.
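A small sketch of the difference, using only built-in list operations. The name alias is introduced here just to show that the same object is being modified.

```python
alist = [1, 2, 3, 4]
alias = alist            # a second name for the same list object

last = alist.pop()       # in-place: removes and returns the last item
alist.append(5)          # in-place: the existing list grows

# Both names see the changes because there is only one object
same_object = alias is alist

# Slicing builds a new list; changing it does not touch alist
blist = alist[1:]
blist[0] = 99
```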
tuple is a variation on list that is immutable, and is widely used in base Python for function arguments and packing/unpacking results.
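For instance, a short sketch of immutability and packing/unpacking. The function min_max and the variable names are hypothetical, chosen only for illustration.

```python
def min_max(values):
    # The two results are packed into a tuple on return
    return min(values), max(values)

lowest, highest = min_max([5, 1, 9])   # unpacking the returned tuple

point = (3, 4)
try:
    point[0] = 10                      # tuples do not support item assignment
    mutated = True
except TypeError:
    mutated = False

# "Changing" a tuple means building a new one
moved = (point[0] + 1, point[1])
```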
A numpy array has a fixed size. As with lists, individual values can be changed in-place, but growth requires making a new array (except for a limited use of resize). numpy also has a view mechanism that makes a new array object which shares the underlying data. This can be efficient, but has pitfalls for the unwary.
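A minimal sketch of these three behaviours (in-place element change, a view sharing data, growth producing a new array). The array contents and variable names are made up for the example.

```python
import numpy as np

arr = np.arange(6)           # array([0, 1, 2, 3, 4, 5])
arr[0] = 10                  # individual values can be changed in-place

view = arr[1:4]              # basic slicing gives a view, not a copy
view[0] = -1                 # the pitfall: this also changes arr[1]
shares_view = np.shares_memory(arr, view)

grown = np.append(arr, 7)    # "growing" returns a new, larger array
shares_grown = np.shares_memory(arr, grown)

safe = arr[1:4].copy()       # an explicit copy is independent of arr
safe[0] = 100
```

Using .copy() is the usual way to avoid the view pitfall when you want an independent piece of an array.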
pandas is built on numpy, with indices and values stored in arrays. As other answers show, it often has an in-place option, but I suspect that doesn't actually reduce the work or run time. We'd have to know a lot more about the change(s) and the dataframe structure.
Ultimately we, SO posters, can't answer "why" questions authoritatively. We can only give opinions based on knowledge and experience. Most of us are not developers, and certainly not the original developers.