How to manipulate a pandas Series without changing the original?

Time:01-16

Context

I have a function that takes a pandas Series of categorical data and returns a version in which each value is replaced by an integer index. However, I think my implementation also modifies the Series that is passed in, rather than only returning a new, modified Series. I also get the following warnings:

A value is trying to be set on a copy of a slice from a DataFrame. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  series[series == value] = index

SettingWithCopyWarning: modifications to a property of a datetimelike object are not supported and are discarded. Change values on the original.
  cacher_needs_updating = self._check_is_chained_assignment_possible()


Code

import pandas


def categorials(series: pandas.Series) -> pandas.Series:
    unique = series.unique()

    for index, value in enumerate(unique):
        series[series == value] = index

    return series.astype(pandas.Int64Dtype())

Question

  • How can I make this function return the modified Series without changing the Series that was passed in?

Answer:

You need to .copy() the incoming argument. Normally that warning would not appear; we are free to write to a Series or DataFrame, after all. However, in the code you did not share, the argument you pass in was apparently obtained as a subset of another Series or DataFrame (or maybe even of itself). FYI, if you plan to modify a subset, chain .copy() onto the expression that creates it.
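As a rough sketch of that last point (the calling code was not shared, so the frame `df`, its column names, and the filter below are all hypothetical): taking the subset with .copy() chained on gives an independent object, and writing to it leaves the source frame alone.

```python
import pandas as pd

# Hypothetical data standing in for the caller's code, which was not shared.
df = pd.DataFrame(
    {"color": ["red", "blue", "red", "green"], "size": [1, 2, 3, 4]}
)

# Chain .copy() onto the expression that creates the subset, so the
# result is an independent Series rather than a slice tied to df.
subset = df.loc[df["size"] > 1, "color"].copy()

# Writing to the copy does not touch df.
subset[subset == "red"] = "crimson"

print(df["color"].tolist())  # the original column is unchanged
print(subset.tolist())
```

Whether the warning shows up without the .copy() depends on the pandas version and on how the subset was produced, so the sketch only demonstrates the safe pattern, not the warning itself.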

Anyway, back to the question, series = series.copy() as the first line in the function should resolve the issue. However, your method is actually doing factorization, so

pd.Series(pd.factorize(series)[0], index=series.index)

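is equivalent to what your function does. Since pd.factorize returns a 2-tuple of (codes, uniques), we take element 0. The codes come back as a NumPy array, so we wrap them in a Series with the incoming index. Note that pd.factorize does not modify the original Series, so no .copy() is needed. One difference to be aware of: by default pd.factorize codes missing values as -1, whereas the loop never matches them (NaN does not compare equal to itself) and so never assigns them a code.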
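The two approaches above can be put side by side. This is a sketch, not the asker's exact code: the function names `categoricals_copy` and `categoricals_factorize` and the sample data are made up for illustration, and the sample Series is built with dtype=object so that integer codes can be written into the copy.

```python
import pandas as pd


def categoricals_copy(series: pd.Series) -> pd.Series:
    """Loop-based version that writes into a copy, not into the argument."""
    result = series.copy()
    for index, value in enumerate(series.unique()):
        # Compare against the untouched input, write into the copy.
        result[series == value] = index
    return result.astype(pd.Int64Dtype())


def categoricals_factorize(series: pd.Series) -> pd.Series:
    """Same codes via pd.factorize, which never writes to its input."""
    codes, _uniques = pd.factorize(series)
    return pd.Series(codes, index=series.index)


# Illustrative data; dtype=object so the loop may store integers in the copy.
original = pd.Series(["a", "b", "a", "c"], index=[10, 11, 12, 13], dtype=object)

loop_codes = categoricals_copy(original)
fact_codes = categoricals_factorize(original)

print(loop_codes.tolist())  # [0, 1, 0, 2]
print(fact_codes.tolist())  # [0, 1, 0, 2]
print(original.tolist())    # ['a', 'b', 'a', 'c'], input left as it was
```

Comparing against the input while writing into the copy also avoids a code assigned in one iteration being mistaken for a value in a later one. The loop version returns the nullable Int64 dtype, while the factorize version returns a plain NumPy integer dtype.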
