Subtract consecutive rows in one column with pandas

I ran into the problem below. The following code works normally on Google Colab:

df['temps'] = df['temps'].view(int).div(1e9).diff().fillna(0).abs()
print(df)

But when I run it in a local Jupyter notebook, the following error appears:

ValueError                                Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 df3['rebounds'] = pd.Series(df3['temps'].view(int).div(1e9).diff().fillna(0))

File C:\Python310\lib\site-packages\pandas\core\series.py:818, in Series.view(self, dtype)
    815 # self.array instead of self._values so we piggyback on PandasArray
    816 #  implementation
    817 res_values = self.array.view(dtype)
--> 818 res_ser = self._constructor(res_values, index=self.index)
    819 return res_ser.__finalize__(self, method="view")

File C:\Python310\lib\site-packages\pandas\core\series.py:442, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    440     index = default_index(len(data))
    441 elif is_list_like(data):
--> 442     com.require_length_match(data, index)
    444 # create/copy the manager
    445 if isinstance(data, (SingleBlockManager, SingleArrayManager)):

File C:\Python310\lib\site-packages\pandas\core\common.py:557, in require_length_match(data, index)
    553 """
    554 Check the length of data matches the length of the index.
    555 """
    556 if len(data) != len(index):
--> 557     raise ValueError(
    558         "Length of values "
    559         f"({len(data)}) "
    560         "does not match length of index "
    561         f"({len(index)})"
    562     )

ValueError: Length of values (830) does not match length of index (415)

Any suggestions to resolve this?

CodePudding user response：

Here are two ways to get this to work:

df3['rebounds'] = pd.Series(df3['temps'].view('int64').diff().fillna(0).div(1e9))

... or:

df3['rebounds'] = pd.Series(df3['temps'].astype('int64').diff().fillna(0).div(1e9))

For the following sample input:

df3.dtypes:

temps    datetime64[ns]
dtype: object

df3:

       temps
0 2022-01-01
1 2022-01-02
2 2022-01-03

... both of the above code samples give this output:

df3.dtypes:

temps       datetime64[ns]
rebounds           float64
dtype: object

df3:

       temps  rebounds
0 2022-01-01       0.0
1 2022-01-02   86400.0
2 2022-01-03   86400.0

The issue is probably that view() reinterprets the raw bytes of the existing series as a different data type rather than converting the values. For this to work cleanly, according to the Series.view() docs (see also the numpy.ndarray.view() docs), the two data types must have the same number of bytes. The original data is datetime64[ns], which is 8 bytes per value, so passing int to view() fails this requirement wherever int resolves to a 4-byte type. That matches your traceback: 830 values is exactly twice the 415 index entries, which is what you get when each 8-byte timestamp is read as two 4-byte integers. Explicitly specifying int64 meets the requirement. Alternatively, using astype() instead of view() with int64 also works.
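To see the byte reinterpretation directly, here is a minimal sketch using plain NumPy. The array name stamps is made up for illustration, and the last line only reports what int means on the machine running it, which is why no output is shown for it.

```python
import numpy as np

# Three datetime64[ns] values, 8 bytes each (hypothetical sample data).
stamps = np.array(
    ["2022-01-01", "2022-01-02", "2022-01-03"], dtype="datetime64[ns]"
)

# Same item size (8 bytes): one integer per timestamp, in nanoseconds.
as_int64 = stamps.view("int64")

# Half the item size (4 bytes): every timestamp is split into two integers,
# so the result is twice as long as the input.
as_int32 = stamps.view("int32")

print(len(stamps), len(as_int64), len(as_int32))  # 3 3 6

# Size in bytes of the platform's default int; 4 here would explain the error.
print(np.dtype(int).itemsize)
```

The doubled length in the int32 case is the same mismatch that Series.view() trips over when it tries to reuse the original index.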

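As to why this works in Colab and not in your local Jupyter notebook, I can't say for sure. Perhaps they use different versions of pandas and NumPy, or run on different platforms, and so treat int differently. Your traceback shows a Windows path, and NumPy versions before 2.0 map int to a 32-bit integer on Windows but to a 64-bit integer on Linux, which is what Colab runs on.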

I do know that in my environment, if I try the following:

df3['rebounds'] = pd.Series(df3['temps'].astype('int').diff().fillna(0).div(1e9))

... then I get this error:

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]

This suggests that int means int32 in my environment. It would be interesting to see whether the same line works on Colab.
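If all you need is the number of seconds between consecutive rows, another option (a different technique from the two lines above, not a variant of them) is to skip the integer conversion entirely: diff() on a datetime column yields timedeltas, and .dt.total_seconds() converts them to floats. A minimal sketch on the same sample input, assuming temps is already a datetime column:

```python
import pandas as pd

# Same sample input as above.
df3 = pd.DataFrame(
    {"temps": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"])}
)

# diff() gives timedelta values (NaT in the first row);
# total_seconds() turns them into floats (NaN in the first row),
# and fillna(0) replaces that leading NaN with 0.
df3["rebounds"] = df3["temps"].diff().dt.total_seconds().fillna(0)

print(df3)
```

Because no integer dtype is involved, this should behave the same whatever int means on the platform. Append .abs() if you want unsigned gaps, as in your original snippet.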