I am trying to run a machine learning classification on 6 time series datasets (in .csv format) using MiniRocket from sktime, a machine learning package. However, when I import the .csv files using pd.read_csv and run them through MiniRocket, I get the error "TypeError: X must be in an sktime compatible format", and it says that the following data types are sktime compatible: ['pd.Series', 'pd.DataFrame', 'np.ndarray', 'nested_univ', 'numpy3D', 'pd-multiindex', 'df-list', 'pd_multiindex_hier']. I then checked the data type of my imported .csv files and got "pandas.core.frame.DataFrame", a type I had never seen before and which looks different from the sktime compatible pd.DataFrame. What is the difference between pandas.core.frame.DataFrame and pd.DataFrame, and how do I convert pandas.core.frame.DataFrame to the sktime compatible pd.DataFrame?
I tried to convert pandas.core.frame.DataFrame to pd.DataFrame using the df.join and df.pop functions, but neither changed the type of my data (after the conversion I checked the type again and it was still pandas.core.frame.DataFrame).
CodePudding user response:
If you take the values from your old DataFrame with .values, you can create a new DataFrame the standard way. To keep the same columns and index values, pass them when you construct the new DataFrame.
df_new = pd.DataFrame(df_old.values, columns=df_old.columns, index=df_old.index)
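As a quick check, the rebuild above can be tried on a small frame. This is only a sketch: the column names, index labels, and values are made up for illustration. It suggests the rebuilt frame has the same contents and the same type as the original, so the name printed by type() is not expected to change.

```python
import pandas as pd

# Hypothetical stand-in for a frame loaded with pd.read_csv
df_old = pd.DataFrame({"t0": [0.1, 0.2], "t1": [0.3, 0.4]}, index=["a", "b"])

# Rebuild from the raw values, keeping the column and index labels
df_new = pd.DataFrame(df_old.values, columns=df_old.columns, index=df_old.index)

same_contents = df_new.equals(df_old)     # values, labels and dtypes match
same_type = type(df_new) is type(df_old)  # both are the pandas DataFrame class

# The fully qualified class name; the exact module string may vary by pandas version
type_name = f"{type(df_new).__module__}.{type(df_new).__qualname__}"
print(same_contents, same_type, type_name)
```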
CodePudding user response:
Most pandas classes are defined under the pandas.core folder: https://github.com/pandas-dev/pandas/tree/main/pandas/core. For example, class DataFrame is defined in pandas/core/frame.py:
class DataFrame(NDFrame, OpsMixin):
    ...
    def __init__(...):
        ...
pandas is not yet a py.typed library (PEP 561), hence the public API documentation uses pandas.DataFrame, but internally error messages still refer to the source file structure, such as pandas.core.frame.DataFrame. Both names refer to the same class, so a DataFrame returned by pd.read_csv is already a pd.DataFrame.
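One way to confirm that the two names point to the same class is to compare them directly. This is a minimal sketch with a made-up one-column frame; the module string shown by the last line may vary between pandas versions, so no particular output is assumed for it.

```python
import pandas as pd
import pandas.core.frame

# The public alias and the internal path are the same class object
is_same_class = pd.DataFrame is pandas.core.frame.DataFrame

# Hypothetical frame standing in for the result of pd.read_csv
df = pd.DataFrame({"x": [1, 2, 3]})
is_instance = isinstance(df, pd.DataFrame)

# Name built from the module that the class reports as its home
qualified_name = f"{type(df).__module__}.{type(df).__qualname__}"
print(is_same_class, is_instance, qualified_name)
```

If both checks are true, the frame type itself is unlikely to be what sktime is rejecting, and the layout or contents of the data are the next thing to inspect.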