A dataset with Int64, Float64 and datetime64[ns] gets converted to object after applying Pandas fill

Time:08-06

I am using Kaggle's dataset (The data types initially)

If I run df.dtypes I can see the correct data types; however, after the following lines of code, every column changes to the object dtype.

import datetime as dt
import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

def fault_mapper_FD(faultDate):
    if pd.Timestamp(2017, 8, 27, 0) <= faultDate <= pd.Timestamp(2017, 8, 28, 0):
        return 0
    if pd.Timestamp(2017, 8, 29, 0) <= faultDate <= pd.Timestamp(2017, 8, 29, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 1, 0) <= faultDate <= pd.Timestamp(2017, 12, 1, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 3, 0) <= faultDate <= pd.Timestamp(2017, 12, 3, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 7, 0) <= faultDate <= pd.Timestamp(2017, 12, 8, 0):
        return 0
    if pd.Timestamp(2017, 12, 14, 0) <= faultDate <= pd.Timestamp(2017, 12, 14, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 7, 0) <= faultDate <= pd.Timestamp(2018, 2, 7, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 9, 0) <= faultDate <= pd.Timestamp(2018, 2, 9, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 20, 0) <= faultDate <= pd.Timestamp(2017, 12, 20, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 18, 0) <= faultDate <= pd.Timestamp(2018, 2, 18, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 1, 0) <= faultDate <= pd.Timestamp(2018, 2, 1, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 31, 0) <= faultDate <= pd.Timestamp(2018, 1, 31, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 28, 0) <= faultDate <= pd.Timestamp(2018, 1, 28, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 27, 0) <= faultDate <= pd.Timestamp(2018, 1, 27, 23, 59):
        return 0
    if (pd.Timestamp(2017, 9, 1, 0) <= faultDate <= pd.Timestamp(2017, 9, 1, 23, 59) or 
    pd.Timestamp(2017, 11, 30, 0) <= faultDate <= pd.Timestamp(2017, 11, 30, 23, 59) or 
    pd.Timestamp(2017, 12, 9, 0) <= faultDate <= pd.Timestamp(2017, 12, 9, 23, 59) or 
    pd.Timestamp(2017, 12, 10, 0) <= faultDate <= pd.Timestamp(2017, 12, 11, 0) or 
    pd.Timestamp(2017, 12, 24, 0) <= faultDate <= pd.Timestamp(2017, 12, 24, 23, 59) or 
    pd.Timestamp(2018, 2, 4, 0) <= faultDate <= pd.Timestamp(2018, 2, 4, 23, 59) or 
    pd.Timestamp(2018, 2, 5, 0) <= faultDate <= pd.Timestamp(2018, 2, 6, 0)):
        return 1

df['FD'] = df['Timestamp'].apply(fault_mapper_FD)

cond = (df.Timestamp.dt.time > dt.time(22, 0)) | (df.Timestamp.dt.time < dt.time(7, 0))
df[cond] = df[cond].fillna(0,axis=1)

Now df.dtypes reports all of my columns as object.

The data types after the Pandas fillna method

CodePudding user response:

This is my question as well. I have a similar problem with another dataset, where all of my data types get changed.

CodePudding user response:

I think you have a small typo in the axis argument. You just need to call

df = df[cond].fillna(0,axis=0)

which indeed doesn't change the datatypes (note that this keeps only the rows selected by cond):

Timestamp                               datetime64[ns]
RTU: Supply Air Temperature                    float64
RTU: Return Air Temperature                    float64
RTU: Supply Air Fan Status                       int64
RTU: Circuit 1 Discharge Temperature           float64
                                             ...      
VAV Box: Room 203 Air Temperature              float64
VAV Box: Room 204 Air Temperature              float64
VAV Box: Room 205 Air Temperature              float64
VAV Box: Room 206 Air Temperature              float64
Fault Detection Ground Truth                     int64
Length: 69, dtype: object
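
If the goal is to keep every row and fill NaN only inside the night-time window, one possible alternative is to fill column by column through .loc. This is a minimal sketch on an invented toy frame (the column names temp and status are assumptions, not from the original dataset), not a tested drop-in for the Kaggle data.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical toy data: rows 0 and 2 fall inside the 22:00-07:00 window.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2017-08-27 23:30", "2017-08-28 12:00", "2017-08-29 03:00"]
    ),
    "temp": [np.nan, np.nan, 21.5],
    "status": [1, 0, 1],
})

times = df["Timestamp"].dt.time
cond = (times > dt.time(22, 0)) | (times < dt.time(7, 0))

dtypes_before = df.dtypes.copy()

# Fill each non-timestamp column separately so no column is mixed with another.
for col in df.columns.drop("Timestamp"):
    df.loc[cond, col] = df.loc[cond, col].fillna(0)

print(df.dtypes)
```

Rows outside the window keep their NaN values, and each column is only ever assigned values of its own type, which is why the dtypes are expected to survive.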