Python pandas CopyWarning

Time:10-10

I have a Date column with values like 2018-02-02 07:00:06.000. I am trying to replace each value in the column with only its hour, so the example above would become 7.

-----Update------

Code before the issue

import pandas as pd

df = pd.read_csv("Data.csv")
df.columns = [c.replace(' ', '') for c in df.columns]
df.Date = pd.to_datetime(df.Date)

start = pd.to_datetime('07:00:00:00', format="%H:%M:%S:%f").time()
end = pd.to_datetime('10:59:59:00',format="%H:%M:%S:%f").time()

df_new= df[(df.Date.dt.time >= start) & (df.Date.dt.time <= end)]

What I did was

df_new.Date = df_new.Date.dt.hour

which gives me a warning

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

So then I used

df_new.loc[:,["Date"]] = df_new.Date.dt.hour

But this also gives me the same warning.

Is there a correct or better way to do this?

CodePudding user response:

If this warning still pops up when you use the suggested .loc method (which you used correctly), then it is probably a false positive, as explained in detail in the answers to this question. Such warnings can depend on the pandas version, so try upgrading pandas and see whether that solves it.
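Another possible explanation, offered here as an assumption rather than something the question confirms: df_new was created by boolean filtering of df, so pandas may treat it as a copy of a slice, and using .loc on df_new does not change that. A minimal sketch with made-up data (standing in for Data.csv) that takes an explicit .copy() before assigning:

```python
import pandas as pd

# Made-up sample data standing in for Data.csv (an assumption for illustration).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-02-02 06:59:59",
        "2018-02-02 07:00:06",
        "2018-02-02 10:59:59",
        "2018-02-02 11:00:00",
    ])
})

start = pd.to_datetime("07:00:00:00", format="%H:%M:%S:%f").time()
end = pd.to_datetime("10:59:59:00", format="%H:%M:%S:%f").time()

# Build the filter, then copy so df_new is an independent frame
# rather than something pandas may regard as a slice of df.
mask = (df["Date"].dt.time >= start) & (df["Date"].dt.time <= end)
df_new = df[mask].copy()

# Replace the timestamps in the copy with their hour.
df_new["Date"] = df_new["Date"].dt.hour

print(df_new["Date"].tolist())  # [7, 10]
```

The original df keeps its full timestamps; only the copy is changed.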

Alternatively, you can try a different method for creating or replacing columns, such as assign:

df = df.assign(Date=df.Date.dt.hour)

assign returns a new dataframe with the Date column replaced and leaves the original unchanged. Since it returns a dataframe, you are free to save it under a new name or to overwrite the old one.
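A small sketch of how assign behaves, using a made-up two-row frame (the data and the name hours_df are assumptions for illustration): the call returns a new dataframe, and the one it was called on keeps its timestamps.

```python
import pandas as pd

# Made-up sample frame (an assumption for illustration).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-02-02 07:00:06", "2018-02-02 09:30:00"]),
    "value": [1, 2],
})

# assign builds a new frame with Date replaced; df itself is not modified.
hours_df = df.assign(Date=df["Date"].dt.hour)

print(hours_df["Date"].tolist())  # [7, 9]
print(df["Date"].dtype)           # still a datetime64 dtype
```

Writing df = df.assign(...) instead simply rebinds the name df to the new frame.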

Finally, if it still pops up, you can silence the warning with

pd.options.mode.chained_assignment = None

CodePudding user response:

Try not to use attribute access (df.Date) but rather bracket indexing (df['Date']):

df = pd.read_csv("Data.csv")
df.columns = df.columns.str.replace(' ', '')
df['Date'] = pd.to_datetime(df['Date'])

start = pd.to_datetime('07:00:00:00', format="%H:%M:%S:%f").time()
end = pd.to_datetime('10:59:59:00',format="%H:%M:%S:%f").time()

df_new= df[(df['Date'].dt.time >= start) & (df['Date'].dt.time <= end)]