Here is my problem.
I have a dataframe imported from a .xlsx file. It contains one column with dates, but the values are not stored as datetime.
Each block starts with one row holding the date (a str in the format DD/MM/YYYY), followed by 24 rows holding the hours (format: xh, where h stands for hour and x is between 0 and 23). This pattern repeats for three years.
I would like to transform this column so that each cell is a datetime in the format YY-MM-DD HH:MM:SS.
First of all, I created a dataframe df2 containing the hours:
indexNames = df1[df1['Hour'].str.contains('/')].index
df2= df1.drop(indexNames)
I transformed it to get the hours as time strings (HH:MM:SS):
# Keep only the number in front of the "h"
new = df2["Hour"].str.split("h", n = 1, expand = True)
df2["new_Hour"] = new[0]
# Drop the old column
df2.drop(columns = ["Hour"], inplace = True)
# Convert to datetime, then keep only the time part as a string
df2['new_Hour'] = pd.to_datetime(df2['new_Hour'], format="%H")
df2['new_Hour'] = df2['new_Hour'].astype(str)
nouv = df2['new_Hour'].str.split(' ', n=1, expand = True)
df2["Hour"] = nouv[1]
df2.drop(columns = ["new_Hour"], inplace = True)
Then I created a second dataframe holding the dates and added separate columns for the corresponding year, month and day:
df3= df1.loc[df1['Hour'].str.contains('/')].copy()
df3['Hour'] = pd.to_datetime(df3['Hour'], format="%d/%m/%Y")
df3['year'] = df3['Hour'].dt.year
df3['month'] = df3['Hour'].dt.month
df3['day'] = df3['Hour'].dt.day
Here comes my problem.
The df3 index starts at 0 and increases by 25 at each row, i.e. df3.index[0] = 0, df3.index[1] = 25, df3.index[2] = 50, etc.
The df2 index starts at 1 and, more generally, skips the index values that belong to df3.
I would like to add the corresponding date from df3 to the corresponding hours in df2.
After resetting the indexes of df2 and df3, I tried:
df4 = df2.copy()
df4['year'] = 2019
df4= df4.reset_index(drop = True)
for i in range(len(df3)-1):
    df4['year'].iloc[df3.index[i]:df3.index[i + 1]] = df3['year'][i]
But I get copy problems and probably index problems too.
Hope you could help me, thanks.
Answer:
You might want to start with a cleaner way to create the datetime column, for example like this (sorry, too long for a comment):
import pandas as pd
# dummy sample...
df = pd.DataFrame({'Hour': ["10/12/2013", "0", "1", "3",
"11/12/2013", "0", "1", "3"]})
# make a date column, forward-fill the dates
df['Datetime'] = pd.to_datetime(df['Hour'], format="%d/%m/%Y", errors='coerce').ffill()
# now we can add the hour
df['Datetime'] = df['Datetime'] + pd.to_timedelta(pd.to_numeric(df['Hour'], errors='coerce'), unit='h')
# and optionally drop nans in the Datetime column, i.e. where we had dates initially
df = df[df["Datetime"].notna()].reset_index(drop=True)
df
  Hour            Datetime
0    0 2013-12-10 00:00:00
1    1 2013-12-10 01:00:00
2    3 2013-12-10 03:00:00
3    0 2013-12-11 00:00:00
4    1 2013-12-11 01:00:00
5    3 2013-12-11 03:00:00
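The dummy sample above uses bare numbers for the hours, whereas the question describes them as "xh" (e.g. "0h", "23h"). A minimal sketch of the same forward-fill idea for that layout could look like the following. The sample frame, the `is_date` mask and the `result` name are made up for illustration, and it assumes that every date row contains a "/" and every hour row ends in "h":

```python
import pandas as pd

# Hypothetical sample mimicking the layout described in the question:
# a date row ("DD/MM/YYYY") followed by hour rows ("0h" ... "23h").
df1 = pd.DataFrame({"Hour": ["10/12/2013", "0h", "1h", "23h",
                             "11/12/2013", "0h", "1h", "23h"]})

# Assumption: rows holding a date contain a slash, all other rows hold an hour.
is_date = df1["Hour"].str.contains("/")

# Parse only the date rows, then forward-fill so each hour row gets its date.
dates = pd.to_datetime(df1["Hour"].where(is_date), format="%d/%m/%Y").ffill()

# Strip the trailing "h" from the hour rows and convert them to a timedelta.
hours = pd.to_numeric(df1["Hour"].where(~is_date).str.rstrip("h"))
df1["Datetime"] = dates + pd.to_timedelta(hours, unit="h")

# Keep only the hour rows; the date rows end up with NaT in "Datetime".
result = df1.loc[~is_date].reset_index(drop=True)
print(result)
```

Because everything is computed on the original frame, the dates and hours stay aligned by index, so there is no need to split into df2 and df3 or to loop over index positions.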