Adding the values of the column of a dataframe to another dataframe under conditions?

Here is my problem.

I have a dataframe imported from a .xlsx file. It contains one column with dates, but the problem is that they are not stored as datetimes.

For instance, the first row holds the date (format DD/MM/YYYY, type str) and the 24 following rows hold the hours (format xh, where h stands for hour and x is between 0 and 23). This pattern repeats for three years.
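To make the layout concrete, here is a small hypothetical reproduction of df1 with two days instead of three years. The column name Hour comes from the question; the dates and the helper name make_sample are made up for illustration. It also shows why the date rows end up at index 0, 25, 50, ...: each day occupies 25 rows.

```python
import pandas as pd

def make_sample(dates):
    """Hypothetical helper: each date row is followed by 24 hour rows '0h'..'23h'."""
    rows = []
    for d in dates:
        rows.append(d)                            # date row, e.g. '10/12/2013'
        rows.extend(f"{h}h" for h in range(24))   # hour rows '0h' .. '23h'
    return pd.DataFrame({"Hour": rows})

df1 = make_sample(["10/12/2013", "11/12/2013"])
print(df1.head(3))
print(len(df1))  # 2 days * 25 rows each
```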

I would like to transform this column so that the cells are datetimes in the format YYYY-MM-DD HH:MM:SS.

First of all, I created a dataframe df2 containing the hours:

indexNames = df1[df1['Hour'].str.contains('/')].index
df2 = df1.drop(indexNames)

Then I transformed it to get the hours in HH:MM:SS format:

# Keeping only the number before the 'h'
new = df2["Hour"].str.split("h", n=1, expand=True)
df2["new_Hour"] = new[0]

# Dropping the old Hour column
df2.drop(columns=["Hour"], inplace=True)

# Parsing as datetime (the date part defaults to 1900-01-01), then keeping only the time part
df2['new_Hour'] = pd.to_datetime(df2['new_Hour'], format="%H")
df2['new_Hour'] = df2['new_Hour'].astype(str)
nouv = df2['new_Hour'].str.split(' ', n=1, expand=True)
df2["Hour"] = nouv[1]
df2.drop(columns=["new_Hour"], inplace=True)
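The hour strings can also be converted to numbers directly, without the round trip through pd.to_datetime and astype(str). A sketch, assuming every hour row looks like '0h' to '23h' (the sample rows and the column names offset and HourStr are hypothetical):

```python
import pandas as pd

# Hypothetical hour rows in the question's 'xh' format, keeping df1-style labels
df2 = pd.DataFrame({"Hour": ["0h", "1h", "23h"]}, index=[1, 2, 24])

# Strip the trailing 'h' and convert to an integer number of hours
hours = df2["Hour"].str.rstrip("h").astype(int)

# Either keep it as a Timedelta (handy for adding to a date later) ...
df2["offset"] = pd.to_timedelta(hours, unit="h")
# ... or build the same 'HH:MM:SS' string the code above produces
df2["HourStr"] = hours.astype(str).str.zfill(2) + ":00:00"
print(df2)
```

Keeping a Timedelta rather than a string makes the later step of combining it with a date a plain addition.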

Then I created a second dataframe containing the dates, with separate columns for the corresponding year, month and day:

df3 = df1.loc[df1['Hour'].str.contains('/')].copy()
df3['Hour'] = pd.to_datetime(df3['Hour'], format="%d/%m/%Y")
df3['year'] = df3['Hour'].dt.year
df3['month'] = df3['Hour'].dt.month
df3['day'] = df3['Hour'].dt.day

Here comes my problem:

The df3 indexes start at 0 and increase by 25 from one row to the next, i.e. df3.index[0] = 0, df3.index[1] = 25, df3.index[2] = 50, etc.

The df2 indexes start at 1; more generally, df2 has every index except those of df3.

I would like to add the corresponding date from df3 to the corresponding hours in df2.
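One way to do this without a loop, sketched under the assumption that df2 and df3 still carry df1's original index (i.e. before any reset_index): put the dates back on df1's full index, forward-fill them so each hour row inherits the date above it, then pick df2's rows by label. The small df1 below is hypothetical (3 hours per day to keep it short), and the hour is parsed from the raw 'xh' string rather than the HH:MM:SS version.

```python
import pandas as pd

# Hypothetical df1 in the question's layout (2 days, 3 hours each)
df1 = pd.DataFrame({"Hour": ["10/12/2013", "0h", "1h", "2h",
                             "11/12/2013", "0h", "1h", "2h"]})

is_date = df1["Hour"].str.contains("/")
df2 = df1[~is_date].copy()   # hour rows, original labels kept (1, 2, 3, 5, 6, 7)
df3 = df1[is_date].copy()    # date rows, original labels kept (0, 4)
df3["Hour"] = pd.to_datetime(df3["Hour"], format="%d/%m/%Y")

# Spread each date over df1's full index: NaT on hour rows, then forward-fill
dates = df3["Hour"].reindex(df1.index).ffill()

# Index alignment: every df2 row gets the date of its own day plus its hour offset
df2["Datetime"] = dates.loc[df2.index] + pd.to_timedelta(
    df2["Hour"].str.rstrip("h").astype(int), unit="h")
df2 = df2.reset_index(drop=True)
print(df2)
```

Year, month and day then come from df2["Datetime"].dt.year and friends, so the separate columns in df3 are no longer needed.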

After resetting the indexes of df2 and df3, I tried:

df4 = df2.copy()
df4['year'] = 2019
df4 = df4.reset_index(drop=True)
for i in range(len(df3)-1):
    df4['year'].iloc[df3.index[i]:df3.index[i + 1]] = df3['year'][i]

But I get copy problems, and probably index problems too.
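If the loop is kept, three things seem to get in the way. First, df4['year'].iloc[...] = ... is chained indexing: df4['year'] may hand back a copy, so pandas typically warns and the write may never reach df4; a single .iloc[rows, column] call on df4 avoids that. Second, df3['year'][i] is a label lookup, so it fails for i = 1 when df3's labels are 0, 25, 50; .iloc[i] is positional. Third, after the reset, day i's hours sit at rows 24*i to 24*i + 23, not at df3.index[i]. The sketch below assumes every day has exactly 24 hour rows (no missing or extra hours) and uses hypothetical stand-in frames:

```python
import pandas as pd

# Hypothetical stand-ins: df3 keeps df1's labels 0, 25; df4 is already reset
df3 = pd.DataFrame({"year": [2019, 2020]}, index=[0, 25])
df4 = pd.DataFrame({"Hour": [f"{h}h" for h in range(24)] * 2})
df4["year"] = 0

year_col = df4.columns.get_loc("year")
for i in range(len(df3)):          # every day, including the last one
    start = 24 * i                 # assumption: exactly 24 hour rows per day
    stop = start + 24
    # One .iloc call with (rows, column) writes into df4 itself: no chained indexing
    df4.iloc[start:stop, year_col] = df3["year"].iloc[i]   # positional lookup
```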

I hope you can help me. Thanks!

Answer:

You might want to start with a cleaner way to create the datetime column, for example (sorry, too long for a comment):

import pandas as pd

# dummy sample...
df = pd.DataFrame({'Hour': ["10/12/2013", "0", "1", "3",
                            "11/12/2013", "0", "1", "3"]})

# make a date column, forward-fill the dates
df['Datetime'] = pd.to_datetime(df['Hour'], format="%d/%m/%Y", errors='coerce').ffill()

# now we can add the hour
df['Datetime'] = df['Datetime'] + pd.to_timedelta(pd.to_numeric(df['Hour'], errors='coerce'), unit='h')

# and optionally drop the NaT rows in the Datetime column, i.e. the rows that originally held the dates
df = df[df["Datetime"].notna()].reset_index(drop=True)

df
  Hour            Datetime
0    0 2013-12-10 00:00:00
1    1 2013-12-10 01:00:00
2    3 2013-12-10 03:00:00
3    0 2013-12-11 00:00:00
4    1 2013-12-11 01:00:00
5    3 2013-12-11 03:00:00
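The dummy sample uses bare numbers in Hour, whereas the question's hour rows look like '0h'; pd.to_numeric would turn '0h' into NaN and every row would be dropped. A sketch of the same approach with the trailing 'h' stripped first (sample data hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"Hour": ["10/12/2013", "0h", "1h", "3h",
                            "11/12/2013", "0h", "1h", "3h"]})

# Date rows parse; hour rows become NaT and are forward-filled with the last date
dates = pd.to_datetime(df["Hour"], format="%d/%m/%Y", errors="coerce").ffill()

# Hour rows: strip the 'h' and parse; date rows become NaN
hours = pd.to_numeric(df["Hour"].str.rstrip("h"), errors="coerce")

# NaN hours give NaT, so the original date rows end up NaT and are dropped
df["Datetime"] = dates + pd.to_timedelta(hours, unit="h")
df = df[df["Datetime"].notna()].reset_index(drop=True)
print(df)
```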