Inserting rows in specific location using pandas

Time:08-07

I have a CSV-file containing the following data structure:

2015-01-02,09:30:00,64.815
2015-01-02,09:35:00,64.8741
2015-01-02,09:55:00,65.0255
2015-01-02,10:00:00,64.9269

Using pandas in Python, I would like to copy the 2nd row into the missing 5-minute intervals that follow it, inserting the new rows right after the 2nd row. Eventually, it should look like:

2015-01-02,09:30:00,64.815
2015-01-02,09:35:00,64.8741
2015-01-02,09:40:00,64.8741
2015-01-02,09:45:00,64.8741
2015-01-02,09:50:00,64.8741
2015-01-02,09:55:00,65.0255
2015-01-02,10:00:00,64.9269
2015-01-02,10:05:00,64.815

I have the following code:

    df = pd.read_csv("csv.file", header=0, names=['date', 'minute', 'price'])

    for i in range(len(df)):
        if i != len(df)-1:
            next_i = i + 1
            if df.loc[next_i, 'date'] == df.loc[i, 'date'] and df.loc[i, 'minute'] != "16:00:00":
                now = int(df.loc[i, "minute"][:2] + df.loc[i, "minute"][3:5])
                future = int(df.loc[next_i, "minute"][:2] + df.loc[next_i, "minute"][3:5])

                while now + 5 != future and df.loc[next_i, "minute"][3:5] != "00" and df.loc[next_i, "minute"][3:5] != "60":
                    newminutes = str(int(df.loc[i, "minute"][3:5]) + 5*a)
                    newtime = df.loc[next_i, "minute"][:2] + ":" + newminutes + ":00"
                    df.loc[next_i-0.5] = [df.loc[next_i, 'date'], newtime, df.loc[i, 'price']]
                    df = df.sort_index().reset_index(drop=True)
                    now = int(newtime[:2] + newtime[3:5])
                    future = int(df.loc[next_i + 1, "minute"][:2] + df.loc[next_i + 1, "minute"][3:5])

However, it's not working.

CodePudding user response:

one way is to create the needed index, left merge onto it, and forward fill:

first, make sure you have a proper timestamp column:

df['ts'] = pd.to_datetime(df[0] + ' ' + df[1])
df = df[['ts', 2]]

you should get something like this:

ts 2
0 2015-01-02 09:30:00 64.815
1 2015-01-02 09:35:00 64.8741
2 2015-01-02 09:55:00 65.0255
3 2015-01-02 10:00:00 64.9269
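
A minimal sketch of how a frame shaped like this could be produced, assuming the CSV has no header row so that pandas labels the columns 0, 1 and 2. The inline csv_text is a stand-in for the real file, not part of the original answer:

```python
import io
import pandas as pd

# Inline stand-in for the CSV file from the question (no header row assumed).
csv_text = """2015-01-02,09:30:00,64.815
2015-01-02,09:35:00,64.8741
2015-01-02,09:55:00,65.0255
2015-01-02,10:00:00,64.9269
"""

# header=None makes pandas name the columns 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Join the date and time strings, then parse them into real timestamps.
df['ts'] = pd.to_datetime(df[0] + ' ' + df[1])
df = df[['ts', 2]]
print(df)
```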

then create the date range index:

new_df = pd.DataFrame(index=pd.date_range(start=df['ts'].min(),
                                          end=df['ts'].max(), freq='5min'))

then left merge to it and forward fill:

new_df.merge(df, left_index=True, right_on='ts', how='left').ffill().reset_index(drop=True)

ts 2
0 2015-01-02 09:30:00 64.815
1 2015-01-02 09:35:00 64.8741
2 2015-01-02 09:40:00 64.8741
3 2015-01-02 09:45:00 64.8741
4 2015-01-02 09:50:00 64.8741
5 2015-01-02 09:55:00 65.0255
6 2015-01-02 10:00:00 64.9269
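
The same result can probably also be reached without a merge. The sketch below swaps in reindex on a timestamp index, which is a different route from the merge shown above; the names grid and filled are illustrative, and the sample frame is rebuilt inline so the block runs on its own:

```python
import pandas as pd

# Same sample data as above, with 'ts' already parsed.
df = pd.DataFrame({
    'ts': pd.to_datetime(['2015-01-02 09:30:00', '2015-01-02 09:35:00',
                          '2015-01-02 09:55:00', '2015-01-02 10:00:00']),
    2: [64.815, 64.8741, 65.0255, 64.9269],
})

# Every 5-minute slot between the first and last timestamp.
grid = pd.date_range(start=df['ts'].min(), end=df['ts'].max(), freq='5min')

# Reindex onto the full grid: slots with no original row become NaN,
# and ffill() copies the last known value into them.
filled = df.set_index('ts').reindex(grid).ffill()

# Turn the timestamp index back into a regular 'ts' column.
filled = filled.rename_axis('ts').reset_index()
print(filled)
```

Either route relies on the timestamps already sitting exactly on the 5-minute grid; rows that fall between grid points would be dropped by the reindex.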

CodePudding user response:

I see there is an extra row in the expected output: 2015-01-02,10:05:00,64.815. To accommodate that as well, you can reindex using pd.date_range.

Creating data

import pandas as pd

data = {
    'date' : ['2015-01-02', '2015-01-02', '2015-01-02', '2015-01-02'],
    'time' : ['09:30:00', '09:35:00', '09:55:00', '10:00:00'],
    'val' : [64.815, 64.8741, 65.0255, 64.9269]
    }
df = pd.DataFrame(data)

Creating datetime column for reindexing

df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df.set_index('datetime', inplace=True)

Generating output

df.resample('5min').asfreq().reindex(pd.date_range('2015-01-02 09:30:00', '2015-01-02 10:05:00', freq='5 min')).ffill().reset_index(drop=True)

Output

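This fills the missing intervals with the previous row. Note that date and time are forward-filled as well, and the last row repeats 64.9269 rather than the 64.815 shown in the question: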

         date      time      val
0  2015-01-02  09:30:00  64.8150
1  2015-01-02  09:35:00  64.8741
2  2015-01-02  09:35:00  64.8741
3  2015-01-02  09:35:00  64.8741
4  2015-01-02  09:35:00  64.8741
5  2015-01-02  09:55:00  65.0255
6  2015-01-02  10:00:00  64.9269
7  2015-01-02  10:00:00  64.9269
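
A sketch of one way to make the inserted rows carry their own date and time, assuming the two columns can simply be rebuilt from the datetime index. The names step, grid and out are illustrative and not part of the answer above:

```python
import pandas as pd

data = {
    'date': ['2015-01-02'] * 4,
    'time': ['09:30:00', '09:35:00', '09:55:00', '10:00:00'],
    'val': [64.815, 64.8741, 65.0255, 64.9269],
}
df = pd.DataFrame(data)
df.index = pd.to_datetime(df['date'] + ' ' + df['time'])

# Extend the grid one step past the last row to get the extra 10:05 slot.
step = pd.Timedelta(minutes=5)
grid = pd.date_range(df.index[0], df.index[-1] + step, freq='5min')

# Forward-fill only the value column, then derive date and time from the
# index so that every inserted row gets its own timestamp.
out = df[['val']].reindex(grid).ffill()
out.insert(0, 'date', out.index.strftime('%Y-%m-%d'))
out.insert(1, 'time', out.index.strftime('%H:%M:%S'))
out = out.reset_index(drop=True)
print(out)
```

The 10:05:00 row still takes its value from the 10:00:00 row, since a forward fill can only repeat the most recent observation.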

However, if that was a typo and you don't want the last row, you can do this:

df.resample('5min').asfreq().reindex(pd.date_range(df.index[0], df.index[-1], freq='5 min')).ffill().reset_index(drop=True)

which gives:

         date      time      val
0  2015-01-02  09:30:00  64.8150
1  2015-01-02  09:35:00  64.8741
2  2015-01-02  09:35:00  64.8741
3  2015-01-02  09:35:00  64.8741
4  2015-01-02  09:35:00  64.8741
5  2015-01-02  09:55:00  65.0255
6  2015-01-02  10:00:00  64.9269
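
The loop in the question only fills gaps between rows that share the same date. A sketch of how that restriction might be kept with resampling, assuming a parsed ts column and a price column; the two-day sample values are made up purely for illustration:

```python
import pandas as pd

# Toy two-day sample (made-up values) with a gap on the first day only.
df = pd.DataFrame({
    'ts': pd.to_datetime(['2015-01-02 09:30:00', '2015-01-02 09:40:00',
                          '2015-01-05 09:30:00', '2015-01-05 09:35:00']),
    'price': [1.0, 2.0, 3.0, 4.0],
})

# Resample each calendar day separately so the overnight gap between
# the two days is not filled with 5-minute rows.
parts = [
    day.set_index('ts').resample('5min').ffill()
    for _, day in df.groupby(df['ts'].dt.date)
]
filled = pd.concat(parts).reset_index()
print(filled)
```

Here only the missing 09:35:00 slot on the first day is added; nothing is inserted between the last row of one day and the first row of the next.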