Append rows to a dataframe efficiently


I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({
    'Timestamp': ['1642847484', '1642847484', '1642847484', '1642847484',
                  '1642847487', '1642847487', '1642847487', '1642847487',
                  '1642847487', '1642847487', '1642847487', '1642847487',
                  '1642847489', '1642847489', '1642847489'],
    'value': [11, 10, 14, 20, 3, 2, 9, 48, 5, 20, 12, 20, 56, 12, 8]
})

I need to do some operations on each group of values with the same timestamp, so I use groupby as follows:

df_grouped = df.groupby('Timestamp')

I then iterate over the rows of each group and append the results row by row to a new dataframe:

df_out = pd.DataFrame(columns=('Timestamp', 'value'))
for group_name, df_group in df_grouped:
    i = 0
    for row_index, row in df_group.iterrows():
        row['Timestamp'] = row['Timestamp'] * 1000 + i * 30
        df_out = df_out.append(row)
        i = i + 1
    print(df_out.tail())

But my approach takes a very long time (the real data has 7M rows), and I was wondering if there is a more efficient way to do this. Thank you

CodePudding user response:

I think iterrows is not necessary here; you can use:

def f(x):
    x['Timestamp'] = ...
    ...
    return x

df1 = df.groupby('Timestamp').apply(f)
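
Filled in for this question, that could look like the following sketch (the int64 cast, the copy, and group_keys=False are my assumptions, not part of the original answer):

import numpy as np
import pandas as pd

def f(x):
    # work on a copy to avoid mutating the original frame inside apply
    x = x.copy()
    # same arithmetic as the loop: base timestamp in milliseconds,
    # plus a 30 ms offset for each row's position within its group
    x['Timestamp'] = x['Timestamp'].astype(np.int64) * 1000 + np.arange(len(x)) * 30
    return x

df1 = df.groupby('Timestamp', group_keys=False).apply(f)

Note that apply with a Python function still runs once per group, so it is faster than iterrows but slower than the fully vectorized version below.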

EDIT: Create a counter Series with GroupBy.cumcount, then multiply it and add it to Timestamp:

import numpy as np

# if necessary
df['Timestamp'] = df['Timestamp'].astype(np.int64)

df['Timestamp'] = df['Timestamp'] * 1000 + df.groupby('Timestamp').cumcount() * 30
print(df)
        Timestamp  value
0   1642847484000     11
1   1642847484030     10
2   1642847484060     14
3   1642847484090     20
4   1642847487000      3
5   1642847487030      2
6   1642847487060      9
7   1642847487090     48
8   1642847487120      5
9   1642847487150     20
10  1642847487180     12
11  1642847487210     20
12  1642847489000     56
13  1642847489030     12
14  1642847489060      8
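
With 7M rows, the vectorized version matters: DataFrame.append copies the entire frame on every call, so appending in a loop costs quadratic time (and append was deprecated in pandas 1.4 and removed in 2.0). If some per-row logic really cannot be vectorized, a common fallback (a sketch of the general pattern, not part of the answer above) is to collect the rows in a plain Python list and build the output frame once at the end:

import pandas as pd

rows = []
for group_name, df_group in df.groupby('Timestamp'):
    for i, (_, row) in enumerate(df_group.iterrows()):
        row = row.copy()  # avoid mutating the source frame
        row['Timestamp'] = int(row['Timestamp']) * 1000 + i * 30
        rows.append(row)

# one allocation at the end instead of one copy per appended row
df_out = pd.DataFrame(rows)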