How do you go through data frame values in chunks and combine them?

Time:03-14

I have this data frame:

Metric  ProcId       TimeStamp             Value
CPU     proce_123    Mar-11-2022 11:00:00  1.4453125
CPU     proce_126    Mar-11-2022 11:00:00  0.058320373
CPU     proce_123    Mar-11-2022 11:00:00  0.095274389
CPU     proce_000    Mar-11-2022 11:00:00  0.019654088
CPU     proce_144    Mar-11-2022 11:00:00  0.019841269
CPU     proce_1      Mar-11-2022 11:00:00  0.234741792
CPU     proce_100    Mar-11-2022 11:00:00  5.32945776
CPU     proce_57777  Mar-11-2022 11:00:00  0.25390625
CPU     proce_0000   Mar-11-2022 11:00:00  0.019349845
CPU     proce_123    Mar-11-2022 11:00:00  0.019500781
CPU     proce_123    Mar-11-2022 11:00:00  2.32421875
CPU     proce_123    Mar-11-2022 11:00:00  68.3903656
CPU     proce_123    Mar-11-2022 11:00:00  0.057781201
CPU     proce_123    Mar-11-2022 11:00:00  0.416666627

This is just a sample; the actual data frame has thousands of rows. I need to go through the "ProcId" column of this data frame in chunks, and on each iteration I need to build a string that combines the ProcIds in that chunk.

For example the string needs to be like this given the chunks size 3:

proce_123",%2proce_126",%2proce_123")

Please note that after the 3rd item in a chunk we need to append the two characters ") and after the first and second items we need to append ",%2 instead.

I can do something like this to print out the chunks:

n = 3 #size of chunks
chunks = [] #list of chunks

for i in range(0, len(id), n):
    chunks.append(id[i:i + n])

I am not sure how I would combine these 3 items into one string and add the other strings at the end. Can anybody help here?

CodePudding user response:

chunk_size = 3
list_of_proc_ids = []
# First, generate a list of the ProcIds
# (assumes `id` holds the ProcId values, as in the question; adjust to however you extract them)
for proc_id in id:
    list_of_proc_ids.append(proc_id)

final_str = ''
# Then enumerate through that list, adding a unique ending at every third item
for index, obj in enumerate(list_of_proc_ids):
    final_str += str(obj)
    if (index + 1) % chunk_size == 0:  # Checks if divisible by 3, accounting for 0 index
        final_str += '")'
    else:
        final_str += '",%2'
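
If you want one string per chunk instead of a single long string, here is a minimal sketch of the same idea using str.join. The function name join_in_chunks and the sample list are made up for illustration. It also assumes that a final, incomplete chunk should still be closed with ") (the loop above would leave it ending in ",%2), so adjust that if your format differs.

```python
def join_in_chunks(proc_ids, chunk_size=3, sep='",%2', end='")'):
    """Return one string per chunk: items joined by `sep`, then closed with `end`."""
    # Slice the list into consecutive chunks of at most `chunk_size` items
    chunks = [proc_ids[i:i + chunk_size] for i in range(0, len(proc_ids), chunk_size)]
    # Join the items of each chunk and append the closing suffix
    return [sep.join(chunk) + end for chunk in chunks]


# Hypothetical sample taken from the first rows of the question's data
sample_ids = ["proce_123", "proce_126", "proce_123", "proce_000", "proce_144"]
chunk_strings = join_in_chunks(sample_ids)

for chunk_string in chunk_strings:
    print(chunk_string)
# proce_123",%2proce_126",%2proce_123")
# proce_000",%2proce_144")
```

With a data frame you could pass df['ProcId'].tolist() as proc_ids, and ''.join(chunk_strings) gives a single string if that is what you need.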

CodePudding user response:

For efficiency, use a vectorized approach:

import numpy as np
N = 3

# choose the suffix for each row: '")' after every Nth ProcId, '",%2' otherwise
s = np.where(np.arange(len(df)) % N < N - 1, '",%2', '")')

# concatenate strings
out = (df['ProcId'] + '_' + s).str.cat()

Output: 'proce_123_",%2proce_126_",%2proce_123_")proce_000_",%2proce_144_",%2proce_1_")proce_100_",%2proce_57777_",%2proce_0000_")proce_123_",%2proce_123_",%2proce_123_")proce_123_",%2proce_123_",%2'
