I currently have a pandas dataframe with some columns. I'm looking to build a column, Sequential, that lists which iteration is recorded at that point in the cycle. I'm currently doing this using itertools.cycle and a fixed number of iterations, block_cycles, like so:
import itertools

# Fill out Sequential Numbers
block_cycles = 330
lens = len(raw_data.index)
# range() excludes its stop value, so use block_cycles + 1 to count 1..330
sequential = list(itertools.islice(itertools.cycle(range(1, block_cycles + 1)), lens))
interim_output['Sequential'] = sequential
With an output like this:
print(interim_output['Sequential'])
0 1
1 2
2 3
...
329 330
330 1
331 2
332 3
And this would be fine if every cycle contained the same number of iterations. However, upon investigation, I've found that this is not the case. I have another column, CycleNumber, that contains the cycle number each iteration belongs to. It looks like this:
print(raw_data['CycleNumber'])
0 1
1 1
2 1
3 1
4 1
51790 4936
51791 4936
51792 4936
51793 4936
51794 4936
So, for example, one cycle might contain 330 iterations, and another could contain 333, 331, and so forth - it's not guaranteed to be the same. The values in cycle number increase incrementally.
I've built a dictionary of the number of iterations each cycle contains, cycle_freq, which looks like this:
# Calculate the number of iterations each cycle contains
cycle_freq = {}
for item in cycle_number:
    if item in cycle_freq:
        cycle_freq[item] += 1  # seen before: increment the count
    else:
        cycle_freq[item] = 1   # first occurrence of this cycle
print(cycle_freq)
{1: 330, 2: 332, 3: 331, 4: 332, 5: 332, 6: 333, 7: 333, 8: 330....
4933: 331, 4934: 334, 4935: 287, 4936: 24}
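As an aside, the same tally can be sketched with collections.Counter from the standard library. The short cycle_number list below is a made-up stand-in for the real CycleNumber values, not the actual data:

```python
from collections import Counter

# Hypothetical stand-in for the CycleNumber column values.
cycle_number = [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Counter tallies how many times each cycle number occurs.
cycle_freq = Counter(cycle_number)
print(dict(cycle_freq))  # {1: 3, 2: 2, 3: 4}
```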
How could I go about using this dictionary to replace the constant variable block_cycles, creating a big column list of sequential numbers based on exactly how many iterations were in each cycle? So far, this is my logic to try to get it to use the values contained in the dictionary cycle_freq, but to no avail:
for i in cycle_freq:
    iteration = list(itertools.islice(itertools.cycle(range(1, cycle_freq[i])),lens))
    sequential.append(iteration)
My desired output would look like this:
0 1
1 2
2 3
...
329 330
330 1
331 2
...
661 332
662 1
663 2
Any help would be greatly appreciated!
CodePudding user response:
I used a workaround and gave up on itertools:
sequential = []
# Relies on cycle_freq keeping cycles in the order they appear in the data
for cycles in cycle_freq.values():
    # Count 1..cycles inclusive for this cycle
    sequential.extend(range(1, cycles + 1))
interim_output['Sequential'] = sequential
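If you'd rather keep itertools, here is a minimal sketch of the same idea using chain.from_iterable. The tiny cycle_freq below is made up for illustration, and the sketch assumes the rows of each cycle are contiguous and the cycles appear in ascending order of cycle number:

```python
from itertools import chain

# Hypothetical small cycle_freq; the real one maps cycle number -> row count.
cycle_freq = {1: 3, 2: 2, 3: 4}

# Build 1..n for each cycle (in ascending cycle order) and flatten the pieces.
sequential = list(chain.from_iterable(
    range(1, cycle_freq[cycle] + 1) for cycle in sorted(cycle_freq)
))
print(sequential)  # [1, 2, 3, 1, 2, 1, 2, 3, 4]
```

The resulting list has one entry per row, so it can be assigned to the column directly. If the rows of a cycle might not be contiguous, grouping in pandas itself, e.g. raw_data.groupby('CycleNumber').cumcount() + 1, is probably the safer route, since it numbers rows within each group regardless of their position.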