Extract data from table column and make variables in Python

I have a dataset where I want to make a new variable every time the 'Recording' number changes. Each new variable should include the 'Duration' data for that 'Recording' and for all previous recordings. So for the table below it would be:

Var1 = (3, 3, 3)

Var2 = (3, 3, 3, 4, 6)

Var3 = (3, 3, 3, 4, 6, 4, 3, 1, 4)

And so on. I have several datasets that can have different numbers of recordings (but always starting from 1) and different numbers of durations for each recording. Any help is greatly appreciated.

Recording	Duration
1	3
1	3
1	3
2	4
2	6
3	4
3	3
3	1
3	4

CodePudding user response：

You can aggregate each group into a list, take the cumulative sum of those lists (adding lists concatenates them), then convert to tuples and a dictionary:

d = df.groupby('Recording')['Duration'].agg(list).cumsum().apply(tuple).to_dict()
print(d)
# {1: (3, 3, 3), 2: (3, 3, 3, 4, 6), 3: (3, 3, 3, 4, 6, 4, 3, 1, 4)}

print (d[1])
print (d[2])
print (d[3])

The output you asked for (separate variables Var1, Var2, ...) is possible, but not recommended:

s = df.groupby('Recording')['Duration'].agg(list).cumsum().apply(tuple)

for k, v in s.items():
    globals()[f'Var{k}'] = v
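
If the goal is only to look results up by a name such as Var1, one option is to keep those names as dictionary keys instead of writing into globals(). The sketch below is an illustration under assumptions, not part of the original answer: it uses itertools.accumulate from the standard library to build the running concatenation, and the "Var{k}" key format and the names per_recording and variables are chosen here for the example.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "Recording": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Duration": [3, 3, 3, 4, 6, 4, 3, 1, 4],
})  # the example data from the question

# One list of durations per recording, in ascending Recording order.
per_recording = df.groupby("Recording")["Duration"].agg(list)

# accumulate() adds each list to the running total; list + list concatenates.
running = accumulate(per_recording)

# Keys such as "Var1" are an illustrative naming choice.
variables = {f"Var{k}": tuple(v) for k, v in zip(per_recording.index, running)}

print(variables["Var2"])
# (3, 3, 3, 4, 6)
```

Keeping the results in one dictionary means the code still works when a dataset has a different number of recordings, since nothing depends on a fixed set of variable names.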

CodePudding user response：

@jezrael's answer is beautiful and definitely better :). But if you really want to do this as a loop (perhaps because you might want to modify the logic later), you could do something like this:

import pandas as pd

df = pd.DataFrame({
    "Recording": [1,1,1,2,2,3,3,3,3],
    "Duration": [3,3,3,4,6,4,3,1,4]
})  # your example data

records = {}
record = []
last_recording = None  # flag to track change in recording

for r, d in zip(df.Recording, df.Duration):
    if record and r != last_recording:
        # recording changed: store everything seen so far under the previous recording
        records[last_recording] = tuple(record)
    record.append(d)
    last_recording = r

records[last_recording] = tuple(record)  # capture final group
        
print(records)

I modified this to produce a dict, which seems sensible. Because it loops over every row in Python, it will be slow for large datasets.
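
On large datasets, part of the cost is that every tuple repeats all the earlier durations. A possible alternative, sketched here under the assumption that rows are already ordered by Recording (as in the example), is to store only where each recording ends and slice the durations on demand. The helper name durations_up_to is hypothetical and not from the thread.

```python
import pandas as pd

df = pd.DataFrame({
    "Recording": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Duration": [3, 3, 3, 4, 6, 4, 3, 1, 4],
})  # the example data from the question

# Rows per recording, accumulated into end offsets: {1: 3, 2: 5, 3: 9}.
# Assumes rows are already ordered by Recording, as in the example.
ends = df.groupby("Recording").size().cumsum().to_dict()
durations = df["Duration"].tolist()


def durations_up_to(recording):
    """Hypothetical helper: all durations up to and including `recording`."""
    return tuple(durations[:ends[recording]])


print(durations_up_to(2))
# (3, 3, 3, 4, 6)
```

This keeps one copy of the durations plus one integer per recording, and builds a tuple only when a particular recording is requested.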