I have a data frame like this:
2pair counts
'A','B','C','D' 5
'A','B','K','D' 3
'A','B','P','R' 2
'O','Y','C','D' 1
'O','Y','CL','lD' 4
I want to make a nested list, grouped by the first 2 elements of each `2pair` string: the first item of each group is the leading 2 letters, and the remaining items pair the other 2 letters with the counts column. For example, for the above data the result should be:
[
[
['A','B'],
[['C','D'],5],
[['K','D'],3],
[['P','R'],2]
],
[
['O','Y'],
[['C','D'],1],
[['CL','lD'],4]
]
]
The following code does exactly what I want, but it is too slow. How can I make it faster?
pairs = []
trans = []
for i in range(df3.shape[0]):
    if df3['2pair'].values[i].split(',')[:2] not in trans:
        trans.append(df3['2pair'].values[i].split(',')[:2])
        sub = []
        sub.append(df3['2pair'].values[i].split(',')[:2])
        for j in range(df3.shape[0]):
            if df3['2pair'].values[i].split(',')[:2] == df3['2pair'].values[j].split(',')[:2]:
                sub.append([df3['2pair'].values[j].split(',')[2:], df3['counts'].values[j]])
        pairs.append(sub)
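The nested loop above scans the whole frame once per distinct prefix, which is quadratic. A single pass that groups rows in a dict keyed by the first two letters builds the same structure in linear time. A minimal sketch, assuming `2pair` holds plain comma-separated strings like `'A,B,C,D'`:

```python
import pandas as pd

# Sample frame matching the data in the question.
df3 = pd.DataFrame({
    '2pair': ['A,B,C,D', 'A,B,K,D', 'A,B,P,R', 'O,Y,C,D', 'O,Y,CL,lD'],
    'counts': [5, 3, 2, 1, 4],
})

# Maps ('A', 'B') -> [['A', 'B'], [['C', 'D'], 5], ...]
groups = {}
for pair, count in zip(df3['2pair'], df3['counts']):
    parts = pair.split(',')
    key = tuple(parts[:2])          # hashable group key
    if key not in groups:
        groups[key] = [parts[:2]]   # group starts with the prefix itself
    groups[key].append([parts[2:], count])

pairs = list(groups.values())
```

This visits each row exactly once, and the dict lookup replaces both the `not in trans` membership test and the inner `for j` scan.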
CodePudding user response:
Here's one way: use `str.split` to split the strings in the `2pair` column, then use `groupby.apply` and `to_dict` to create the lists:
df[['head', 'tail']] = [[(*x[:2],), x[2:]] for x in df['2pair'].str.split(',')]
out = [[[*k]] + v for k,v in (df.groupby('head')[['tail','counts']]
.apply(lambda x: x.to_numpy().tolist()).to_dict()
.items())]
Output:
[[['A', 'B'], [['C', 'D'], 5], [['K', 'D'], 3], [['P', 'R'], 2]],
[['O', 'Y'], [['C', 'D'], 1], [['CL', 'lD'], 4]]]
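For completeness, here is the groupby approach as a self-contained snippet, assuming `2pair` holds plain comma-separated strings (note the `+` concatenating the group key with its rows, which is easy to drop when copying):

```python
import pandas as pd

df = pd.DataFrame({
    '2pair': ['A,B,C,D', 'A,B,K,D', 'A,B,P,R', 'O,Y,C,D', 'O,Y,CL,lD'],
    'counts': [5, 3, 2, 1, 4],
})

# Split each string into a hashable group key (first two letters, as a
# tuple) and the remaining letters (as a list).
df[['head', 'tail']] = [[(*x[:2],), x[2:]] for x in df['2pair'].str.split(',')]

# Per group, collect the [tail, counts] rows, then prepend the key
# (converted back to a list) to each group's rows.
out = [[[*k]] + v
       for k, v in df.groupby('head')[['tail', 'counts']]
                     .apply(lambda x: x.to_numpy().tolist())
                     .to_dict()
                     .items()]
```

The tuple key matters: lists are unhashable, so `groupby` needs the prefix stored as a tuple, and `[*k]` turns it back into the list the desired output uses.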