I have a data frame like this:
2pair counts
'A','B','C','D' 5
'A','B','K','D' 3
'A','B','P','R' 2
'O','Y','C','D' 1
'O','Y','CL','lD' 4
I want to make a nested list, grouped by the first 2 elements of each `2pair` string: the first item of each group is the leading 2 letters, and the remaining items pair the other 2 letters with the counts column. For example, for the above data the result should be:
[
[
['A','B'],
[['C','D'],5],
[['K','D'],3],
[['P','R'],2]
],
[
['O','Y'],
[['C','D'],1],
[['CL','lD'],4]
]
]
The following code does exactly what I want, but it is too slow. How can I make it faster?
pairs = []
trans = []
for i in range(df3.shape[0]):
    if df3['2pair'].values[i].split(',')[:2] not in trans:
        trans.append(df3['2pair'].values[i].split(',')[:2])
        sub = []
        sub.append(df3['2pair'].values[i].split(',')[:2])
        for j in range(df3.shape[0]):
            if df3['2pair'].values[i].split(',')[:2] == df3['2pair'].values[j].split(',')[:2]:
                sub.append([df3['2pair'].values[j].split(',')[2:], df3['counts'].values[j]])
        pairs.append(sub)
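The nested loop above scans the whole frame once per distinct prefix, which is quadratic. A single pass that groups rows in a dict keyed by the first two letters builds the same structure in linear time. A minimal sketch, assuming `2pair` holds plain comma-separated strings like `'A,B,C,D'`:

```python
import pandas as pd

# Sample frame matching the data in the question.
df3 = pd.DataFrame({
    '2pair': ['A,B,C,D', 'A,B,K,D', 'A,B,P,R', 'O,Y,C,D', 'O,Y,CL,lD'],
    'counts': [5, 3, 2, 1, 4],
})

# Maps ('A', 'B') -> [['A', 'B'], [['C', 'D'], 5], ...]
groups = {}
for pair, count in zip(df3['2pair'], df3['counts']):
    parts = pair.split(',')
    key = tuple(parts[:2])          # hashable group key
    if key not in groups:
        groups[key] = [parts[:2]]   # group starts with the prefix itself
    groups[key].append([parts[2:], count])

pairs = list(groups.values())
```

This visits each row exactly once, and the dict lookup replaces both the `not in trans` membership test and the inner `for j` scan.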
CodePudding user response:
Here's one way: use `str.split` to split the strings in the `2pair` column, then use `groupby.apply` and `to_dict` to create the lists:
df[['head', 'tail']] = [[(*x[:2],), x[2:]] for x in df['2pair'].str.split(',')]
out = [[[*k]] + v for k,v in (df.groupby('head')[['tail','counts']]
.apply(lambda x: x.to_numpy().tolist()).to_dict()
.items())]
Output:
[[['A', 'B'], [['C', 'D'], 5], [['K', 'D'], 3], [['P', 'R'], 2]],
[['O', 'Y'], [['C', 'D'], 1], [['CL', 'lD'], 4]]]
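For completeness, here is the groupby approach as a self-contained snippet, assuming `2pair` holds plain comma-separated strings (note the `+` concatenating the group key with its rows, which is easy to drop when copying):

```python
import pandas as pd

df = pd.DataFrame({
    '2pair': ['A,B,C,D', 'A,B,K,D', 'A,B,P,R', 'O,Y,C,D', 'O,Y,CL,lD'],
    'counts': [5, 3, 2, 1, 4],
})

# Split each string into a hashable group key (first two letters, as a
# tuple) and the remaining letters (as a list).
df[['head', 'tail']] = [[(*x[:2],), x[2:]] for x in df['2pair'].str.split(',')]

# Per group, collect the [tail, counts] rows, then prepend the key
# (converted back to a list) to each group's rows.
out = [[[*k]] + v
       for k, v in df.groupby('head')[['tail', 'counts']]
                     .apply(lambda x: x.to_numpy().tolist())
                     .to_dict()
                     .items()]
```

The tuple key matters: lists are unhashable, so `groupby` needs the prefix stored as a tuple, and `[*k]` turns it back into the list the desired output uses.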