String Compression in Python-CodePudding

I have the following input:

my_list = ["x d1","y d1","z d2","t d2"]

And I would like to transform it into:

expected_result = ["d1(x,y)","d2(z,t)"]

I resorted to a brute-force approach with pandas because I couldn't find a way to do it in plain Python. Is there another way to solve this?

import pandas as pd 

my_list = ["x d1","y d1","z d2","t d2"]

df = pd.DataFrame(my_list,columns=["col1"])

df2 = df["col1"].str.split(" ",expand = True)
df2.columns = ["col1","col2"]
grp = df2.groupby("col2")  # a single column name (not a list) keeps each group key a plain string

result = []
for grp_name, data in grp:
  res = grp_name + "(" + ",".join(data["col1"]) + ")"
  result.append(res)
print(result)

CodePudding user response：

If your values are already grouped by key (all d1 items together, then all d2 items), you can use itertools.groupby, which only merges consecutive items with equal keys:

from itertools import groupby

out = [f"{k}({','.join(x[0] for x in g)})"
       for k, g in groupby(map(str.split, my_list), lambda x: x[1])]

Output:

['d1(x,y)', 'd2(z,t)']

Otherwise you should use a dictionary as shown by @Jamiu.
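
If you would rather keep the groupby approach for input that is not grouped, one option is to sort by key first. This is only a sketch: unsorted_list is a made-up example input, and the keys are compared as plain strings, so "d10" would sort before "d2".

```python
from itertools import groupby

# Made-up example where items sharing a key are not adjacent.
unsorted_list = ["x d1", "z d2", "y d1", "t d2"]

# Split each item into [value, key] and sort by key. sorted() is stable,
# so values keep their original relative order within each key.
pairs = sorted((item.split() for item in unsorted_list), key=lambda p: p[1])

out_sorted = [f"{k}({','.join(p[0] for p in g)})"
              for k, g in groupby(pairs, key=lambda p: p[1])]
print(out_sorted)  # ['d1(x,y)', 'd2(z,t)']
```

Sorting costs O(n log n), whereas the dictionary approach stays linear and also keeps the keys in order of first appearance.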

A variant of your pandas solution:

out = (df['col1'].str.split(n=1, expand=True)
       .groupby(1)[0]
       .apply(lambda g: f"{g.name}({','.join(g)})")
       .tolist()
      )

CodePudding user response：

Here is one approach

result = {}

for item in my_list:
    key, value = item.split()
    result.setdefault(value, []).append(key)
    
output = [f"{k}({','.join(v)})" for k, v in result.items()]
print(output)

['d1(x,y)', 'd2(z,t)']

CodePudding user response：

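my_list = ["x d1","y d1","z d2","t d2"]
res = []

for item in my_list:
    a, b, *_ = item.split()

    # Extend the last group only if it has exactly this key
    # (a substring test would confuse "d1" with "d10")
    if res and res[-1].startswith(f'{b}('):
        res[-1] = f'{res[-1][:-1]},{a})'
    else:
        res.append(f'{b}({a})')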

print(res)
['d1(x,y)', 'd2(z,t)']

This works for any number of elements per dN key, as long as all items with the same key are adjacent in the list (for example, every d1 item comes before the first d2 item). The keys and values can be any strings without whitespace, provided each item has the form "value key".

If you need something that works even when items with the same key are not adjacent, just say the word, but it will cost a little more.
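
For reference, a minimal sketch of what that order-independent version could look like. The function name group_values is made up for illustration, and it assumes every item is exactly "value key" separated by whitespace.

```python
from collections import defaultdict

def group_values(items):
    """Group "value key" strings by key, in order of first appearance."""
    # key -> list of values; dicts preserve insertion order (Python 3.7+)
    groups = defaultdict(list)
    for item in items:
        value, key = item.split()  # raises ValueError if the item is not two tokens
        groups[key].append(value)
    return [f"{key}({','.join(values)})" for key, values in groups.items()]

print(group_values(["x d1", "z d2", "y d1", "t d2"]))  # ['d1(x,y)', 'd2(z,t)']
```

The extra cost is the dictionary holding every group in memory until the end, instead of only looking at the last entry of the result list.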

CodePudding user response：

import itertools as it

my_list = [e.split(' ') for e in ["x d1","y d1","z d2","t d2"]]

r=[]
for key, group in it.groupby(my_list, lambda x: x[1]):
    l = [e[0] for e in group]
    # join all values so groups of any size work, not just pairs
    r.append("{0}({1})".format(key, ",".join(l)))

print(r)

Output :

['d1(x,y)', 'd2(z,t)']

CodePudding user response：

Another possible solution, which is based on pandas:

import numpy as np
import pandas as pd

(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
 .groupby('a')['b'].apply(lambda x: f"({','.join(x)})")
 .reset_index().sum(axis=1).tolist())

Output:

['d1(x,y)', 'd2(z,t)']