Python Lambda Join Function to Return Semicolon-Separated List of Unique Values Gives AttributeError

I have a DF that looks like this.

My desired output is a DF that looks like this:

Here's the code I've tried:

test = df.groupby('Group', as_index=True).agg(lambda x: '; '.join(el for el in x if el !=''))

But this gives me the below.

When I use (per this answer):

test = out.groupby('DocSetID', as_index=True).agg(lambda x: '; '.join(el for el in x if el !='').set(x))

I get an error reading:

AttributeError: 'str' object has no attribute 'set'

What am I doing wrong? How can the code be fixed to only include one of each value?

CodePudding user response：

You are nearly there. You have to apply set to the values before joining them, not to the joined string:

test = out.groupby('DocSetID', as_index=True).agg(lambda x: '; '.join(set(el for el in x if el !='')))
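To see why the original attempt fails, here is a minimal sketch without pandas. The list `values` is made up for illustration. `'; '.join(...)` returns a plain `str`, and `str` has no `.set` method, whereas the built-in `set()` can be applied to the values before they are joined.

```python
# Hypothetical values standing in for one column of one group
values = ["A", "", "A", "B"]

# join() returns a plain string
joined = "; ".join(el for el in values if el != "")
print(type(joined).__name__)

# Calling .set on that string reproduces the error from the question
try:
    joined.set(values)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Applying the built-in set() to the values before joining works;
# sorted() is added here only to make the order predictable
unique_joined = "; ".join(sorted(set(el for el in values if el != "")))
print(unique_joined)
```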

CodePudding user response：

You are using set incorrectly. You can also take advantage of set difference to drop the empty string:

test = out.groupby('DocSetID', as_index=True).agg(lambda x: '; '.join(set(x).difference([''])))
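One caveat: a set has no defined order, so the joined values may come out in any order. If first-seen order matters, a possible alternative is `dict.fromkeys`, which removes duplicates while keeping insertion order (Python 3.7+). This is only a sketch: the sample data and the helper name `join_unique` are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical sample data; the column name DocSetID follows the question
out = pd.DataFrame({
    "DocSetID": [1, 1, 1, 2],
    "Val1": ["B", "A", "B", ""],
    "Val2": ["X", "", "X", "Y"],
})

def join_unique(values):
    # dict.fromkeys keeps the first occurrence of each value, in order;
    # empty strings are filtered out before deduplication
    kept = dict.fromkeys(v for v in values if v != "")
    return "; ".join(kept)

test = out.groupby("DocSetID", as_index=True).agg(join_unique)
print(test)
```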

CodePudding user response：

Though I'm sure there is a more elegant way to solve this problem, here is a slight change to your code based on the other answer you supplied. You called .set() as a method on the joined string, but a str has no such method. Instead, apply set() as a function to the values inside the .join():

import pandas as pd

df = pd.DataFrame({"Group": [1, 1], "Val1": ["A", "A"], "Val2": ["B", ""], "Val3": ["C", "D"]})
# set() removes duplicates within each group; sorted() makes the output order deterministic
df.groupby("Group").agg(lambda x: '; '.join(sorted(set(el for el in x if el != ''))))

Note that this handles only empty strings, not nulls. To skip nulls as well, you'd have to add that condition to the if clause.
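A minimal sketch of that extra condition, assuming nulls show up as `None` or `NaN` in the value columns. The sample data and the helper name `join_unique` are made up for illustration; `pd.notna` is the pandas check for non-null scalars.

```python
import numpy as np
import pandas as pd

# Hypothetical data with both empty strings and nulls
df = pd.DataFrame({
    "Group": [1, 1, 1],
    "Val1": ["A", "A", None],
    "Val2": ["B", "", np.nan],
    "Val3": ["C", "D", "C"],
})

def join_unique(values):
    # Skip nulls (None/NaN) as well as empty strings, then dedupe and sort
    kept = {v for v in values if pd.notna(v) and v != ""}
    return "; ".join(sorted(kept))

result = df.groupby("Group").agg(join_unique)
print(result)
```

The null check comes first so that the string comparison is only evaluated for non-null values.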