I have a DF that looks like this.
My desired output is a DF that looks like this:
Here's the code I've tried:
test = df.groupby('Group', as_index=True).agg(lambda x: '; '.join(el for el in x if el !=''))
But this gives me the below.
When I use (per this answer):
test = out.groupby('DocSetID', as_index=True).agg(lambda x: '; '.join(el for el in x if el !='').set(x))
I get an error reading:
AttributeError: 'str' object has no attribute 'set'
What am I doing wrong? How can the code be fixed to only include one of each value?
CodePudding user response:
You are nearly there. You have to apply set() to the values before joining them:
test = out.groupby('DocSetID', as_index=True).agg(lambda x: '; '.join(set(el for el in x if el !='')))
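One thing to keep in mind: a set has no guaranteed order, so the joined values can come out in any order. If first-seen order matters, here is a minimal sketch that deduplicates with dict.fromkeys instead. The sample frame and the join_unique helper are made up for illustration; only the DocSetID column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the DocSetID name mirrors the question.
out = pd.DataFrame({
    "DocSetID": [1, 1, 1, 2],
    "Val1": ["B", "A", "B", ""],
    "Val2": ["", "C", "C", "D"],
})

def join_unique(values):
    # dict.fromkeys keeps the first occurrence of each value, in order,
    # so duplicates are dropped without shuffling the result.
    return "; ".join(dict.fromkeys(el for el in values if el != ""))

test = out.groupby("DocSetID", as_index=True).agg(join_unique)
print(test)
```

With this data, group 1 keeps "B" ahead of "A" in Val1 because "B" appears first.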
CodePudding user response:
You are calling set incorrectly: it has to wrap the values, not the joined string. You can also use the set difference to drop the empty string:
test = out.groupby('DocSetID', as_index=True).agg(lambda x: '; '.join(set(x).difference([''])))
CodePudding user response:
Though I'm sure there is a more elegant way to solve this problem, here is a slight change to your code based on the other answer you linked. You called .set() as a method on the joined string, which is why you get the AttributeError: a str has no such method. Instead, apply set() as a function to the values inside the .join() call:
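import pandas as pd

# Two rows in the same group: Val1 repeats "A" and Val2 contains an empty string.
df = pd.DataFrame({"Group": [1, 1], "Val1": ["A", "A"], "Val2": ["B", ""], "Val3": ["C", "D"]})
# Per column: drop empty strings, deduplicate with set(), sort for a stable order, then join.
df.groupby("Group").agg(lambda x: '; '.join(sorted(set(el for el in x if el != ''))))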
Note that this only filters out empty strings, not nulls. To handle nulls, you'd have to add that condition to the if clause.
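For the null case, one possible sketch is below. It assumes missing values show up as NaN or None, and it uses pd.notna to skip them. The extra row and the join_unique helper are illustrative and not part of the original data.

```python
import numpy as np
import pandas as pd

# Same shape as the example above, plus a hypothetical third row with nulls.
df = pd.DataFrame({
    "Group": [1, 1, 1],
    "Val1": ["A", "A", np.nan],
    "Val2": ["B", "", None],
    "Val3": ["C", "D", "C"],
})

def join_unique(values):
    # pd.notna skips NaN/None; the second check skips empty strings.
    kept = {el for el in values if pd.notna(el) and el != ""}
    # sorted() gives a stable order, since sets are unordered.
    return "; ".join(sorted(kept))

result = df.groupby("Group").agg(join_unique)
print(result)
```

A named function is used here instead of a lambda only to keep the two filter conditions readable.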