here is the sample data for my question
pid | date | task | language |
---|---|---|---|
8522 | 17-09-2022 | listen | english |
7851 | 17-09-2022 | read | hindi |
7546 | 17-09-2022 | telugu | |
3584 | write | ||
8522 | 18-09-2022 | read | |
7851 | 19-09-2022 | hindi | |
8522 | speak | hindi | |
7546 | 19-09-2022 | speak | english |
3584 | write | tamil | |
8522 | 25-09-2022 |
i tried to code and got the output
data=df.groupby('pid')[['date','task','language']].transform(lambda x: '|'.join(x))
and i got this output
and i want the output to be with no empty/null values to be added so my desired final output would look like this but i have not idea how to filter or remove the values and get the output
any answer or suggestion is helpful
CodePudding user response:
You can use pandas.Series.dropna
in the lambda expression before joining.
Here is what works for me:
out = (
df
.groupby('pid')[['date','task','language']]
.agg(lambda x: '|'.join(x
.replace(r'^\s*$', np.nan, regex=True)
.dropna()
.astype(str)
)
)
)
print(out)
date task language
pid
3584 write|write tamil
7546 17-09-2022|19-09-2022 speak telugu|english
7851 17-09-2022|19-09-2022 read hindi|hindi
8522 17-09-2022|18-09-2022|25-09-2022 listen|read|speak english|hindi