Home > Back-end >  pandas groupby join data with"|" but exclude null,nan,emptystring,whitespace
pandas groupby join data with"|" but exclude null,nan,emptystring,whitespace

Time:10-28

here is the sample data for my question

pid date task language
8522 17-09-2022 listen english
7851 17-09-2022 read hindi
7546 17-09-2022 telugu
3584 write
8522 18-09-2022 read
7851 19-09-2022 hindi
8522 speak hindi
7546 19-09-2022 speak english
3584 write tamil
8522 25-09-2022

i tried to code and got the output

data=df.groupby('pid')[['date','task','language']].transform(lambda x: '|'.join(x))

and i got this output

image1:-output given by the code

and i want the output to be with no empty/null values to be added so my desired final output would look like this but i have not idea how to filter or remove the values and get the output

image2:-my final output

any answer or suggestion is helpful

CodePudding user response:

You can use pandas.Series.dropna in the lambda expression before joining.

Here is what works for me:

out = (
    df
    .groupby('pid')[['date','task','language']]
    .agg(lambda x: '|'.join(x
                            .replace(r'^\s*$', np.nan, regex=True)
                            .dropna()
                            .astype(str)
                           )
        )
)
print(out)
                                  date               task        language
pid                                                                      
3584                                          write|write           tamil
7546             17-09-2022|19-09-2022              speak  telugu|english
7851             17-09-2022|19-09-2022               read     hindi|hindi
8522  17-09-2022|18-09-2022|25-09-2022  listen|read|speak   english|hindi
  • Related