In Pandas, how to group by column name and condition met, while joining the cells that met the condition

I am having a hard time knowing how to even formulate this question, but this is what I am trying to accomplish:

I have a pandas DataFrame with thousands of rows that look like this:

import pandas as pd

df = pd.read_excel("data.xlsx")

id	text	value1	value2
1	These are the	True	False
2	Values of "value1"	True	False
3	While these others	False	True
4	are the Values of "value2"	False	True

For each value column, how can I collect the rows where that column is True and join their text into a single cell, to get a table that looks like this?

values	merge_text
value1	These are the Values of "value1"
value2	While these others are the Values of "value2"

My idea was to first split the table into one table per value column, each containing only the rows where that column is True, and then merge all the tables together.

v1 = df[['id', 'text', 'value1']]
v1 = v1[v1["value1"]==True]

id	text	value1
1	These are the	True
2	Values of "value1"	True

v2 = df[['id', 'text', 'value2']]
v2 = v2[v2["value2"]==True]

id	text	value2
3	While these others	True
4	are the Values of "value2"	True

What I do not know, and have failed to find the answer online, is how to merge the cells like this:

values	merge_text
value1	These are the Values of "value1"

Answer:

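You could set_index with "id" and "text", then stack df. This gives a boolean Series indexed by id, text and the original column name. Then (i) filter the Series by itself to keep only the True entries; (ii) groupby the column-name level and join "text":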

s = df.set_index(['id','text']).stack()  # boolean Series indexed by (id, text, column name)
out = s[s].reset_index(level=1).groupby(level=1)['text'].apply(' '.join).reset_index()  # keep True entries, join text per column name

Output:

    index                                           text
0  value1               These are the Values of "value1"
1  value2  While these others are the Values of "value2"
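
For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the same stack-and-filter idea. The sample rows are copied from the question and built inline, which is an assumption standing in for data.xlsx. The names "values" and "merge_text" are taken from the desired table in the question; naming the stacked level with rename_axis is my own addition so the final columns come out with those labels.

```python
import pandas as pd

# Sample rows from the question, built inline (assumption: stands in for data.xlsx)
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "These are the",
        'Values of "value1"',
        "While these others",
        'are the Values of "value2"',
    ],
    "value1": [True, True, False, False],
    "value2": [False, False, True, True],
})

# One boolean entry per (id, text, original column name)
s = df.set_index(["id", "text"]).stack()
# Give the stacked level a name so it can be referenced below
s = s.rename_axis(["id", "text", "values"])

out = (
    s[s]                                # keep only the True entries
    .reset_index(level="text")          # move "text" back into a column
    .groupby(level="values")["text"]    # one group per original column name
    .agg(" ".join)                      # join the texts in their original row order
    .rename("merge_text")
    .reset_index()
)
print(out)
```

The result has one row per value column, with the column name in "values" and the joined text in "merge_text".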