In Pandas, how to group by column name and condition met, while joining the cells that met the condi

Time:02-15

I am having a hard time knowing how to even formulate this question, but this is what I am trying to accomplish:

I have a pandas DataFrame with thousands of rows that looks like this:

df = pd.read_excel("data.xlsx")
id  text                        value1  value2
1   These are the               True    False
2   Values of "value1"          True    False
3   While these others          False   True
4   are the Values of "value2"  False   True

How can I group by column name, joining the text of every row where that column is True into a single cell, to get a table that looks like this?

values  merge_text
value1  These are the Values of "value1"
value2  While these others are the Values of "value2"

My idea was to first split the table into one table per column, each containing only the rows where that column is True, and then merge those tables together:

v1 = df[['id', 'text', 'value1']]
v1 = v1[v1["value1"]==True]

id  text                value1
1   These are the       True
2   Values of "value1"  True

v2 = df[['id', 'text', 'value2']]
v2 = v2[v2["value2"]==True]

id  text                        value2
3   While these others          True
4   are the Values of "value2"  True

What I do not know, and have failed to find online, is how to merge the cells like this:

values merge_text
value1 These are the Values of "value1"
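
For reference, the split-then-merge idea described above can be completed roughly as follows. This is only a sketch: the DataFrame is rebuilt by hand from the four sample rows instead of being read from data.xlsx, and the names flag_cols, rows and out are made up for illustration.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand (assumption: the real
# data comes from data.xlsx and has the same four columns).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "These are the",
        'Values of "value1"',
        "While these others",
        'are the Values of "value2"',
    ],
    "value1": [True, True, False, False],
    "value2": [False, False, True, True],
})

# Hypothetical list of the boolean columns to summarise.
flag_cols = ["value1", "value2"]

# For each flag column, keep the rows where it is True and join their text.
rows = [
    {"values": col, "merge_text": " ".join(df.loc[df[col], "text"])}
    for col in flag_cols
]
out = pd.DataFrame(rows)
print(out)
```

The boolean column is used directly as the row mask, so the ==True comparison is not needed.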

CodePudding user response:

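You could call set_index with "id" and "text", then stack df so that the former column names (value1, value2) become the innermost index level. Then (i) filter the resulting boolean Series by itself to keep only the True entries; (ii) group by that column-name level and join "text":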

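# One boolean per (id, text, column name) after stacking
s = df.set_index(['id','text']).stack()
# Keep the True entries, then join "text" within each column name
out = s[s].reset_index(level=1).groupby(level=1)['text'].apply(' '.join).reset_index()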

Output:

    index                                           text
0  value1               These are the Values of "value1"
1  value2  While these others are the Values of "value2"
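
If the column names from the question (values, merge_text) are wanted, one possible variation swaps stack for melt. It is an alternative sketch, not the method above: the sample DataFrame is rebuilt by hand, and the helper column name flag is an arbitrary choice.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand for a self-contained run.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "These are the",
        'Values of "value1"',
        "While these others",
        'are the Values of "value2"',
    ],
    "value1": [True, True, False, False],
    "value2": [False, False, True, True],
})

# Long format: one row per (id, text, column name) with its boolean flag.
melted = df.melt(id_vars=["id", "text"], var_name="values", value_name="flag")

# Keep only rows whose flag is True, then join the text per column name.
out = (
    melted[melted["flag"]]
    .groupby("values")["text"]
    .agg(" ".join)
    .reset_index(name="merge_text")
)
print(out)
```

melt keeps the original row order within each column, so the joined text follows the order of the rows in df.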