I am having a hard time even formulating this question, but this is what I am trying to accomplish:
I have a pandas DataFrame with thousands of rows that looks like this:
import pandas as pd
df = pd.read_excel("data.xlsx")
id | text | value1 | value2 |
---|---|---|---|
1 | These are the | True | False |
2 | Values of "value1" | True | False |
3 | While these others | False | True |
4 | are the Values of "value2" | False | True |
How can I, for each value column, collect the rows where that column is True and join their text into a single cell, to get a table that looks like this?
values | merge_text |
---|---|
value1 | These are the Values of "value1" |
value2 | While these others are the Values of "value2" |
I was thinking that to solve this, I first need to split the table into multiple tables, each containing the rows that meet the condition for a single column, and then merge all the tables together.
v1 = df[['id', 'text', 'value1']]
v1 = v1[v1["value1"]==True]
id | text | value1 |
---|---|---|
1 | These are the | True |
2 | Values of "value1" | True |
v2 = df[['id', 'text', 'value2']]
v2 = v2[v2["value2"]==True]
id | text | value2 |
---|---|---|
3 | While these others | True |
4 | are the Values of "value2" | True |
What I do not know, and have failed to find online, is how to merge the cells like this:
values | merge_text |
---|---|
value1 | These are the Values of "value1" |
CodePudding user response:
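You could set_index with "id" and "text", then stack df. Then (i) filter the resulting boolean Series by itself, which keeps only the True entries; (ii) groupby the index level that holds the original column names ("value1", "value2") and join "text":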
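# Move "id" and "text" into the index, then stack the flag columns into one boolean Series
s = df.set_index(['id','text']).stack()
# Keep the True entries, group by the former column name (index level 1), and join the text
out = s[s].reset_index(level=1).groupby(level=1)['text'].apply(' '.join).reset_index()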
Output:
index text
0 value1 These are the Values of "value1"
1 value2 While these others are the Values of "value2"
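If the stacked index is hard to follow, the same idea can be sketched with melt instead. This is only an alternative under a few assumptions: the sample frame below is rebuilt by hand from the table in the question (it stands in for data.xlsx), the flag columns are assumed to hold real booleans with no missing values, and the names long_df, flag and merged are made up for this example. The output column names are taken from the table the question asks for.

```python
import pandas as pd

# Sample data rebuilt from the question's table (stands in for data.xlsx).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": ["These are the", 'Values of "value1"',
             "While these others", 'are the Values of "value2"'],
    "value1": [True, True, False, False],
    "value2": [False, False, True, True],
})

# Long format: one row per (id, text, flag column) combination.
# "flag" is a hypothetical name for the melted boolean column.
long_df = df.melt(id_vars=["id", "text"], var_name="values", value_name="flag")

# Keep only rows whose flag is True, then join the text per flag column.
# sort=False keeps the groups in the order the columns appear in df.
merged = (
    long_df[long_df["flag"]]
    .groupby("values", sort=False)["text"]
    .agg(" ".join)
    .reset_index(name="merge_text")
)
print(merged)
```

If read_excel returns the flags as something other than booleans (for example with empty cells), the filter step would probably need an explicit conversion first; that case is not covered here.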