I have a question that I don't know how to approach and solve. I have a dataframe with two columns, Text and Binary. I want to generate a third one that will concatenate the strings from the Text cells in a range of 3 "True" values above and below the current row by including the false values in that range.
This is what I would like to have at the end:
Text | Binary | Generated |
---|---|---|
ABC | True | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ |
EFG | True | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE |
KLM | False | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE |
OPQ | False | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE |
FGA | True | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK |
KLA | False | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK |
UTF | True | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
GBD | False | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
XXX | False | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
KLJ | False | ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
OPR | True | EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
ABB | False | EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
DAE | False | EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
QWE | True | FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
RTE | False | FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
LLK | False | FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
LMN | True | UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN |
Any help is appreciated. Thank You
CodePudding user response:
First create an auxiliary Series:
cs = df.Binary.cumsum()
Then, to generte your new column, run:
df['Generated'] = cs.apply(lambda n: df.Text[cs.between(n-3, n 3)].str.cat(sep=' '))
By default Pandas imposes some limit on the width of each column, so "ordinary" printing will not show full content.
But if you run:
with pd.option_context("display.max_colwidth", 70):
print(df)
you will get something like:
Text Binary Generated
0 ABC True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ
1 EFG True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE
2 KLM False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE
3 OPQ False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE
4 FGA True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK
5 KLA False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK
6 UTF True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
7 GBD False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
8 XXX False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
9 KLJ False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
10 OPR True EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
11 ABB False EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
12 DAE False EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
13 QWE True FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
14 RTE False FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
15 LLK False FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
16 LMN True UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
Actually, for readability, I changed the column alignment to "left".
Edit
If you want yet shorter code, you can shorten it to the following one-liner:
df['Generated'] = (cs := df.Binary.cumsum()).apply(
lambda n: df.Text[cs.between(n-3, n 3)].str.cat(sep=' '))
For readability I folded it into 2 lines, but it is a single instruction.