Home > Software design >  Concatenate text from cells belonging to a range of "True" values
Concatenate text from cells belonging to a range of "True" values

Time:04-19

I have a question that I don't know how to approach and solve. I have a dataframe with two columns, Text and Binary. I want to generate a third one that will concatenate the strings from the Text cells in a range of 3 "True" values above and below the current row by including the false values in that range.

This is what I would like to have at the end:

Text Binary Generated
ABC True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ
EFG True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE
KLM False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE
OPQ False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE
FGA True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK
KLA False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK
UTF True ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
GBD False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
XXX False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
KLJ False ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
OPR True EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
ABB False EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
DAE False EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
QWE True FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
RTE False FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
LLK False FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN
LMN True UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN

Any help is appreciated. Thank You

CodePudding user response:

First create an auxiliary Series:

cs = df.Binary.cumsum()

Then, to generte your new column, run:

df['Generated'] = cs.apply(lambda n: df.Text[cs.between(n-3, n 3)].str.cat(sep=' '))

By default Pandas imposes some limit on the width of each column, so "ordinary" printing will not show full content.

But if you run:

with pd.option_context("display.max_colwidth", 70):
    print(df)

you will get something like:

    Text  Binary  Generated
0   ABC   True    ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ 
1   EFG   True    ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE 
2   KLM   False   ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE 
3   OPQ   False   ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE 
4   FGA   True    ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK 
5   KLA   False   ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK 
6   UTF   True    ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
7   GBD   False   ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
8   XXX   False   ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
9   KLJ   False   ABC EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
10  OPR   True    EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
11  ABB   False   EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
12  DAE   False   EFG KLM OPQ FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
13  QWE   True    FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
14  RTE   False   FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
15  LLK   False   FGA KLA UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 
16  LMN   True    UTF GBD XXX KLJ OPR ABB DAE QWE RTE LLK LMN 

Actually, for readability, I changed the column alignment to "left".

Edit

If you want yet shorter code, you can shorten it to the following one-liner:

df['Generated'] = (cs := df.Binary.cumsum()).apply(
    lambda n: df.Text[cs.between(n-3, n 3)].str.cat(sep=' '))

For readability I folded it into 2 lines, but it is a single instruction.

  • Related