Pandas: create a column counter that resets when the length exceeds a certain value

I have the following dataframe:

       A       B
0   john     doe
1  jacob   smith
2   juli   patel
3  jason  bourne
4   alan  turing

I want to create a new column C that is the concatenation of columns A and B plus a counter starting from 1. However, if the total length of A + B + counter exceeds 11, I want to reset the counter back to 1.

So for the above dataframe, the column C would be:

       A       B             C
0   john     doe      johndoe1
1  jacob   smith   jacobsmith2
2   juli   patel    julipatel3
3  jason  bourne  jasonbourne1
4   alan  turing   alanturing2

CodePudding user response：

import pandas as pd

data = {
    "A": ["john", "jacob", "juli", "jason", "alan"],
    "B": ["doe", "smith", "patel", "bourne", "turing"],
}

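counter = 0  # module-level state shared across rows


def update_trans(row):
    global counter
    # Length of A + B, before the counter digit is appended
    length = len(row["A"] + row["B"])
    if length >= 11:
        # A + B + counter would exceed 11 characters, so restart at 1
        counter = 1
    else:
        counter += 1
    return f"{row['A']}{row['B']}{counter}"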


if __name__ == "__main__":
    df = pd.DataFrame(data)
    df["C"] = df.apply(update_trans, axis=1)
    print(df.head())

See if this helps.

Since the task is fairly basic, using apply here should be fine. Note that counter is a module-level global, so set it back to 0 before calling apply again.
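If the global variable is a concern, the same rule can be written as a plain loop inside a function. This is only a sketch: the name add_counter_column and the limit parameter are made up for illustration, and unlike the answer above it measures the full string including the counter digits, as the question words it.

```python
import pandas as pd


def add_counter_column(df, limit=11):
    """Build A + B + counter, restarting the counter at 1 whenever
    the combined string would be longer than `limit` characters.

    Hypothetical helper; the name and the `limit` parameter are assumptions.
    """
    counter = 0
    values = []
    for a, b in zip(df["A"], df["B"]):
        counter += 1
        if len(f"{a}{b}{counter}") > limit:
            counter = 1  # too long with this counter, so restart
        values.append(f"{a}{b}{counter}")
    return pd.Series(values, index=df.index)


df = pd.DataFrame({
    "A": ["john", "jacob", "juli", "jason", "alan"],
    "B": ["doe", "smith", "patel", "bourne", "turing"],
})
df["C"] = add_counter_column(df)
print(df)
```

Because the counter lives inside the function, calling it twice gives the same result both times.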

CodePudding user response：

Here is another way:

s = df['A'] + df['B']
df['C'] = s + df.groupby(s.str.len().ge(11).cumsum()).cumcount().add(1).astype(str)
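
To see why this works, it may help to split the chained expression into its intermediate steps. The variable names resets, group_id and counter below are only for illustration; the sample data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["john", "jacob", "juli", "jason", "alan"],
    "B": ["doe", "smith", "patel", "bourne", "turing"],
})

s = df["A"] + df["B"]

# True on rows where A + B is already 11+ characters, i.e. where the counter restarts
resets = s.str.len().ge(11)

# Running total of restarts: every restart opens a new group (0, 0, 0, 1, 1)
group_id = resets.cumsum()

# Position within each group, shifted to start at 1 (1, 2, 3, 1, 2)
counter = df.groupby(group_id).cumcount().add(1)

df["C"] = s + counter.astype(str)
print(df)
```

The cumsum turns each reset row into the start of a new group, and cumcount then numbers the rows inside each group, so no Python-level loop or global state is needed.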