Add column with numbers based on count of value in other column in Pandas

Time:10-19

colA is what I currently have.

However, I'm trying to generate colB.

I want colB to contain the number 001 the first time a value appears in colA. If the same colA value appears a second time in that column, I want the colB number for that row to be 002, and so on.

Hopefully the example below gives a better idea of what I'm looking for based on the colA values. I've been struggling to put together any real code for this.

EDIT: Struggling to explain this in words, so if you can think of a better way to explain it feel free to update my question.

colA   colB
BJ02   001
BJ02   002
CJ02   001
CJ03   001
CJ02   002
DJ01   001
DJ02   001
DJ07   001
DJ07   002
DJ07   003

CodePudding user response:

You can use Counter() to keep a running count of each value in colA as you iterate, then build the list of values for colB. A total frequency count would give every duplicate row the same number, so the count has to be updated row by row.

from collections import Counter

def count_value(values):
    new_col = []
    seen = Counter()  # running count of each value seen so far
    for value in values:
        seen[value] += 1  # this is the n-th occurrence of value
        new_col.append(str(seen[value]).zfill(3))  # zero-pad, e.g. '001'
    return new_col

df['colB'] = count_value(df['colA'])
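
The key point is that the question needs an occurrence number per row (first time seen, second time seen, ...), not the total frequency of each value. Here is a minimal sketch of that idea on a plain list, without pandas. The helper name `running_labels` and its `width` parameter are made up for illustration and are not part of any library.

```python
from collections import Counter

def running_labels(values, width=3):
    """Return zero-padded occurrence numbers ('001', '002', ...) for each value."""
    seen = Counter()
    labels = []
    for value in values:
        seen[value] += 1  # n-th time this value has appeared so far
        labels.append(str(seen[value]).zfill(width))
    return labels

# Sample colA values taken from the question
col_a = ["BJ02", "BJ02", "CJ02", "CJ03", "CJ02",
         "DJ01", "DJ02", "DJ07", "DJ07", "DJ07"]
col_b = running_labels(col_a)
```

Because the counter is updated inside the loop, a value that shows up again later in the column (CJ02 here) still continues from where it left off.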

CodePudding user response:

Use groupby with cumcount, which numbers the rows within each group starting at 0 (hence the add(1)):

df['colB'] = df.groupby('colA').cumcount().add(1)
print(df)

# Output:
   colA  colB
0  BJ02     1
1  BJ02     2
2  CJ02     1
3  CJ03     1
4  CJ02     2
5  DJ01     1
6  DJ02     1
7  DJ07     1
8  DJ07     2
9  DJ07     3

As suggested by @HenryEcker, convert to strings and use zfill to zero-pad to three digits:

df['colB'] = df.groupby('colA').cumcount().add(1).astype(str).str.zfill(3)
print(df)

# Output:
   colA colB
0  BJ02  001
1  BJ02  002
2  CJ02  001
3  CJ03  001
4  CJ02  002
5  DJ01  001
6  DJ02  001
7  DJ07  001
8  DJ07  002
9  DJ07  003
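
To try this end to end, here is a minimal sketch that builds the sample frame from the question and applies the same steps one at a time. The intermediate variable name `occurrence` is only illustrative, and the frame is reconstructed from the colA values shown in the question.

```python
import pandas as pd

# Sample data reconstructed from the question's colA column
df = pd.DataFrame({"colA": ["BJ02", "BJ02", "CJ02", "CJ03", "CJ02",
                            "DJ01", "DJ02", "DJ07", "DJ07", "DJ07"]})

# cumcount() numbers the rows of each colA group 0, 1, 2, ... in original row order;
# adding 1 makes the numbering start at 1
occurrence = df.groupby("colA").cumcount() + 1

# Convert to strings and left-pad with zeros to a fixed width of 3
df["colB"] = occurrence.astype(str).str.zfill(3)
```

Splitting the chain this way keeps the integer counts available in `occurrence` in case the numeric value is needed as well as the padded string.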