I currently have colA and am trying to generate colB.
colB should contain 001 the first time a given colA value appears. If the same colA value appears a second time in that column, colB should be 002 for that row, and so on.
Hopefully the example below gives a better idea of what I'm looking for based on the colA values. I've been struggling to put together any real code for this.
EDIT: Struggling to explain this in words, so if you can think of a better way to explain it feel free to update my question.
colA colB
BJ02 001
BJ02 002
CJ02 001
CJ03 001
CJ02 002
DJ01 001
DJ02 001
DJ07 001
DJ07 002
DJ07 003
CodePudding user response:
You can use Counter() to keep a running count of each value in colA as you iterate over it, then build the list of values for colB from that count.
from collections import Counter

def count_value(values):
    new_col = []
    seen = Counter()  # running count of each value seen so far
    for value in values:
        seen[value] += 1
        new_col.append(str(seen[value]).zfill(3))  # zero-pad to three digits
    return new_col

df['colB'] = count_value(df['colA'])
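Note that a total frequency and a running count are different things: Counter(values) on its own reports 2 for every BJ02 row, whereas the desired colB numbers each occurrence in order. Below is a minimal pandas-free sketch on the sample values from the question. The helper name running_labels and its width parameter are made up for illustration.

```python
from collections import Counter

# Sample colA values copied from the question.
col_a = ["BJ02", "BJ02", "CJ02", "CJ03", "CJ02",
         "DJ01", "DJ02", "DJ07", "DJ07", "DJ07"]

def running_labels(values, width=3):
    """Number each occurrence of a value in order, zero-padded to `width`."""
    seen = Counter()
    labels = []
    for value in values:
        seen[value] += 1  # how many times this value has appeared so far
        labels.append(str(seen[value]).zfill(width))
    return labels

totals = Counter(col_a)         # total frequency per value
labels = running_labels(col_a)  # running count per row

print(totals["BJ02"])
print(labels)
```

Only the running count reproduces the colB column shown in the question.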
CodePudding user response:
Use groupby with cumcount:
df['colB'] = df.groupby('colA').cumcount().add(1)
print(df)
# Output
colA colB
0 BJ02 1
1 BJ02 2
2 CJ02 1
3 CJ03 1
4 CJ02 2
5 DJ01 1
6 DJ02 1
7 DJ07 1
8 DJ07 2
9 DJ07 3
As suggested by @HenryEcker, use zfill to zero-pad the result to three digits:
df['colB'] = df.groupby('colA').cumcount().add(1).astype(str).str.zfill(3)
print(df)
# Output:
colA colB
0 BJ02 001
1 BJ02 002
2 CJ02 001
3 CJ03 001
4 CJ02 002
5 DJ01 001
6 DJ02 001
7 DJ07 001
8 DJ07 002
9 DJ07 003
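One thing to keep in mind is that zero-padding only ever adds characters and never truncates, so a group with 1000 or more rows would get a four-character label. A small plain-Python sketch of that behaviour follows. The counts list is an invented example, not data from the question.

```python
# Hypothetical per-group counts, chosen to show the padding edge case.
counts = [1, 12, 123, 1234]

# str.zfill pads on the left with zeros up to the requested width.
padded = [str(n).zfill(3) for n in counts]

# The format spec 03d gives the same result for non-negative integers.
formatted = [f"{n:03d}" for n in counts]

print(padded)
print(formatted)
```

If a fixed width matters, it may be safer to pick the width from the largest group size rather than hard-coding 3.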