How do I create a new column with incrementing alphanumeric counter based on another column with pan-CodePudding

Using Python/Pandas (of whatever is best way in python):

What if you want to increment a new column with values such as A1,A2,A3... in the rows (increasing on each row) based on if values are repeating in an existing column such that if the repeating in the existing column ends, column 2 jumps to B1,B2,B3... for the next repeating value (note if next value is not repeating it would simply be B1 and then C1, etc)?

In other words, I have the existing column below and would like to simply add the new column as such:

Existing Column	New Column
12	A1
12	A2
12	A3
13	B1
13	B2
14	C1
15	D1
15	D2

and so forth....

I tried stuff like this which is not really close because

a) it does not start and stop counting based on existing column's values - it simply goes to the end with a single count. b) there is another incrementing or keeping track involved because when one count is done A1,A2,A3 - the next count should start with B (B1, B2, B3) and this means the letter one is one needs to be kept track of for the new column.

list = []

nid = df['note_id']

for i in range(1, df.shape[0]   1):   # gets the length of the DataFrame. 

if nid[df[i]] == nid[df[i 1]]:
list.append(f'A{i}') 

df['ID'] = list

CodePudding user response：

Update

Use get_column_letter from openpyxl package if you have more than 26 letters:

from openpyxl.utils import get_column_letter

df['New Column'] = df['Existing Column'].ne(df['Existing Column'].shift()) \
                                        .cumsum().map(get_column_letter)

df['New Column']  = df.groupby('New Column').cumcount().add(1).astype(str)
print(df)

# Output
   Existing Column New Column
0               12         A1
1               12         A2
2               12         A3
3               13         B1
4               13         B2
5               14         C1
6               15         D1
7               15         D2

Old answer

Quick and dirty but it works only if you have less than 26 different values in Existing Column.

df['New Column'] = pd.Series(pd.factorize(df['Existing Column'])[0]).add(65).map(chr)

df['New Column']  = df.groupby('New Column').cumcount().add(1).astype(str)
print(df)

# Output
   Existing Column New Column
0               12         A1
1               12         A2
2               12         A3
3               13         B1
4               13         B2
5               14         C1
6               15         D1
7               15         D2

CodePudding user response：

Have you tried looking at pandas rank? You could group by the existing column and then do a rank(method='first'). Then your existing column and the ranked column together could serve as the new column you want.