Pandas dataframe: how to add "-" at the position before the last letter in a column

My dataset is as the example below:

Index ID
0     1.4A
1     1.4D
2     5B
3     D6C
4     ZG67A   
5     ZG67C

I want to add a "-" before the last character of the values in my column. The values don't have a consistent length, so I cannot choose a fixed position to place the "-" at, as in this helpful post.

One good solution in the related post is to use pd.Series.str and to choose a position:

df['ID'].str[:2] + "-" + df['ID'].str[2:]

I somehow need to address the position before the last letter in every row of my column df['ID']. Later I want to apply split, but as far as I understand split, it needs a delimiter to split on.

Best Outcome:

Index ID
0     1.4-A
1     1.4-D
2     5-B
3     D6-C
4     ZG67-A   
5     ZG67-C

Thanks

CodePudding user response：

Try:

df["ID"] = df["ID"].str.replace(r"(.*)([A-Z]+)$", r"\1-\2", regex=True)
print(df)

Prints:

   Index      ID
0      0   1.4-A
1      1   1.4-D
2      2     5-B
3      3    D6-C
4      4  ZG67-A
5      5  ZG67-C
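
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question, and the column name ID_dashed is only an illustrative choice so the original column stays visible.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": ["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"]})

# Group 1 (.*) is greedy and takes as much as it can, which leaves
# group 2 ([A-Z]+) with only the final uppercase letter.
# The replacement writes group 1, a "-", then group 2.
df["ID_dashed"] = df["ID"].str.replace(r"(.*)([A-Z]+)$", r"\1-\2", regex=True)
print(df)
```

Because the pattern requires an uppercase letter at the end, values ending in anything else are left unchanged.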

CodePudding user response：

You can reference positions relative to the end of a string using negative indices, just like normal list or string indexing:

df['ID'].str[:-1] + "-" + df["ID"].str[-1:]
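
A small sketch of the slicing approach on the question's sample values, in case it helps to see the two halves separately. The variable names head, tail and dashed are made up for illustration.

```python
import pandas as pd

# Sample values copied from the question.
ids = pd.Series(["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"], name="ID")

head = ids.str[:-1]   # everything except the last character
tail = ids.str[-1:]   # only the last character
dashed = head + "-" + tail  # element-wise string concatenation
print(dashed)
```

This makes no assumption about what the last character is, so it also puts a "-" in front of a trailing digit or punctuation mark; an empty string would become just "-".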

If you're hoping to split out the last character in each string, you could use a regular expression that matches the position just before the final character - no delimiter needed:

In [9]: df.ID.str.split(r'(?=.$)', regex=True)
Out[9]:
Index
0     [1.4, A]
1     [1.4, D]
2       [5, B]
3      [D6, C]
4    [ZG67, A]
5    [ZG67, C]
Name: ID, dtype: object
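
If the goal is two separate columns rather than a list per row, the same split can be expanded. This is only a sketch: it assumes pandas 1.4 or newer (where str.split accepts regex=True), and the column names prefix and suffix are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": ["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"]})

# (?=.$) is a zero-width lookahead: it matches the empty position that is
# followed by exactly one character and then the end of the string.
# expand=True returns the pieces as DataFrame columns instead of lists.
parts = df["ID"].str.split(r"(?=.$)", regex=True, expand=True)
parts.columns = ["prefix", "suffix"]  # illustrative names
print(parts)
```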

CodePudding user response：

Using a regex to match the position before the last character (using a lookahead):

df['ID'] = df['ID'].str.replace(r'(?=.$)', '-', regex=True)

Output (written here to a new column ID2, next to the original ID, for comparison):

   Index     ID     ID2
0      0   1.4A   1.4-A
1      1   1.4D   1.4-D
2      2     5B     5-B
3      3    D6C    D6-C
4      4  ZG67A  ZG67-A
5      5  ZG67C  ZG67-C
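
To reproduce the comparison table above, the result can be assigned to ID2 instead of overwriting ID. A minimal sketch, with the DataFrame rebuilt from the question's sample data:

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": ["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"]})

# The lookahead consumes no characters, so "replacing" the match
# simply inserts "-" in front of the last character.
df["ID2"] = df["ID"].str.replace(r"(?=.$)", "-", regex=True)
print(df)

# The same pattern behaves identically with the standard re module.
single = re.sub(r"(?=.$)", "-", "ZG67A")
print(single)
```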