My dataset is as the example below:
Index ID
0 1.4A
1 1.4D
2 5B
3 D6C
4 ZG67A
5 ZG67C
I want to add a "-" before the last character of each value in my column. The values don't have a consistent length, so I cannot pick a fixed position at which to insert the "-", as in this helpful post
One good solution in the related post is to use pd.Series.str and choose a position
df['ID'].str[2:] + "-" + df["c"].str[4:]
I somehow need to address the position before the last letter in every row of df['ID']. Later I want to apply split, but as far as I understand it, split needs a delimiter to split on.
Best Outcome:
Index ID
0 1.4-A
1 1.4-D
2 5-B
3 D6-C
4 ZG67-A
5 ZG67-C
Thanks
CodePudding user response:
Try:
df["ID"] = df["ID"].str.replace(r"(.*)([A-Z]+)$", r"\1-\2", regex=True)
print(df)
Prints:
Index ID
0 0 1.4-A
1 1 1.4-D
2 2 5-B
3 3 D6-C
4 4 ZG67-A
5 5 ZG67-C
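For reference, a self-contained sketch of this approach, assuming the intended pattern is `(.*)([A-Z]+)$` and using the sample values from the question. The column name `ID_dashed` is made up here so the original column stays visible:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"ID": ["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"]})

# The greedy (.*) consumes as much as it can, so ([A-Z]+) is left with
# only the final capital letter; the replacement puts "-" between them.
# "ID_dashed" is a hypothetical column name used for illustration.
df["ID_dashed"] = df["ID"].str.replace(r"(.*)([A-Z]+)$", r"\1-\2", regex=True)

print(df["ID_dashed"].tolist())
# ['1.4-A', '1.4-D', '5-B', 'D6-C', 'ZG67-A', 'ZG67-C']
```

Note that a value that does not end in a capital letter (for example "12") does not match the pattern and is left unchanged.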
CodePudding user response:
You can reference positions relative to the end of a string using negative indices, just like normal list or string indexing:
df['ID'].str[:-1] + "-" + df["ID"].str[-1:]
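A minimal runnable version of the slicing idea, using the sample values from the question (the variable names `ids` and `dashed` are just for illustration):

```python
import pandas as pd

# Sample data copied from the question
ids = pd.Series(["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"], name="ID")

# Everything except the last character, then "-", then the last character.
# Slicing with [-1:] (rather than [-1]) also tolerates empty strings.
dashed = ids.str[:-1] + "-" + ids.str[-1:]

print(dashed.tolist())
# ['1.4-A', '1.4-D', '5-B', 'D6-C', 'ZG67-A', 'ZG67-C']
```

Unlike a regex that looks for a capital letter, this inserts the dash before the last character whatever that character is.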
If you're hoping to split out the last character in each string, you could use a regular expression to match exactly one character before the end - no delimiter needed:
In [9]: df.ID.str.split(r'(?=.$)', regex=True)
Out[9]:
Index
0 [1.4, A]
1 [1.4, D]
2 [5, B]
3 [D6, C]
4 [ZG67, A]
5 [ZG67, C]
Name: ID, dtype: object
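If separate columns are more convenient than a list per row, one possible alternative (not part of the answer above) is `str.extract` with named groups. The group names `prefix` and `suffix` are made up for this sketch:

```python
import pandas as pd

# Sample data copied from the question
ids = pd.Series(["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"], name="ID")

# Named groups become the column names of the resulting DataFrame:
# "prefix" is everything before the last character, "suffix" is the last one.
parts = ids.str.extract(r"^(?P<prefix>.*)(?P<suffix>.)$")

print(parts["prefix"].tolist())  # ['1.4', '1.4', '5', 'D6', 'ZG67', 'ZG67']
print(parts["suffix"].tolist())  # ['A', 'D', 'B', 'C', 'A', 'C']
```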
CodePudding user response:
Using a regex to match the position before the last character (using a lookahead):
df['ID'] = df['ID'].str.replace(r'(?=.$)', '-', regex=True)
Output (shown as a new column ID2 for comparison):
Index ID ID2
0 0 1.4A 1.4-A
1 1 1.4D 1.4-D
2 2 5B 5-B
3 3 D6C D6-C
4 4 ZG67A ZG67-A
5 5 ZG67C ZG67-C
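A sketch that would produce a comparison column like the one shown, assuming the sample values from the question. Passing a compiled pattern is a choice made here to make it explicit that Python's `re` syntax (which supports lookaheads) is intended:

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"ID": ["1.4A", "1.4D", "5B", "D6C", "ZG67A", "ZG67C"]})

# (?=.$) is a zero-width lookahead: it matches the empty position that is
# followed by exactly one character and then the end of the string.
# Replacing that empty match with "-" inserts the dash without removing anything.
before_last_char = re.compile(r"(?=.$)")
df["ID2"] = df["ID"].str.replace(before_last_char, "-", regex=True)

print(df["ID2"].tolist())
# ['1.4-A', '1.4-D', '5-B', 'D6-C', 'ZG67-A', 'ZG67-C']
```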