I want to extract the user ID from a string column called "filename", and create a new ID column, based on the index of specific character in the original string.
Two examples for the string in "filename", with ID of 2 or 3 digits:
filename = ID100session1neg_emotions_rating.csv ---> ID = 100
filename =ID21session2neu_emotions_rating.csv ---> ID = 21
I tried this -
df['ID '] = df.filename.str[2:**4**]
but I couldn't find the end index of the ID for the slice per row (it 3 or 4, depends on the length of the ID as 2 or 3 digits).
finding the index of "s" after each row in the data frame will solve my problem. The simple option didn't work for me -
s_index = df.filename.str.index("s")
(I also tried some split option, but I don't have a specific character such as comma, to split by)
Thanks a lot! sorry if it's a duplication of a previous question
CodePudding user response:
I'd use regex with str.extract
:
s_index = df.filename.str.extract("^ID(\d )")
As integers:
s_index = df.filename.str.extract("^ID(\d )").astype(int)
CodePudding user response:
An alternative to regex which is probably the best answer is to use split
first on 'session' and grabbing the very first element and then another split
grabbing the last element:
df['ID'] = df.filename.str.split('session').str[0].str.split('ID').str[1]