Pandas - How to split a string column into several columns, by the index of specific characters?-CodePudding

I want to extract the user ID from a string column called "filename", and create a new ID column, based on the index of specific character in the original string.

Two examples for the string in "filename", with ID of 2 or 3 digits:

filename = ID100session1neg_emotions_rating.csv ---> ID = 100
filename =ID21session2neu_emotions_rating.csv ---> ID = 21

I tried this -

df['ID '] = df.filename.str[2:**4**]

but I couldn't find the end index of the ID for the slice per row (it 3 or 4, depends on the length of the ID as 2 or 3 digits).

finding the index of "s" after each row in the data frame will solve my problem. The simple option didn't work for me -

s_index = df.filename.str.index("s")

(I also tried some split option, but I don't have a specific character such as comma, to split by)

Thanks a lot! sorry if it's a duplication of a previous question

CodePudding user response：

I'd use regex with str.extract:

s_index = df.filename.str.extract("^ID(\d )")

As integers:

s_index = df.filename.str.extract("^ID(\d )").astype(int)

Regex101 explanation

CodePudding user response：

An alternative to regex which is probably the best answer is to use split first on 'session' and grabbing the very first element and then another split grabbing the last element:

df['ID'] =  df.filename.str.split('session').str[0].str.split('ID').str[1]