Home > Blockchain >  Pandas - How to split a string column into several columns, by the index of specific characters?
Pandas - How to split a string column into several columns, by the index of specific characters?

Time:12-22

I want to extract the user ID from a string column called "filename", and create a new ID column, based on the index of specific character in the original string.

Two examples for the string in "filename", with ID of 2 or 3 digits:

  1. filename = ID100session1neg_emotions_rating.csv ---> ID = 100

  2. filename =ID21session2neu_emotions_rating.csv ---> ID = 21

I tried this -

df['ID '] = df.filename.str[2:**4**]

but I couldn't find the end index of the ID for the slice per row (it 3 or 4, depends on the length of the ID as 2 or 3 digits).

finding the index of "s" after each row in the data frame will solve my problem. The simple option didn't work for me -

s_index = df.filename.str.index("s")

(I also tried some split option, but I don't have a specific character such as comma, to split by)

Thanks a lot! sorry if it's a duplication of a previous question

CodePudding user response:

I'd use regex with str.extract:

s_index = df.filename.str.extract("^ID(\d )")

As integers:

s_index = df.filename.str.extract("^ID(\d )").astype(int)

Regex101 explanation

CodePudding user response:

An alternative to regex which is probably the best answer is to use split first on 'session' and grabbing the very first element and then another split grabbing the last element:

df['ID'] =  df.filename.str.split('session').str[0].str.split('ID').str[1]
  • Related