str.slice command in pandas unable to select desired part of string

Time:11-16

I have the following dataframe in pandas:

import pandas as pd

d = {'Student Name': ['Omar 17BE004', '17BE005 Hussain', '17BE006 Anwar Syed']}
df_test = pd.DataFrame(data=d)
df_test.head(3)

I am trying to create a new column called Student ID that holds the student ID found in each Student Name string, for example 17BE004 in the first row. For this I am using the following code:

df_test['Indices'] =df_test['Student Name'].str.find('1')
start=df_test.Indices
stop=start + 7
myList_2=list(range(3))


for x in myList_2:
    df_test['Student ID']=df_test['Student Name'].str.slice(start[x], stop[x],1)


However, the values I get in the Student ID column are: Omar 17, 17BE005, 17BE006

The first row of the Student ID column is Omar 17, when I only want the student ID, 17BE004. It seems str.slice cannot pick out the ID when other text comes before it in the string, like the name Omar in front of 17BE004. Can anyone tell me how to get a correct Student ID column?

CodePudding user response:

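Use str.extract() with a regular expression for this. The pattern (\b1\w{6}) matches a word boundary, then a 1, then six more word characters, so it captures the 7-character ID wherever it appears in the string: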

import pandas as pd

d = {'Student Name': ['Omar 17BE004', '17BE005 Hussain', '17BE006 Anwar Syed']}
df_test = pd.DataFrame(data=d)
df_test['Student ID'] = df_test['Student Name'].str.extract(r'(\b1\w{6})')
print(df_test)
         Student Name Student ID
0        Omar 17BE004    17BE004
1     17BE005 Hussain    17BE005
2  17BE006 Anwar Syed    17BE006
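
As for why the original loop returned Omar 17: str.slice takes a single start and stop and applies them to every row, so each pass of the loop overwrote the whole column and only the last pass (start 0, stop 7) survived. If you would rather keep the find-and-slice approach, a minimal sketch is below. It assumes every ID is exactly 7 characters long and starts at the first '1' in the string; the names last_pass and starts are only for illustration.

```python
import pandas as pd

df_test = pd.DataFrame(
    {'Student Name': ['Omar 17BE004', '17BE005 Hussain', '17BE006 Anwar Syed']}
)

# What the last loop iteration effectively did: one (start, stop) pair
# applied to every row, which cuts 'Omar 17' out of the first row.
last_pass = df_test['Student Name'].str.slice(0, 7)

# Per-row slicing: locate the first '1' in each string, then take the
# 7 characters from that position. Assumption: every row contains an ID
# (str.find returns -1 when '1' is absent, which this sketch does not handle).
starts = df_test['Student Name'].str.find('1')
df_test['Student ID'] = [
    name[start:start + 7]
    for name, start in zip(df_test['Student Name'], starts)
]
print(df_test)
```

The str.extract() version is still the more robust choice, since it does not depend on a name never containing a '1' before the ID.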