New column based on existing string column in Python

Time:06-02

My dataframe looks like:

School  Term            Students
A       summer 2020     324
B       spring 21       101
A       summer/spring   201
F       wintersem       44
C       fall trimester  98
E                       23

I need to add a new column, Termcode, that takes one of six values (summer, spring, fall, winter, multiple or none) depending on the corresponding value in the Term column:

School  Term            Students  Termcode
A       summer 2020     324       summer
B       spring 21       101       spring
A       summer/spring   201       multiple
F       wintersem       44        winter
C       fall trimester  98        fall
E                       23        none
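To make the rule concrete, here is a minimal sketch that builds the sample frame and applies the rule row by row. It assumes School E's blank Term is a missing value (None), and `to_termcode` is a hypothetical helper written for illustration, not part of either answer. Like the answers, it uses a plain substring test, so "wintersem" counts as winter.

```python
import pandas as pd

# Sample data from the question; School E's blank Term is assumed to be missing (None)
df = pd.DataFrame({
    "School": ["A", "B", "A", "F", "C", "E"],
    "Term": ["summer 2020", "spring 21", "summer/spring", "wintersem", "fall trimester", None],
    "Students": [324, 101, 201, 44, 98, 23],
})

TERMS = ["summer", "spring", "fall", "winter"]


def to_termcode(term):
    """Hypothetical helper: map one Term value to a season, 'multiple' or 'none'."""
    if not isinstance(term, str):
        return "none"  # missing Term
    # Every season name that appears anywhere in the string (substring test,
    # so 'wintersem' counts as 'winter')
    found = {t for t in TERMS if t in term}
    if len(found) > 1:
        return "multiple"
    if len(found) == 1:
        return found.pop()
    return "none"


df["Termcode"] = df["Term"].apply(to_termcode)
print(df)
```

This is easy to read and extend, at the cost of one explicit Python call per row.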

Answer:

You can use a regex with str.extractall, then fill in the values depending on the number of distinct matches:

import pandas as pd

terms = ['summer', 'spring', 'fall', 'winter']
regex = r'(' + '|'.join(terms) + r')'
# '(summer|spring|fall|winter)'

# extract values and set up grouper for next step
g = df['Term'].str.extractall(regex)[0].groupby(level=0)

# get the first match, replaced with "multiple" if a row has more than one distinct term
df['Termcode'] = g.first().mask(g.nunique().gt(1), 'multiple')

# fill the missing data (i.e. no match) with "none"
df['Termcode'] = df['Termcode'].fillna('none')

output:

  School            Term  Students  Termcode
0      A     summer 2020       324    summer
1      B       spring 21       101    spring
2      A   summer/spring       201  multiple
3      F       wintersem        44    winter
4      C  fall trimester        98      fall
5      E             NaN        23      none
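A self-contained run of this answer, assuming the sample frame is built with School E's Term as None, might look like the sketch below. It also prints the intermediate `extractall` result: one row per match, indexed by (original row label, match number). Grouping on level 0 is what collapses those matches back to one value per original row, and rows with no match simply do not appear, which is why the final `fillna('none')` is needed.

```python
import pandas as pd

# Sample data from the question; School E's blank Term is assumed to be missing (None)
df = pd.DataFrame({
    "School": ["A", "B", "A", "F", "C", "E"],
    "Term": ["summer 2020", "spring 21", "summer/spring", "wintersem", "fall trimester", None],
    "Students": [324, 101, 201, 44, 98, 23],
})

terms = ["summer", "spring", "fall", "winter"]
regex = "(" + "|".join(terms) + ")"  # '(summer|spring|fall|winter)'

# One row per match; the index has two levels: original row label and "match" counter
matches = df["Term"].str.extractall(regex)
print(matches)

# Collapse the matches back to one value per original row
g = matches[0].groupby(level=0)
df["Termcode"] = g.first().mask(g.nunique().gt(1), "multiple")

# Rows with no match (including the missing Term) are NaN after alignment
df["Termcode"] = df["Termcode"].fillna("none")
print(df)
```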

Answer:

Use Series.str.findall to collect every match per row, then pick "multiple" when there is more than one:

import numpy as np

terms = ['summer', 'spring', 'fall', 'winter']

# list of all matched terms per row (NaN where Term is missing)
s = df['Term'].str.findall('|'.join(terms))
df['Termcode'] = np.where(s.str.len() > 1, 'multiple', s.str[0])

  School            Term  Students  Termcode
0      A     summer 2020       324    summer
1      B       spring 21       101    spring
2      A   summer/spring       201  multiple
3      F       wintersem        44    winter
4      C  fall trimester        98      fall
5      E             NaN        23       NaN
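This version leaves NaN for School E, whereas the question asks for none. A minimal sketch that appends `fillna('none')` is below, assuming the same sample frame with E's Term as None. One behavioural difference from the extractall answer: this one counts all matches with `str.len()`, so a hypothetical Term such as "summer/summer" would become multiple, while counting distinct matches with `nunique()` would return summer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; School E's blank Term is assumed to be missing (None)
df = pd.DataFrame({
    "School": ["A", "B", "A", "F", "C", "E"],
    "Term": ["summer 2020", "spring 21", "summer/spring", "wintersem", "fall trimester", None],
    "Students": [324, 101, 201, 44, 98, 23],
})

terms = ["summer", "spring", "fall", "winter"]
pattern = "|".join(terms)

# List of all matches per row; NaN for the missing Term
s = df["Term"].str.findall(pattern)
df["Termcode"] = np.where(s.str.len() > 1, "multiple", s.str[0])

# Rows with no match (or a missing Term) are still NaN here; label them "none"
df["Termcode"] = df["Termcode"].fillna("none")
print(df)

# len() counts repeated matches too, so a repeated term is treated as "multiple"
repeat = pd.Series(["summer/summer"]).str.findall(pattern)
print(repeat.str.len().tolist())
```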