I'm wondering whether there is another way to extract the year from a column and assign the result to two new columns, one for the season and one for the year.
I tried the method below. It seems to work, but it only produces the year, and only for some of the rows:
```python
year = df['premiered'].str.findall(r'(\d{4})').str.get(0)
df1 = df.assign(year=year.values)
```
Output:

| premiered   | year |
|-------------|------|
| Spring 1998 | 1998 |
| Spring 2001 | 2001 |
| Fall 2016   | NaN  |
| Fall 2016   | NaN  |
Answer:
Use `Series.str.split` with the `expand` option, which expands the split strings into separate columns:
```python
df[['season', 'year']] = df['premiered'].str.split(expand=True)
#      premiered  season  year
# 0  Spring 1998  Spring  1998
# 1  Spring 2001  Spring  2001
# 2    Fall 2016    Fall  2016
# 3    Fall 2016    Fall  2016
```
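One caveat, sketched under the assumption that some values might contain more than one space (the `'Early Spring 1998'` row below is hypothetical, not from the question's data): plain `str.split(expand=True)` would then return three columns, and assigning them to two columns raises a `ValueError`. Splitting only at the last whitespace with `str.rsplit(n=1, expand=True)` always yields at most two columns:

```python
import pandas as pd

# Sample data from the question, plus one hypothetical multi-word season
df = pd.DataFrame({'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016',
                                 'Fall 2016', 'Early Spring 1998']})

# rsplit with n=1 splits once, at the last whitespace, so everything
# before the year stays together in the 'season' column
df[['season', 'year']] = df['premiered'].str.rsplit(n=1, expand=True)
print(df)
```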
Or use `Series.str.extract` with a regex:

- `(\w+)`: capture one or more word characters
- `\s*`: match zero or more whitespace characters
- `(\d+)`: capture one or more digits
```python
df[['season', 'year']] = df['premiered'].str.extract(r'(\w+)\s*(\d+)')
#      premiered  season  year
# 0  Spring 1998  Spring  1998
# 1  Spring 2001  Spring  2001
# 2    Fall 2016    Fall  2016
# 3    Fall 2016    Fall  2016
```
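A variant, sketched under the assumption that some rows may not contain a four-digit year (the `'Unknown'` row below is hypothetical): naming the groups with `(?P<name>...)` makes `str.extract` use those names as column labels, so the result can be joined back directly, and rows that don't match come out as `NaN` instead of breaking the assignment.

```python
import pandas as pd

df = pd.DataFrame({'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016',
                                 'Fall 2016', 'Unknown']})

# Named groups become the column names of the extracted DataFrame;
# \d{4} restricts the year to exactly four digits
parts = df['premiered'].str.extract(r'(?P<season>\w+)\s+(?P<year>\d{4})')

# Rows without a match (here 'Unknown') get NaN in both new columns
df = df.join(parts)
print(df)
```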
It is also a good idea to convert the new `year` column to a numeric type:

```python
df['year'] = df['year'].astype(int)
```
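If the year column can contain missing values (for example from rows that `str.extract` could not match), `astype(int)` raises, because `NaN` cannot be cast to a plain integer. A sketch using `pd.to_numeric` with `errors='coerce'` and pandas' nullable `'Int64'` dtype keeps the gaps as `<NA>`; the sample series here is hypothetical:

```python
import pandas as pd

# Hypothetical year column with one missing value
year = pd.Series(['1998', '2001', None])

# errors='coerce' turns anything unparsable into NaN; 'Int64' (capital I)
# is the nullable integer dtype, so NaN becomes <NA> instead of an error
year_num = pd.to_numeric(year, errors='coerce').astype('Int64')
print(year_num)
```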
Answer:
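You could use Python's string `split` inside `apply`:

```python
import pandas as pd

data = {'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']}
df = pd.DataFrame(data)
# Split each string on the space and keep the second part (the year)
df['year'] = df['premiered'].apply(lambda x: x.split(' ')[1])
print(df)
```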
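The same split can also fill both of the columns the question asks for without `apply`: the `.str` accessor can index into the lists that `str.split()` returns. A minimal sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']})

# Split once on whitespace, then pick list elements by position with .str[]
parts = df['premiered'].str.split()
df['season'] = parts.str[0]
df['year'] = parts.str[1]
print(df)
```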