How to find a letter in a column and extract the letter and the number that follows it in Pandas?

I would like to create a new column in a data frame by searching for a letter in an existing column. Once the letter is found, the letter and the number that follows it should be copied into the new column. Example:

Month	Sem_Year
2020-04-01	H1 2020
2020-05-01	2020 H1
2020-06-01	H1 2020
2020-07-01	H2 2020
2020-08-01	H2 2020
2020-09-01	2020 H2
2020-10-01	2020 H2
2020-11-01	H2 2020
2020-12-01	H2 2020
2021-01-01	H1 2021
2021-02-01	H1 2021

Now I want to search for the letter H in the second column and extract the letter together with the number attached to it. Expected result:

Month	Sem_Year	Sem
2020-04-01	H1 2020	H1
2020-05-01	2020 H1	H1
2020-06-01	H1 2020	H1
2020-07-01	H2 2020	H2
2020-08-01	H2 2020	H2
2020-09-01	2020 H2	H2
2020-10-01	2020 H2	H2
2020-11-01	H2 2020	H2
2020-12-01	H2 2020	H2
2021-01-01	H1 2021	H1
2021-02-01	H1 2021	H1

CodePudding user response：

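You can use df.insert() to add a new column. To extract the letter, loop through the values (column_value) in the second column and use "value_for_new_column = column_value.split(' ')[0]". Note that this takes the first space-separated token, so it only gives the right result for rows like "H1 2020"; for rows like "2020 H1" it returns "2020".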
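A minimal sketch of this split-based idea, adjusted so it also handles rows where the year comes first. The helper name pick_sem is made up for illustration, and the sample frame only reproduces a few rows from the question; it assumes each value contains at most one token starting with "H".

```python
import pandas as pd

# A few rows from the question, covering both orderings
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-07-01", "2020-09-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "H2 2020", "2020 H2"],
})

def pick_sem(value):
    # Hypothetical helper: return the first whitespace-separated
    # token that starts with "H", or None if there is none
    for token in value.split():
        if token.startswith("H"):
            return token
    return None

# Insert the new column at the end of the frame
df.insert(len(df.columns), "Sem", df["Sem_Year"].map(pick_sem))
sem_values = df["Sem"].tolist()
```

Checking every token instead of only the first one is what makes this work for both "H1 2020" and "2020 H1".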

CodePudding user response：

Because the value appears in varied formats, you need a regular expression. Here H\d means the letter H followed by a single digit, and the parentheses form the capture group that str.extract returns. The pattern can be modified for other requirements.

df['Sem'] = df['Sem_Year'].str.extract(r"(H\d)")
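For reference, a small self-contained sketch of the regex approach on a few rows from the question. Passing expand=False is a choice made here so that the result is a Series rather than a one-column DataFrame. The broader pattern at the end is only an assumed example of how the regex might be modified; it is not part of the original answer.

```python
import pandas as pd

# A few rows from the question, covering both orderings
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-07-01", "2020-09-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "H2 2020", "2020 H2"],
})

# Capture "H" followed by one digit, wherever it appears in the string
df["Sem"] = df["Sem_Year"].str.extract(r"(H\d)", expand=False)
sem_values = df["Sem"].tolist()

# Assumed variation: any single letter followed by one or more digits
any_letter = df["Sem_Year"].str.extract(r"([A-Za-z]\d+)", expand=False)
any_letter_values = any_letter.tolist()
```

Rows with no match would get NaN in the new column.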