How to find a letter in a Pandas column and extract it together with the number attached to it?

Time:11-23

I would like to create a new column in the data frame that searches an existing column for a letter, finds the number that follows it, and copies the letter and number into the new column. Example:

Month       Sem_Year
2020-04-01  H1 2020
2020-05-01  2020 H1
2020-06-01  H1 2020
2020-07-01  H2 2020
2020-08-01  H2 2020
2020-09-01  2020 H2
2020-10-01  2020 H2
2020-11-01  H2 2020
2020-12-01  H2 2020
2021-01-01  H1 2021
2021-02-01  H1 2021

Now I want to search for the letter H in the second column and extract it together with the number attached to it. Expected result:

Month       Sem_Year  Sem
2020-04-01  H1 2020   H1
2020-05-01  2020 H1   H1
2020-06-01  H1 2020   H1
2020-07-01  H2 2020   H2
2020-08-01  H2 2020   H2
2020-09-01  2020 H2   H2
2020-10-01  2020 H2   H2
2020-11-01  H2 2020   H2
2020-12-01  H2 2020   H2
2021-01-01  H1 2021   H1
2021-02-01  H1 2021   H1

CodePudding user response:

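You can use df.insert() to add a new column. To extract the code, loop through the values (column_value) in the second column and keep the token that starts with "H", for example "value_for_new_column = next((t for t in column_value.split(' ') if t.startswith('H')), None)". Note that "column_value.split(' ')[0]" on its own only works for rows such as "H1 2020", where the code comes first; for "2020 H1" it returns "2020".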
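As a rough illustration of the loop-and-split idea, the sketch below builds a small sample frame from a few of the rows in the question and picks the token that starts with "H" from each value. The helper name pick_semester and the insert position 2 are assumptions made for this example, not part of the original answer.

```python
import pandas as pd

# Small sample taken from the rows shown in the question
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-07-01", "2020-09-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "H2 2020", "2020 H2"],
})

def pick_semester(value):
    """Return the first whitespace-separated token starting with 'H', or None."""
    for token in str(value).split():
        if token.startswith("H"):
            return token
    return None

# Insert the new column at position 2, i.e. right after Sem_Year
df.insert(2, "Sem", df["Sem_Year"].map(pick_semester))
print(df)
```

This works regardless of whether the code comes before or after the year, but it treats any token beginning with "H" as a match, so it may need tightening for messier data.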

CodePudding user response:

Because the column mixes two formats, you need a regular expression. The pattern H\d matches an H followed by a single digit; adjust it for other requirements.

df['Sem'] = df['Sem_Year'].str.extract(r"(H\d)", expand=False)
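For reference, here is a self-contained sketch of the regex approach on a few of the rows from the question. It assumes the column is named Sem_Year as in the sample data, uses a raw string so that \d is not treated as a string escape, and passes expand=False so that a single capture group yields a Series. The no_match variable is only there to illustrate what happens when a value has no H code.

```python
import pandas as pd

# Small sample taken from the rows shown in the question
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-09-01", "2021-01-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "2020 H2", "H1 2021"],
})

# Capture "H" followed by one digit, wherever it appears in the string
df["Sem"] = df["Sem_Year"].str.extract(r"(H\d)", expand=False)
print(df)

# A value without an H code produces a missing value rather than an error
no_match = pd.Series(["2020"]).str.extract(r"(H\d)", expand=False)
```

If the codes could have more than one digit, a pattern such as r"(H\d+)" would be one way to extend this.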