I would like to add a new column to a data frame by searching for a letter in an existing column, finding the number that follows it, and copying the letter and number together into the new column. Example:
Month | Sem_Year |
---|---|
2020-04-01 | H1 2020 |
2020-05-01 | 2020 H1 |
2020-06-01 | H1 2020 |
2020-07-01 | H2 2020 |
2020-08-01 | H2 2020 |
2020-09-01 | 2020 H2 |
2020-10-01 | 2020 H2 |
2020-11-01 | H2 2020 |
2020-12-01 | H2 2020 |
2021-01-01 | H1 2021 |
2021-02-01 | H1 2021 |
Now I want to search for the letter H in the second column and extract that letter together with the number attached to it. Example:
Month | Sem_Year | Sem |
---|---|---|
2020-04-01 | H1 2020 | H1 |
2020-05-01 | 2020 H1 | H1 |
2020-06-01 | H1 2020 | H1 |
2020-07-01 | H2 2020 | H2 |
2020-08-01 | H2 2020 | H2 |
2020-09-01 | 2020 H2 | H2 |
2020-10-01 | 2020 H2 | H2 |
2020-11-01 | H2 2020 | H2 |
2020-12-01 | H2 2020 | H2 |
2021-01-01 | H1 2021 | H1 |
2021-02-01 | H1 2021 | H1 |
CodePudding user response:
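You can use df.insert() to add a new column. To extract the letter and number, loop through the values (column_value) in the second column and keep the token that starts with H, for example "value_for_new_column = next(t for t in column_value.split(' ') if t.startswith('H'))". Note that column_value.split(' ')[0] on its own only works when the H token comes first, so it would return 2020 for rows such as "2020 H1".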
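A minimal sketch of this loop-and-split approach, assuming a small frame rebuilt from a few rows of the question's sample and assuming the semester token always starts with `H`. The helper name `pick_sem` is hypothetical and only used for illustration.

```python
import pandas as pd

# A few rows from the question's sample, covering both orderings.
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-09-01", "2021-01-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "2020 H2", "H1 2021"],
})

def pick_sem(value):
    """Return the first space-separated token starting with 'H', else None."""
    for token in value.split(" "):
        if token.startswith("H"):
            return token
    return None  # assumption: rows without an H token get no value

# df.insert(loc, column, value) adds the column in place at position loc;
# len(df.columns) puts it at the end.
df.insert(len(df.columns), "Sem", [pick_sem(v) for v in df["Sem_Year"]])
print(df)
```

Checking each token instead of taking index 0 is what lets the same code handle both "H1 2020" and "2020 H1".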
CodePudding user response:
Because the format varies from row to row, you need a regular expression. Note that H\d means H followed by a digit. This regex could be modified for other requirements.
df['Sem'] = df['Sem_Year'].str.extract(r"(H\d)", expand=False)
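For reference, here is a self-contained sketch of the regex approach, assuming a frame rebuilt from a few rows of the question's sample and using the `Sem_Year` column name as it appears in the question's table. The pattern is written as a raw string so that `\d` is passed to the regex engine unchanged, and `expand=False` is used so that a single capture group yields a Series rather than a one-column DataFrame.

```python
import pandas as pd

# A few rows from the question's sample, covering both orderings.
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-07-01", "2020-09-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "H2 2020", "2020 H2"],
})

# (H\d) captures the letter H followed by exactly one digit,
# wherever it occurs in the string.
df["Sem"] = df["Sem_Year"].str.extract(r"(H\d)", expand=False)
print(df)

# Values with no match come back as NaN rather than raising an error.
no_match = pd.Series(["2020"]).str.extract(r"(H\d)", expand=False)
```

If other prefixes were possible (for example quarters written as Q1 to Q4), a character class such as `([HQ]\d)` would be one way to extend the pattern.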