I need to move a specific selected value of row to a new column with selected only value, what method that I need to do this? For example, I need to get only the value chosen that contains 'KOTA|KAB'
to a new column, from this case, what
I have some data:
---------------------------------------------
| A |
---------------------------------------------
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274 |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471 |
---------------------------------------------
I need to get specific value only and move to a new column, what I expect is look like this
--------------------------------------------- ----------------------
| A | B |
--------------------------------------------- ----------------------
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 | KOTA SURABAYA |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274 | KAB SIDOARJO |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471 | KAB BANYUWANGI |
--------------------------------------------- ----------------------
What I tried:
# initialize list of lists
testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]
# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])
for check in df_test['A']:
test = re.sub(r'(\bKOTA\b)|(\bKAB\b)', '', check)
print(test)
But the result above will remove the KOTA and KAB
CodePudding user response:
Using re.sub
removed the text that is matched with the pattern from the string.
You can use a single capture group with str.extract
for example:
testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]
# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])
df_test['B'] = df_test["A"].str.extract(r'\b((?:KOTA|KAB) \w )')
print (df_test)
Output
A B
0 JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 KOTA SURABAYA
1 JL XXXX, KEC PORONG KAB SIDOARJO 61274 KAB SIDOARJO
2 DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471 KAB BANYUWANGI
CodePudding user response:
Assuming you want to extract KOTA/KAB and the following words (except digits), you can use:
df_test['B'] = df_test['A'].str.extract(r'(\b(?:KOTA|KAB)\b\D \b)')
output:
A B
0 JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 KOTA SURABAYA
1 JL XXXX, KEC PORONG KAB SIDOARJO 61274 KAB SIDOARJO
2 DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471 KAB BANYUWANGI