Home > Enterprise >  Replicate contains string on row and move with specific result to new column Pandas python
Replicate contains string on row and move with specific result to new column Pandas python

Time:07-09

I need to move a specific selected value of row to a new column with selected only value, what method that I need to do this? For example, I need to get only the value chosen that contains 'KOTA|KAB' to a new column, from this case, what

I have some data:

 --------------------------------------------- 
|                     A                       |
 --------------------------------------------- 
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   |
 --------------------------------------------- 

I need to get specific value only and move to a new column, what I expect is look like this

 --------------------------------------------- ---------------------- 
|                     A                       |           B          |
 --------------------------------------------- ---------------------- 
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 | KOTA SURABAYA        |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      | KAB SIDOARJO         |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   | KAB BANYUWANGI       |
 --------------------------------------------- ---------------------- 

What I tried:

# initialize list of lists
testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'], 
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'], 
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]
  
# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])
  
for check in df_test['A']:
    test = re.sub(r'(\bKOTA\b)|(\bKAB\b)', '', check)
    print(test)

But the result above will remove the KOTA and KAB

CodePudding user response:

Using re.sub removed the text that is matched with the pattern from the string.

You can use a single capture group with str.extract for example:

testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]

# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])

df_test['B'] = df_test["A"].str.extract(r'\b((?:KOTA|KAB) \w )')
print (df_test)

Output

                                             A               B
0  JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135   KOTA SURABAYA
1       JL XXXX, KEC PORONG KAB SIDOARJO 61274    KAB SIDOARJO
2    DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471  KAB BANYUWANGI

CodePudding user response:

Assuming you want to extract KOTA/KAB and the following words (except digits), you can use:

df_test['B'] = df_test['A'].str.extract(r'(\b(?:KOTA|KAB)\b\D \b)')

output:

                                             A                B
0  JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135   KOTA SURABAYA 
1       JL XXXX, KEC PORONG KAB SIDOARJO 61274    KAB SIDOARJO 
2    DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471  KAB BANYUWANGI 
  • Related