Given the dataframe below:
Row|Address |Contact |
--- ------------------------------------------------------------------------ -------------
|1 |J. Borja cor. Guillermo St., Cagayan de Oro City 08822 722-922 / 726-667|null |
|2 |Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City 08822 725-301 |null |
|3 |R.N. Abejuela St., Cagayan de Oro City |08822 727-864|
How do i convert it into this:
Row|Address |Contact |
--- ------------------------------------------------------------------------ -------------
|1 |J. Borja cor. Guillermo St., Cagayan de Oro City | 08822 722-922 / 726-667 |
|2 |Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City | 08822 725-301 |
|3 |R.N. Abejuela St., Cagayan de Oro City | 08822 727-864 |
CodePudding user response:
First use regular expressions to extract Contact_tmp
from Address
, then erase it from Address
, and finally use coalesce
function to merge Contact
and Contact_tmp
.
df = df.withColumn('Contact_tmp', F.regexp_extract('Address', '\\d \\s \\d -\\d \\s*/*\\s*\\d*-*\\d*', 0)).select(
'Row',
F.expr('replace(Address, Contact_tmp, "")').alias('Address'),
F.coalesce('Contact', 'Contact_tmp').alias('Contact')
)
df.show(truncate=False)