I have a data set as below
Table name : Data_table_1
Customer_ID | Cus_activity | Cus_Amount |
---|---|---|
12985 | CURAC | 201 |
20917 | PINT$@CURAC$@CINT$@CURCM | 567$@291$@358$@434 |
548357 | CINT$@CURCM$@CURAC | 300$@934$@450 |
80315 | CURCM$@PINT$@CURAC$@CINT | 809$@345$@420$@567 |
234561 | CURAC$@CURCM$@PINT$@CINT | 500$@359$@200$@400 |
Output I need is as below
Customer_ID | Cus_activity | Cus_Amount | CURAC_amount |
---|---|---|---|
12985 | CURAC | 201 | 201 |
20917 | PINT$@CURAC$@CINT$@CURCM | 567$@291$@358$@434 | 291 |
548357 | CINT$@CURCM$@CURAC | 300$@934$@450 | 450 |
80315 | CURCM$@PINT$@CURAC$@CINT | 809$@345$@420$@567 | 420 |
234561 | CURAC$@CURCM$@PINT$@CINT | 500$@359$@200$@400 | 500 |
The amount related to cus_activity capture in the Cus_Amount column
I need to get only the "CURAC" amount to a separate column
I tried as below but its incorrect
SELECT REGEXP_SUBSTR(Cus_activity ,'[^CURAC]*$')
How to get it once from a large data set only the "CURAC" amount to a separate column?
CodePudding user response:
Here's one option; basically, you'd split cus_activity
and cus_amount
to rows and then return only values that belong to CURAC
.
SQL> with
2 temp as
3 (select
4 customer_id, cus_activity, cus_amount,
5 --
6 rtrim(regexp_substr(cus_activity, '[^@] ', 1, column_value), '$') act,
7 rtrim(regexp_substr(cus_amount, '[^@] ', 1, column_value), '$') amt
8 from test cross join
9 table(cast(multiset(select level from dual
10 connect by level <= regexp_count(cus_activity, '@') 1
11 ) as sys.odcinumberlist))
12 )
13 select
14 customer_id, cus_activity, cus_amount,
15 max(case when act = 'CURAC' then amt end) curac_amount
16 from temp
17 group by customer_id, cus_activity, cus_amount
18 order by customer_id;
CUSTOMER_ID CUS_ACTIVITY CUS_AMOUNT CURAC_AMOUNT
----------- ------------------------ ------------------ ------------------
12985 CURAC 201 201
20917 PINT$@CURAC$@CINT$@CURCM 567$@291$@358$@434 291
80315 CURCM$@PINT$@CURAC$@CINT 809$@345$@420$@567 420
234561 CURAC$@CURCM$@PINT$@CINT 500$@359$@200$@400 500
548357 CINT$@CURCM$@CURAC 300$@934$@450 450
SQL>
CodePudding user response:
You do not need to split the rows (as that will be slow) and can find the position using simple string functions:
SELECT customer_id,
cus_activity,
cus_amount,
SUBSTR(
amt,
INSTR(amt, '$@', 1, position) 2,
INSTR(amt, '$@', 1, position 1) - INSTR(amt, '$@', 1, position) - 2
) AS curac_amount
FROM (
SELECT d.*,
COALESCE((LENGTH(act) - LENGTH(REPLACE(act, '$@')))/2, 0) 1
AS position
FROM (
SELECT d.*,
SUBSTR(
'$@' || cus_activity || '$@',
1,
INSTR('$@' || cus_activity || '$@', '$@CURAC$@') - 1
) AS act,
CASE
WHEN INSTR('$@' || cus_activity || '$@', '$@CURAC$@') > 0
THEN '$@' || cus_amount || '$@'
END AS amt
FROM data_table_1 d
) d
)
If you did want to use a slower method of splitting the data into rows then, from Oracle 12, you can use:
SELECT *
FROM data_table_1 d
LEFT OUTER JOIN LATERAL(
SELECT REGEXP_SUBSTR(cus_amount || '$@', '(.*?)\$@', 1, LEVEL, NULL, 1)
AS curac_amount
FROM DUAL
WHERE REGEXP_SUBSTR(cus_activity || '$@', '(.*?)\$@', 1, LEVEL, NULL, 1)
= 'CURAC'
CONNECT BY
LEVEL <= REGEXP_COUNT(cus_activity || '$@', '(.*?)\$@')
)
ON (1 = 1)
Which, for the sample data:
CREATE TABLE data_table_1 (Customer_ID, Cus_activity, Cus_Amount) AS
SELECT 12985, 'CURAC', '201' FROM DUAL UNION ALL
SELECT 20917, 'PINT$@CURAC$@CINT$@CURCM', '567$@291$@358$@434' FROM DUAL UNION ALL
SELECT 548357, 'CINT$@CURCM$@CURAC', '300$@934$@450' FROM DUAL UNION ALL
SELECT 80315, 'CURCM$@PINT$@CURAC$@CINT', '809$@345$@420$@567' FROM DUAL UNION ALL
SELECT 234561, 'CURAC$@CURCM$@PINT$@CINT', '500$@359$@200$@400' FROM DUAL UNION ALL
SELECT 234562, 'NOCURAC$@CURCM$@PINT$@CINT', '500$@359$@200$@400' FROM DUAL;
Both output:
CUSTOMER_ID CUS_ACTIVITY CUS_AMOUNT CURAC_AMOUNT 12985 CURAC 201 201 20917 PINT$@CURAC$@CINT$@CURCM 567$@291$@358$@434 291 548357 CINT$@CURCM$@CURAC 300$@934$@450 450 80315 CURCM$@PINT$@CURAC$@CINT 809$@345$@420$@567 420 234561 CURAC$@CURCM$@PINT$@CINT 500$@359$@200$@400 500 234562 NOCURAC$@CURCM$@PINT$@CINT 500$@359$@200$@400 null
db<>fiddle here