
Answer by Andrew Wolfe for Why doesn't a non-greedy quantifier sometimes work in Oracle regex?

You've got a really great bounty, so I'm going to try to nail it comprehensively.

You make assumptions in your regular expression handling that are incorrect.

  1. Oracle is NOT compatible with Perl regular expressions; it is compatible with POSIX. It describes its support for Perl as "Perl-influenced".
  2. There is an intrinsic syntax conflict around the use of the Perl "*?" in Oracle, if you read that reference the way I do, and Oracle legitimately chooses the POSIX usage.
  3. Your description of how Perl handles "*?" is not quite right.

Here is a mashup of the options we've discussed. The key to this issue is case 30.

    CASE    SRC                             TEXT               RE                FROM_WHOM                                          RESULT
    ------- ------------------------------- ------------------ ----------------- -------------------------------------------------- --------------
          1 Egor's original source string   A=1,B=2,C=3,       .*B=.*?,          Egor's original pattern "doesn't work"             A=1,B=2,C=3,
          2 Egor's original source string   A=1,B=2,C=3,       .*B=.?,           Egor's "works correctly"                           A=1,B=2,
          3 Egor's original source string   A=1,B=2,C=3,       .*B=.+?,          Old Pro comment 1 form 2                           A=1,B=2,
          4 Egor's original source string   A=1,B=2,C=3,       .+B=.*?,          Old Pro comment 1 form 1                           A=1,B=2,
          5 Egor's original source string   A=1,B=2,C=3,       .*B=.{0,}?,       Old Pro comment 2                                  A=1,B=2,
          6 Egor's original source string   A=1,B=2,C=3,       [^B]*B=[^Bx]*?,   Old Pro answer form 1 "good"                       A=1,B=2,
          7 Egor's original source string   A=1,B=2,C=3,       [^B]*B=[^B]*?,    Old Pro answer form 2 "bad"                        A=1,B=2,C=3,
          8 Egor's original source string   A=1,B=2,C=3,       (.)*B=(.)*?,      TBone answer form 1                                A=1,B=2,
          9 TBone answer example 2          1_@_2_a_3_@_4_a    (\w)*?@(\w)*      TBone answer example 2 form 1                      1_@_2_a_3_
         10 TBone answer example 2          1_@_2_a_3_@_4_a    (\w)*@(\w)*?      TBone answer example 2 form 2                      1_@
         30 Egor's original source string   A=1,B=2,C=3,       .*B=(.)*?,        Schemaczar Variant to force Perl operation         A=1,B=2,
         31 Egor's original source string   A=1,B=2,C=3,       .*B=(.*)?,        Schemaczar Variant of Egor to force POSIX          A=1,B=2,C=3,
         32 Egor's original source string   A=1,B=2,C=3,       .*B=.*{0,1}       Schemaczar Applying Egor's  'non-greedy'           A=1,B=2,C=3,
         33 Egor's original source string   A=1,B=2,C=3,       .*B=(.)*{0,1}     Schemaczar Another variant of Egor's "non-greedy"  A=1,B=2,C=3,

I am pretty sure that case 30 is what you thought you were writing. That is, you thought the "*?" bound more tightly than the "*" by itself. True for Perl, I guess, but for Oracle (and presumably canonical POSIX) regular expressions, the "*?" has a lower precedence and associativity than "*". So Oracle reads your ".*?" as "(.*)?" (case 31), whereas Perl reads it as "(.)*?" (case 30).
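If you want to compare the two readings without building the test tables, a minimal sketch is to call REGEXP_SUBSTR on the literal string directly. The column aliases are made up for illustration, and the results in the comments are simply the ones reported for cases 30 and 31 in the table above; they may differ on other Oracle versions.

```sql
-- Same source string, two explicit groupings of the trailing quantifier.
SELECT REGEXP_SUBSTR('A=1,B=2,C=3,', '.*B=(.)*?,') AS perl_style_grouping,   -- case 30, reported: A=1,B=2,
       REGEXP_SUBSTR('A=1,B=2,C=3,', '.*B=(.*)?,') AS posix_style_grouping   -- case 31, reported: A=1,B=2,C=3,
  FROM DUAL;
```

Writing the parentheses explicitly removes any doubt about which operand the "?" applies to.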

Note that cases 32 and 33 indicate that "*{0,1}" does not work like "*?".

Note that Oracle REGEXP does not work like LIKE, that is, it does not require the match pattern to cover the entire test string. Using the "^" begin and "$" end markers might help you with this as well.
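To make the difference from LIKE concrete, here is a small sketch using REGEXP_LIKE on the same source string. The aliases are hypothetical names chosen for the example.

```sql
SELECT CASE WHEN REGEXP_LIKE('A=1,B=2,C=3,', 'B=2')   THEN 'match' ELSE 'no match' END AS unanchored,  -- pattern may match anywhere
       CASE WHEN REGEXP_LIKE('A=1,B=2,C=3,', '^B=2$') THEN 'match' ELSE 'no match' END AS anchored,    -- pattern must span the whole string
       CASE WHEN 'A=1,B=2,C=3,' LIKE 'B=2'            THEN 'match' ELSE 'no match' END AS like_test    -- LIKE always spans the whole string
  FROM DUAL;
```

The unanchored test should report a match because "B=2" occurs inside the string, while the anchored test and the LIKE test should not, since neither pattern covers the full string.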

My script:

    SET SERVEROUTPUT ON

    -- Drop the test tables if they already exist, so the script can be rerun.
    <<DISCREET_DROP>>
    begin
      DBMS_OUTPUT.ENABLE;
      for dropit in (select 'DROP TABLE ' || TABLE_NAME || ' CASCADE CONSTRAINTS' AS SYNT
                       FROM TABS
                      WHERE TABLE_NAME IN ('TEST_PATS', 'TEST_STRINGS')
                    )
      LOOP
        DBMS_OUTPUT.PUT_LINE('Dropping via ' || dropit.synt);
        execute immediate dropit.synt;
      END LOOP;
    END DISCREET_DROP;
    /

    --------------------------------------------------------
    --  DDL for Table TEST_PATS
    --------------------------------------------------------
    CREATE TABLE TEST_PATS
       (
       RE VARCHAR2(2000),
       FROM_WHOM VARCHAR2(50),
       PAT_GROUP VARCHAR2(50),
       PAT_ORDER NUMBER(9,0)
       ) ;

    --------------------------------------------------------
    --  DDL for Table TEST_STRINGS
    --------------------------------------------------------
    CREATE TABLE TEST_STRINGS
       (
       TEXT VARCHAR2(2000),
       SRC VARCHAR2(200),
       TEXT_GROUP VARCHAR2(50),
       TEXT_ORDER NUMBER(9,0)
       ) ;

    --------------------------------------------------------
    --  DDL for View REGEXP_TESTER_V
    --------------------------------------------------------
    -- Pairs every pattern with every string in the same group and applies REGEXP_SUBSTR.
    CREATE OR REPLACE FORCE VIEW REGEXP_TESTER_V (CASE_NUMBER, SRC, TEXT, RE, FROM_WHOM, RESULT) AS
      select pat_order as case_number,
             src, text, re, from_whom,
             regexp_substr (text, re) as result
        from test_pats
        full outer join test_strings on (text_group = pat_group)
       order by pat_order, text_order;

    REM INSERTING into TEST_PATS
    SET DEFINE OFF;
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.*?,','Egor''s original pattern "doesn''t work"','Egor',1);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.?,','Egor''s "works correctly"','Egor',2);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=(.)*?,','Schemaczar Variant to force Perl operation','Egor',30);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=(.*)?,','Schemaczar Variant of Egor to force POSIX','Egor',31);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.*{0,1}','Schemaczar Applying Egor''s  ''non-greedy''','Egor',32);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=(.)*{0,1}','Schemaczar Another variant of Egor''s "non-greedy"','Egor',33);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('[^B]*B=[^Bx]*?,','Old Pro answer form 1 "good"','Egor',6);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('[^B]*B=[^B]*?,','Old Pro answer form 2 "bad"','Egor',7);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.+?,','Old Pro comment 1 form 2','Egor',3);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.{0,}?,','Old Pro comment 2','Egor',5);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.+B=.*?,','Old Pro comment 1 form 1','Egor',4);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('(.)*B=(.)*?,','TBone answer form 1','Egor',8);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('(\w)*?@(\w)*','TBone answer example 2 form 1','TBone',9);
    Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('(\w)*@(\w)*?','TBone answer example 2 form 2','TBone',10);

    REM INSERTING into TEST_STRINGS
    SET DEFINE OFF;
    Insert into TEST_STRINGS (TEXT,SRC,TEXT_GROUP,TEXT_ORDER) values ('A=1,B=2,C=3,','Egor''s original source string','Egor',1);
    Insert into TEST_STRINGS (TEXT,SRC,TEXT_GROUP,TEXT_ORDER) values ('1_@_2_a_3_@_4_a','TBone answer example 2','TBone',2);

    COLUMN SRC FORMAT A50 WORD_WRAP
    COLUMN TEXT  FORMAT A50 WORD_WRAP
    COLUMN RE FORMAT A50 WORD_WRAP
    COLUMN FROM_WHOM FORMAT A50 WORD_WRAP
    COLUMN RESULT  FORMAT A50 WORD_WRAP

    SELECT * FROM REGEXP_TESTER_V;
