| Feature | | Example | ES2018 | ES2024+ | Subfeatures & JS differences |
|---|---|---|---|---|---|
| Flags | Supported in top-level flags and pattern modifiers | | | | |
| | Ignore case | `i` | ✅ | ✅ | ✔ Unicode case folding (same as JS with flag `u`, `v`) |
| | Dot all | `m` | ✅ | ✅ | ✔ Equivalent to JS flag `s` |
| | Extended | `x` | ✅ | ✅ | ✔ Unicode whitespace ignored ✔ Line comments with `#` ✔ Whitespace/comments allowed between a token and its quantifier ✔ Whitespace/comments between a quantifier and the `?`/`+` that makes it lazy/possessive changes it to a quantifier chain ✔ Whitespace/comments separate tokens (ex: `\1 0`) ✔ Whitespace and `#` not ignored in char classes |
| | Currently supported only in top-level flags | | | | |
| | Digit is ASCII | `D` | ✅ | ✅ | ✔ ASCII `\d`, `\p{Digit}`, `[[:digit:]]` |
| | Space is ASCII | `S` | ✅ | ✅ | ✔ ASCII `\s`, `\p{Space}`, `[[:space:]]` |
| | Word is ASCII | `W` | ✅ | ✅ | ✔ ASCII `\b`, `\w`, `\p{Word}`, `[[:word:]]` |
| Pattern modifiers | Group | `(?im-x:…)` | ✅ | ✅ | ✔ Unicode case folding for `i` ✔ Allows enabling and disabling the same flag (priority: disable) ✔ Allows lone or multiple `-` |
| | Directive | `(?im-x)` | ✅ | ✅ | ✔ Continues until end of pattern or group (spanning alternatives) |
| Characters | Literal | `E`, `!` | ✅ | ✅ | ✔ Code point based matching (same as JS with flag `u`, `v`) ✔ Standalone `]`, `{`, `}` don't require escaping |
| | Identity escape | `\E`, `\!` | ✅ | ✅ | ✔ Different set than JS ✔ Allows multibyte chars |
| | Escaped metachar | `\\`, `\.` | ✅ | ✅ | ✔ Same as JS |
| | Control code escape | `\t` | ✅ | ✅ | ✔ The JS set plus `\a`, `\e` |
| | `\xNN` | `\x7F` | ✅ | ✅ | ✔ Allows 1 hex digit ✔ Above `7F`, is UTF-8 encoded byte (≠ JS) ✔ Error for invalid encoded bytes |
| | `\uNNNN` | `\uFFFF` | ✅ | ✅ | ✔ Same as JS with flag `u`, `v` |
| | `\x{…}` | `\x{A}` | ✅ | ✅ | ✔ Allows leading 0s up to 8 total hex digits |
| | Escaped num | `\20` | ✅ | ✅ | ✔ Can be backref, error, null, octal, identity escape, or any of these combined with literal digits, based on complex rules that differ from JS ✔ Always handles escaped single digit 1-9 outside char class as backref ✔ Allows null with 1-3 0s ✔ Error for octal > `177` |
| | Caret notation | `\cA`, `\C-A` | ✅ | ✅ | ✔ With A-Za-z (JS: only `\c` form) |
| Character sets | Digit | `\d`, `\D` | ✅ | ✅ | ✔ Unicode by default (≠ JS) |
| | Hex digit | `\h`, `\H` | ✅ | ✅ | ✔ ASCII |
| | Whitespace | `\s`, `\S` | ✅ | ✅ | ✔ Unicode by default ✔ No JS adjustments to Unicode set (−`\uFEFF`, +`\x85`) |
| | Word | `\w`, `\W` | ✅ | ✅ | ✔ Unicode by default (≠ JS) |
| | Dot | `.` | ✅ | ✅ | ✔ Excludes only `\n` (≠ JS) |
| | Any | `\O` | ✅ | ✅ | ✔ Any char (with any flags) ✔ Identity escape in char class |
| | Not newline | `\N` | ✅ | ✅ | ✔ Identity escape in char class |
| | Unicode property | `\p{L}`, `\P{L}` | ✅ | ✅ | ✔ Binary properties ✔ Categories ✔ Scripts ✔ Aliases ✔ POSIX properties ✔ Invert with `\p{^…}`, `\P{^…}` ✔ Insignificant spaces, hyphens, underscores, and casing in names ✔ `\p`, `\P` without `{` is an identity escape ✔ Error for key prefixes ✔ Error for props of strings ❌ Blocks (wontfix[1]) |
| Variable-length sets | Newline | `\R` | ✅ | ✅ | ✔ Matched atomically |
| | Grapheme | `\X` | ☑️ | ☑️ | ● Uses a close approximation ✔ Matched atomically |
| Character classes | Base | `[…]`, `[^…]` | ✅ | ✅ | ✔ Unescaped `-` outside of range is literal in some contexts (different than JS rules in any mode) ✔ Error for unescaped `[` that doesn't form nested class ✔ Leading unescaped `]` OK ✔ Fewer chars require escaping than JS |
| | Empty | `[]`, `[^]` | ✅ | ✅ | ✔ Error |
| | Range | `[a-z]` | ✅ | ✅ | ✔ Same as JS with flag `u`, `v` ✔ Allows `\x{…}` above `10FFFF` at end of range to mean last valid code point |
| | POSIX class | `[[:word:]]`, `[[:^word:]]` | ☑️[2] | ✅ | ✔ All use Unicode definitions |
| | Nested class | `[…[…]]` | ☑️[3] | ✅ | ✔ Same as JS with flag `v` |
| | Intersection | `[…&&…]` | ❌ | ✅ | ✔ Doesn't require nested classes for intersection of union and ranges ✔ Allows empty segments |
| Assertions | Line start, end | `^`, `$` | ✅ | ✅ | ✔ Always "multiline" ✔ Only `\n` as newline ✔ `^` doesn't match after string-terminating `\n` |
| | String start, end | `\A`, `\z` | ✅ | ✅ | ✔ Same as JS `^` `$` without JS flag `m` |
| | String end or before terminating newline | `\Z` | ✅ | ✅ | ✔ Only `\n` as newline |
| | Search start | `\G` | ✅ | ✅ | ✔ Matches at start of match attempt (not end of prev match; advances after 0-length match) |
| | Lookaround | `(?=…)`, `(?!…)`, `(?<=…)`, `(?<!…)` | ✅ | ✅ | ✔ Allows variable-length quantifiers and alternation within lookbehind ✔ Lookahead invalid within lookbehind ✔ Capturing groups invalid within negative lookbehind ✔ Negative lookbehind invalid within positive lookbehind |
| | Word boundary | `\b`, `\B` | ✅ | ✅ | ✔ Unicode based (≠ JS) |
| Quantifiers | Greedy, lazy | `*`, `+?`, `{2,}`, etc. | ✅ | ✅ | ✔ Includes all JS forms ✔ Adds `{,n}` for min 0 ✔ Explicit bounds have upper limit of 100,000 (unlimited in JS) ✔ Error with assertions (same as JS with flag `u`, `v`) and directives |
| | Possessive | `?+`, `*+`, `++`, `{3,2}` | ✅ | ✅ | ✔ `+` suffix doesn't make `{…}` quantifiers possessive (creates a quantifier chain) ✔ Reversed `{…}` ranges are possessive |
| | Chained | `**`, `??+*`, `{2,3}+`, etc. | ✅ | ✅ | ✔ Further repeats the preceding repetition |
| Groups | Noncapturing | `(?:…)` | ✅ | ✅ | ✔ Same as JS |
| | Atomic | `(?>…)` | ✅ | ✅ | ✔ Supported |
| | Capturing | `(…)` | ✅ | ✅ | ✔ Is noncapturing if named capture present |
| | Named capturing | `(?<a>…)`, `(?'a'…)` | ✅ | ✅ | ✔ Duplicate names allowed (including within the same alternation path) unless directly referenced by a subroutine ✔ Error for names invalid in Oniguruma (more permissive than JS) |
| Backreferences | Numbered | `\1` | ✅ | ✅ | ✔ Error if named capture used ✔ Refs the most recent of a capture/subroutine set |
| | Enclosed numbered, relative | `\k<1>`, `\k'1'`, `\k<-1>`, `\k'-1'` | ✅ | ✅ | ✔ Error if named capture used ✔ Allows leading 0s ✔ Refs the most recent of a capture/subroutine set ✔ `\k` without `<` `'` is an identity escape |
| | Named | `\k<a>`, `\k'a'` | ✅ | ✅ | ✔ For duplicate group names, rematch any of their matches (multiplex) ✔ Refs the most recent of a capture/subroutine set (no multiplex) ✔ Combination of multiplex and most recent of capture/subroutine set if duplicate name is indirectly created by a subroutine ✔ Error for `-`/`+` in backref names, though valid in group names |
| | To nonparticipating groups | | ☑️ | ☑️ | ✔ Error if group to the right[4] ✔ Duplicate names (and subroutines) to the right not included in multiplex ✔ Fail to match (or don't include in multiplex) ancestor groups and groups in preceding alternation paths ❌ Some rare cases are indeterminable at compile time and use the JS behavior of matching an empty string |
| Subroutines | Numbered, relative | `\g<1>`, `\g'1'`, `\g<-1>`, `\g'-1'`, `\g<+1>`, `\g'+1'` | ✅ | ✅ | ✔ Error if named capture used ✔ Allows leading 0s. All subroutines (incl. named): ✔ Allowed before reffed group ✔ Can be nested (any depth) ✔ Reuses flags from the reffed group (ignores local flags) ✔ Replaces most recent captured values (for backrefs) ✔ `\g` without `<` `'` is an identity escape |
| | Named | `\g<a>`, `\g'a'` | ✅ | ✅ | ● Same behavior as numbered ✔ Error if reffed group uses duplicate name |
| Recursion | Full pattern | `\g<0>`, `\g'0'` | ☑️[5] | ☑️[5] | ✔ 20-level depth limit |
| | Numbered, relative, named | `(…\g<1>?…)`, `(…\g<-1>?…)`, `(?<a>…\g<a>?…)`, etc. | ☑️[5] | ☑️[5] | ✔ 20-level depth limit |
| Other | Comment group | `(?#…)` | ✅ | ✅ | ✔ Allows escaping `\)`, `\\` ✔ Comments allowed between a token and its quantifier ✔ Comments between a quantifier and the `?`/`+` that makes it lazy/possessive changes it to a quantifier chain |
| | Alternation | `…\|…` | ✅ | ✅ | ✔ Same as JS |
| | Absent repeater[6] | `(?~…)` | ✅ | ✅ | ✔ Supported |
| | Keep | `\K` | ☑️ | ☑️ | ● Supported at top level if no top-level alternation is used |
| | JS features unknown to Oniguruma are handled using Oniguruma syntax | | ✅ | ✅ | ✔ `\u{…}` is an error ✔ `[\q{…}]` matches `q`, etc. ✔ `[a--b]` includes the invalid reversed range `a` to `-` |
| | Invalid Oniguruma syntax | | ✅ | ✅ | ✔ Error |
| Compile-time options | `ONIG_OPTION_CAPTURE_GROUP` | | ✅ | ✅ | ✔ Unnamed captures and numbered calls allowed when using named capture |
| | `ONIG_OPTION_SINGLELINE` | | ✅ | ✅ | ✔ `^` → `\A` ✔ `$` → `\Z` |
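
Several of the "≠ JS" notes in the table can be checked with native `RegExp` alone. The following is a minimal, hand-written sketch of a few such differences. It is not the output of any transpiler, all constant names are hypothetical, and the "Oniguruma-like" patterns are assumed stand-ins chosen for illustration (for example, `\p{Nd}` for Unicode `\d` and `\p{White_Space}` for Unicode `\s`).

```javascript
// Hand-written JS approximations of a few Oniguruma behaviors from the table.
// All names are hypothetical; the "onigLike" patterns are illustrative stand-ins.

// Digit: Oniguruma `\d` is Unicode by default, JS `\d` is ASCII-only.
const jsDigit = /\d/;
const onigLikeDigit = /\p{Nd}/u; // assumption: Unicode decimal numbers

// Dot: Oniguruma `.` excludes only \n; JS `.` also excludes \r, \u2028, \u2029.
const jsDot = /./u;
const onigLikeDot = /[^\n]/u;

// Whitespace: JS `\s` adds \uFEFF to, and drops \x85 from, the Unicode set.
const jsSpace = /\s/;
const onigLikeSpace = /\p{White_Space}/u; // assumption: unadjusted Unicode set

// Line start: Oniguruma `^` is always "multiline", treats only \n as a newline,
// and doesn't match after a string-terminating \n.
const jsLineStart = /^/gm;
// No flag m here, so `^` and `$` inside the lookbehind mean string start/end.
const onigLikeLineStart = /(?<=^|\n(?!$))/g;

const arabicIndicThree = '\u0663';

console.log(jsDigit.test(arabicIndicThree));       // false
console.log(onigLikeDigit.test(arabicIndicThree)); // true
console.log(jsDot.test('\r'));                     // false
console.log(onigLikeDot.test('\r'));               // true
console.log(jsSpace.test('\uFEFF'), jsSpace.test('\x85'));             // true false
console.log(onigLikeSpace.test('\uFEFF'), onigLikeSpace.test('\x85')); // false true

// Mark every line-start position with ">".
console.log(JSON.stringify('a\nb\n'.replace(jsLineStart, '>')));       // ">a\n>b\n>"
console.log(JSON.stringify('a\nb\n'.replace(onigLikeLineStart, '>'))); // ">a\n>b\n"
```

The line-start case shows why a plain JS `^` with flag `m` is not a drop-in equivalent: it also matches after the final `\n`, which the table says Oniguruma's `^` does not.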