full site update

This commit is contained in:
2025-07-24 18:46:24 +02:00
parent bfe2b90d8d
commit 37a6e0ab31
6912 changed files with 540482 additions and 361712 deletions

View File

@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2024 Steven Levithan
Copyright (c) 2025 Steven Levithan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

View File

@@ -1,10 +1,14 @@
# regex-recursion
[![npm version][npm-version-src]][npm-version-href]
[![npm downloads][npm-downloads-src]][npm-downloads-href]
[![bundle][bundle-src]][bundle-href]
This is an official plugin for [Regex+](https://github.com/slevithan/regex) that adds support for recursive matching up to a specified max depth *N*, where *N* can be between 2 and 100. Generated regexes are native JavaScript `RegExp` instances.
> [!NOTE]
> Regex flavors vary on whether they offer infinite or fixed-depth recursion. For example, recursion in Oniguruma uses a depth limit of 20, and doesn't allow changing this.
Recursive matching is added to a regex via one of the following (the recursion depth limit is provided in place of *`N`*):
- `(?R=N)` — Recursively match the entire regex at this position.
@@ -30,8 +34,8 @@ const re = regex({plugins: [recursion]})`…`;
<summary>Using a global name (no import)</summary>
```html
<script src="https://cdn.jsdelivr.net/npm/regex@5.1.1/dist/regex.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/regex-recursion@5.1.1/dist/regex-recursion.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/regex@6.0.1/dist/regex.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/regex-recursion@6.0.2/dist/regex-recursion.min.js"></script>
<script>
const {regex} = Regex;
const {recursion} = Regex.plugins;
@@ -48,35 +52,30 @@ const re = regex({plugins: [recursion]})`…`;
#### Anywhere within a string
```js
// Matches sequences of up to 50 'a' chars followed by the same number of 'b'
const re = regex({plugins: [recursion]})`a(?R=50)?b`;
// Matches sequences of up to 20 'a' chars followed by the same number of 'b'
const re = regex({plugins: [recursion]})`a(?R=20)?b`;
re.exec('test aaaaaabbb')[0];
// → 'aaabbb'
```
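One way to picture fixed-depth recursion is as the pattern nested inside itself a fixed number of times. The sketch below is a simplified illustration, not the plugin's actual implementation: `expandGlobalRecursion` is a hypothetical helper that takes the text to the left and right of the `(?R=N)` token and assumes the pattern contains no capturing groups or backreferences (which a real implementation would need to handle separately for each depth level).

```js
// Hypothetical helper (not part of regex-recursion): expands a global
// recursion token into nested noncapturing groups. Assumes the pattern has
// no capturing groups or backreferences.
function expandGlobalRecursion(left, right, depth) {
  const reps = depth - 1;
  const opens = `(?:${left}`.repeat(reps);
  const closes = `${right})`.repeat(reps);
  // Depth 2 gives 'left(?:left(?:)right)right'. The empty group in the
  // middle absorbs a quantifier that directly follows the recursion token.
  return `${left}${opens}(?:)${closes}${right}`;
}

// 'a(?R=3)?b' split around the recursion token
const expandedSource = expandGlobalRecursion('a', '?b', 3);
const expandedRe = new RegExp(expandedSource);

console.log(expandedSource); // a(?:a(?:a(?:)?b)?b)?b
console.log(expandedRe.exec('test aaaaaabbb')[0]); // aaabbb
```

With a depth of 3, the expanded regex can match at most three `a` characters followed by three `b` characters, which is why the depth limit caps the length of balanced sequences.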
#### As the entire string
Use `\g<name&R=N>` to recursively match just the specified group.
```js
const re = regex({plugins: [recursion]})`^
(?<balanced>
a
# Recursively match just the specified group
\g<balanced&R=50>?
b
)
$`;
const re = regex({plugins: [recursion]})`
^ (?<r> a \g<r&R=20>? b) $
`;
re.test('aaabbb'); // → true
re.test('aaabb'); // → false
```
Notice the `^` and `$` anchors outside of the recursive subpattern.
### Match balanced parentheses
```js
// Matches all balanced parentheses up to depth 50
// Matches all balanced parentheses up to depth 20
const parens = regex({flags: 'g', plugins: [recursion]})`
\( ( [^\(\)] | (?R=50) )* \)
\( ([^\(\)] | (?R=20))* \)
`;
'test ) (balanced ((parens))) () ((a)) ( (b)'.match(parens);
@@ -88,17 +87,24 @@ const parens = regex({flags: 'g', plugins: [recursion]})`
] */
```
Following is an alternative that matches the same strings, but adds a nested quantifier. It then uses an atomic group to prevent this nested quantifier from creating the potential for [catastrophic backtracking](https://www.regular-expressions.info/catastrophic.html).
Following is an alternative that matches the same strings, but adds a nested quantifier. It then uses an atomic group to prevent this nested quantifier from creating the potential for [catastrophic backtracking](https://www.regular-expressions.info/catastrophic.html). Since the example above doesn't need a nested quantifier, this is not an improvement but merely an alternative that shows how to deal with the general problem of nested quantifiers with multiple ways to divide matches of the same strings.
```js
const parens = regex({flags: 'g', plugins: [recursion]})`
\( ( (?> [^\(\)]+ ) | (?R=50) )* \)
\( ((?> [^\(\)]+) | (?R=20))* \)
`;
// Or with a possessive quantifier
const parens = regex({flags: 'g', plugins: [recursion]})`
\( ([^\(\)]++ | (?R=20))* \)
`;
```
This matches sequences of non-parens in one step with the nested `+` quantifier, and avoids backtracking into these sequences by wrapping it with an atomic group `(?>…)`. Given that what the nested quantifier `+` matches overlaps with what the outer group can match with its `*` quantifier, the atomic group is important here. It avoids exponential backtracking when matching long strings with unbalanced parens.
The first example above matches sequences of non-parentheses in one step with the nested `+` quantifier, and avoids backtracking into these sequences by wrapping it with an atomic group `(?>…)`. Given that what the nested quantifier `+` matches overlaps with what the outer group can match with its `*` quantifier, the atomic group is important here. It avoids exponential backtracking when matching long strings with unbalanced parentheses.
[Atomic groups](https://github.com/slevithan/regex#atomic-groups) are provided by the base `regex` library.
In cases where you're repeating a single token within an atomic group, possessive quantifiers provide syntax sugar.
Atomic groups and possessive quantifiers are provided by the base Regex+ library.
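For intuition about what an atomic group changes, native JavaScript can approximate one with a lookahead plus a backreference. This is a general regex technique shown as a minimal sketch; whether Regex+ emits exactly this form for `(?>…)` is an assumption here, and the variable names are illustrative only.

```js
// Plain greedy group: the engine can give back one 'a' so the final 'a' matches
const backtrackingRe = /(?:a+)a/;

// Atomic-like group: the lookahead captures the run of 'a' chars once, and the
// backreference consumes exactly that text, so the engine cannot backtrack
// into the run to free up a char for the final 'a'
const atomicLikeRe = /(?=(a+))\1a/;

console.log(backtrackingRe.test('aaa')); // true
console.log(atomicLikeRe.test('aaa')); // false
```

The second regex fails because nothing is left for the trailing `a` once the run has been consumed. Refusing to retry shorter runs in this way is what keeps nested quantifiers from backtracking exponentially.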
### Match palindromes
@@ -106,9 +112,9 @@ This matches sequences of non-parens in one step with the nested `+` quantifier,
```js
const palindromes = regex({flags: 'gi', plugins: [recursion]})`
(?<char> \w )
(?<char> \w)
# Recurse, or match a lone unbalanced char in the middle
( (?R=15) | \w? )
((?R=15) | \w?)
\k<char>
`;
@@ -116,28 +122,30 @@ const palindromes = regex({flags: 'gi', plugins: [recursion]})`
// → ['Racecar', 'ABBA', 'edivide']
```
In the example above, the max length of matched palindromes is 31. That's because it sets the max recursion depth to 15 with `(?R=15)`. So, depth 15 × 2 chars (left + right) for each depth level + 1 optional unbalanced char in the middle = 31. To match longer palindromes, the max recursion depth can be increased to a max of 100, which would enable matching palindromes up to 201 characters long.
Palindromes are sequences that read the same backwards as forwards. In the example above, the max length of matched palindromes is 31. That's because it sets the max recursion depth to 15 with `(?R=15)`. So, depth 15 × 2 chars (left + right) for each depth level + 1 optional unbalanced char in the middle = 31. To match longer palindromes, the max recursion depth can be increased to a max of 100, which would enable matching palindromes up to 201 characters long.
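The length arithmetic above can be written as a small calculation. `maxPalindromeLength` is a hypothetical helper used only for illustration and is not part of the plugin.

```js
// Hypothetical helper: each depth level contributes one char on the left and
// one on the right, plus one optional unbalanced char in the middle
function maxPalindromeLength(maxDepth) {
  return maxDepth * 2 + 1;
}

console.log(maxPalindromeLength(15)); // 31
console.log(maxPalindromeLength(100)); // 201
```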
#### Match palindromes as complete words
```js
const palindromeWords = regex({flags: 'gi', plugins: [recursion]})`\b
const palindromeWords = regex({flags: 'gi', plugins: [recursion]})`
\b
(?<palindrome>
(?<char> \w )
( \g<palindrome&R=15> | \w? )
(?<char> \w)
(\g<palindrome&R=15> | \w?)
\k<char>
)
\b`;
\b
`;
'Racecar, ABBA, and redivided'.match(palindromeWords);
// → ['Racecar', 'ABBA']
```
Notice the `\b` word boundaries outside of the recursive subpattern.
<!-- Badges -->
[npm-version-src]: https://img.shields.io/npm/v/regex-recursion?color=78C372
[npm-version-href]: https://npmjs.com/package/regex-recursion
[npm-downloads-src]: https://img.shields.io/npm/dm/regex-recursion?color=78C372
[npm-downloads-href]: https://npmjs.com/package/regex-recursion
[bundle-src]: https://img.shields.io/bundlejs/size/regex-recursion?color=78C372&label=minzip
[bundle-href]: https://bundlejs.com/?q=regex-recursion&treeshake=[*]

View File

@@ -1,17 +1,2 @@
var Regex;(Regex||={}).plugins=(()=>{var N=Object.defineProperty;var P=Object.getOwnPropertyDescriptor;var W=Object.getOwnPropertyNames;var q=Object.prototype.hasOwnProperty;var z=(e,t)=>{for(var n in t)N(e,n,{get:t[n],enumerable:!0})},H=(e,t,n,a)=>{if(t&&typeof t=="object"||typeof t=="function")for(let r of W(t))!q.call(e,r)&&r!==n&&N(e,r,{get:()=>t[r],enumerable:!(a=P(t,r))||a.enumerable});return e};var Q=e=>H(N({},"__esModule",{value:!0}),e);var X={};z(X,{recursion:()=>K});var i=Object.freeze({DEFAULT:"DEFAULT",CHAR_CLASS:"CHAR_CLASS"});function I(e,t,n,a){let r=new RegExp(String.raw`${t}|(?<$skip>\[\^?|\\?.)`,"gsu"),s=[!1],u=0,o="";for(let c of e.matchAll(r)){let{0:f,groups:{$skip:p}}=c;if(!p&&(!a||a===i.DEFAULT==!u)){n instanceof Function?o+=n(c,{context:u?i.CHAR_CLASS:i.DEFAULT,negated:s[s.length-1]}):o+=n;continue}f[0]==="["?(u++,s.push(f[1]==="^")):f==="]"&&u&&(u--,s.pop()),o+=f}return o}function b(e,t,n,a){I(e,t,n,a)}function Z(e,t,n=0,a){if(!new RegExp(t,"su").test(e))return null;let r=new RegExp(`${t}|(?<$skip>\\\\?.)`,"gsu");r.lastIndex=n;let s=0,u;for(;u=r.exec(e);){let{0:o,groups:{$skip:c}}=u;if(!c&&(!a||a===i.DEFAULT==!s))return u;o==="["?s++:o==="]"&&s&&s--,r.lastIndex==u.index&&r.lastIndex++}return null}function k(e,t,n){return!!Z(e,t,0,n)}function T(e,t){let n=/\\?./gsu;n.lastIndex=t;let a=e.length,r=0,s=1,u;for(;u=n.exec(e);){let[o]=u;if(o==="[")r++;else if(r)o==="]"&&r--;else if(o==="(")s++;else if(o===")"&&(s--,!s)){a=u.index;break}}return e.slice(t,a)}var w="$E$";var L=String.raw`\(\?(?:[:=!>A-Za-z\-]|<[=!]|\(DEFINE\))`;var ce=new RegExp(String.raw`(?<noncapturingStart>${L})|(?<capturingStart>\((?:\?<[^>]+>)?)|\\?.`,"gsu");var j=String.raw`(?:[?*+]|\{\d+(?:,\d*)?\})`,ie=new RegExp(String.raw`
\\(?: \d+
| c[A-Za-z]
| [gk]<[^>]+>
| [pPu]\{[^\}]+\}
| u[A-Fa-f\d]{4}
| x[A-Fa-f\d]{2}
)
| \((?: \? (?: [:=!>]
| <(?:[=!]|[^>]+>)
| [A-Za-z\-]+:
| \(DEFINE\)
))?
| (?<qBase>${j})(?<qMod>[?+]?)(?<invalidQ>[?*+\{]?)
| \\?.
`.replace(/\s+/g,""),"gsu");var l=String.raw,J=l`\\g<(?<gRNameOrNum>[^>&]+)&R=(?<gRDepth>[^>]+)>`,U=l`\(\?R=(?<rDepth>[^\)]+)\)|${J}`,R=l`\(\?<(?![=!])(?<captureName>[^>]+)>`,g=new RegExp(l`${R}|${U}|\(\?|\\?.`,"gsu"),G="Cannot use multiple overlapping recursions",M=new RegExp(l`(?:\$[1-9]\d*)?${w.replace(/\$/g,l`\$`)}`,"y");function K(e,t){if(!new RegExp(U,"su").test(e))return e;if(k(e,l`\(\?\(DEFINE\)`,i.DEFAULT))throw new Error("DEFINE groups cannot be used with recursion");let n=!!t?.useEmulationGroups,a=k(e,l`\\[1-9]`,i.DEFAULT),r=new Map,s=[],u=!1,o=0,c=0,f;for(g.lastIndex=0;f=g.exec(e);){let{0:p,groups:{captureName:A,rDepth:$,gRNameOrNum:d,gRDepth:E}}=f;if(p==="[")o++;else if(o)p==="]"&&o--;else if($){if(_($),u)throw new Error(G);if(a)throw new Error("Numbered backrefs cannot be used with global recursion");let h=e.slice(0,f.index),m=e.slice(g.lastIndex);if(k(m,U,i.DEFAULT))throw new Error(G);return v(h,m,+$,!1,n)}else if(d){_(E);let h=!1;for(let C of s)if(C.name===d||C.num===+d){if(h=!0,C.hasRecursedWithin)throw new Error(G);break}if(!h)throw new Error(l`Recursive \g cannot be used outside the referenced group "\g<${d}&R=${E}>"`);let m=r.get(d),x=T(e,m);if(a&&k(x,l`${R}|\((?!\?)`,i.DEFAULT))throw new Error("Numbered backrefs cannot be used with recursion of capturing groups");let D=e.slice(m,f.index),S=x.slice(D.length+p.length),F=v(D,S,+E,!0,n),O=e.slice(0,m),y=e.slice(m+x.length);e=`${O}${F}${y}`,g.lastIndex+=F.length-p.length-D.length-S.length,s.forEach(C=>C.hasRecursedWithin=!0),u=!0}else if(A)c++,r.set(String(c),g.lastIndex),r.set(A,g.lastIndex),s.push({num:c,name:A});else if(p.startsWith("(")){let h=p==="(";h&&(c++,r.set(String(c),g.lastIndex+(n?V(e,g.lastIndex):0))),s.push(h?{num:c}:{})}else p===")"&&s.pop()}return e}function _(e){let t=`Max depth must be integer between 2 and 100; used ${e}`;if(!/^[1-9]\d*$/.test(e))throw new Error(t);if(e=+e,e<2||e>100)throw new Error(t)}function v(e,t,n,a,r){let s=new 
Set;a&&b(e+t,R,({groups:{captureName:o}})=>{s.add(o)},i.DEFAULT);let u=n-1;return`${e}${B(`(?:${e}`,u,a?s:null,"forward",r)}(?:)${B(`${t})`,u,a?s:null,"backward",r)}${t}`}function B(e,t,n,a,r){let u=c=>a==="backward"?t-c+2-1:c+2,o="";for(let c=0;c<t;c++){let f=u(c);o+=I(e,l`${R}|\\k<(?<backref>[^>]+)>${r?l`|(?<unnamed>\()(?!\?)(?:${M.source})?`:""}`,({0:p,index:A,groups:{captureName:$,backref:d,unnamed:E}})=>{if(d&&n&&!n.has(d))return p;if(E)return`(${w}`;let h=`_$${f}`;return $?`(?<${$}${h}>${r?w:""}`:l`\k<${d}${h}>`},i.DEFAULT)}return o}function V(e,t){M.lastIndex=t;let n=M.exec(e);return n?n[0].length:0}return Q(X);})();
var Regex;(Regex||={}).plugins=(()=>{var N=Object.defineProperty;var q=Object.getOwnPropertyDescriptor;var y=Object.getOwnPropertyNames;var J=Object.prototype.hasOwnProperty;var K=(e,t)=>{for(var n in t)N(e,n,{get:t[n],enumerable:!0})},Q=(e,t,n,r)=>{if(t&&typeof t=="object"||typeof t=="function")for(let o of y(t))!J.call(e,o)&&o!==n&&N(e,o,{get:()=>t[o],enumerable:!(r=q(t,o))||r.enumerable});return e};var V=e=>Q(N({},"__esModule",{value:!0}),e);var ne={};K(ne,{recursion:()=>Z});var m=Object.freeze({DEFAULT:"DEFAULT",CHAR_CLASS:"CHAR_CLASS"});function T(e,t,n,r){let o=new RegExp(String.raw`${t}|(?<$skip>\[\^?|\\?.)`,"gsu"),u=[!1],s=0,c="";for(let i of e.matchAll(o)){let{0:p,groups:{$skip:f}}=i;if(!f&&(!r||r===m.DEFAULT==!s)){n instanceof Function?c+=n(i,{context:s?m.CHAR_CLASS:m.DEFAULT,negated:u[u.length-1]}):c+=n;continue}p[0]==="["?(s++,u.push(p[1]==="^")):p==="]"&&s&&(s--,u.pop()),c+=p}return c}function F(e,t,n,r){T(e,t,n,r)}function X(e,t,n=0,r){if(!new RegExp(t,"su").test(e))return null;let o=new RegExp(`${t}|(?<$skip>\\\\?.)`,"gsu");o.lastIndex=n;let u=0,s;for(;s=o.exec(e);){let{0:c,groups:{$skip:i}}=s;if(!i&&(!r||r===m.DEFAULT==!u))return s;c==="["?u++:c==="]"&&u&&u--,o.lastIndex==s.index&&o.lastIndex++}return null}function k(e,t,n){return!!X(e,t,0,n)}function G(e,t){let n=/\\?./gsu;n.lastIndex=t;let r=e.length,o=0,u=1,s;for(;s=n.exec(e);){let[c]=s;if(c==="[")o++;else if(o)c==="]"&&o--;else if(c==="(")u++;else if(c===")"&&(u--,!u)){r=s.index;break}}return e.slice(t,r)}var w=String.raw,Y=w`\\g<(?<gRNameOrNum>[^>&]+)&R=(?<gRDepth>[^>]+)>`,I=w`\(\?R=(?<rDepth>[^\)]+)\)|${Y}`,L=w`\(\?<(?![=!])(?<captureName>[^>]+)>`,_=w`${L}|(?<unnamed>\()(?!\?)`,x=new RegExp(w`${L}|${I}|\(\?|\\?.`,"gsu"),b="Cannot use multiple overlapping recursions";function Z(e,t){let{hiddenCaptures:n,mode:r}={hiddenCaptures:[],mode:"plugin",...t},o=t?.captureTransfers??new Map;if(!new 
RegExp(I,"su").test(e))return{pattern:e,captureTransfers:o,hiddenCaptures:n};if(r==="plugin"&&k(e,w`\(\?\(DEFINE\)`,m.DEFAULT))throw new Error("DEFINE groups cannot be used with recursion");let u=[],s=k(e,w`\\[1-9]`,m.DEFAULT),c=new Map,i=[],p=!1,f=0,a=0,$;for(x.lastIndex=0;$=x.exec(e);){let{0:g,groups:{captureName:d,rDepth:h,gRNameOrNum:l,gRDepth:R}}=$;if(g==="[")f++;else if(f)g==="]"&&f--;else if(h){if(B(h),p)throw new Error(b);if(s)throw new Error(`${r==="external"?"Backrefs":"Numbered backrefs"} cannot be used with global recursion`);let C=e.slice(0,$.index),E=e.slice(x.lastIndex);if(k(E,I,m.DEFAULT))throw new Error(b);let D=+h-1;e=H(C,E,D,!1,n,u,a),o=W(o,C,D,u.length,0,a);break}else if(l){B(R);let C=!1;for(let U of i)if(U.name===l||U.num===+l){if(C=!0,U.hasRecursedWithin)throw new Error(b);break}if(!C)throw new Error(w`Recursive \g cannot be used outside the referenced group "${r==="external"?l:w`\g<${l}&R=${R}>`}"`);let E=c.get(l),D=G(e,E);if(s&&k(D,w`${L}|\((?!\?)`,m.DEFAULT))throw new Error(`${r==="external"?"Backrefs":"Numbered backrefs"} cannot be used with recursion of capturing groups`);let A=e.slice(E,$.index),S=D.slice(A.length+g.length),O=u.length,M=+R-1,v=H(A,S,M,!0,n,u,a);o=W(o,A,M,u.length-O,O,a);let z=e.slice(0,E),j=e.slice(E+D.length);e=`${z}${v}${j}`,x.lastIndex+=v.length-g.length-A.length-S.length,i.forEach(U=>U.hasRecursedWithin=!0),p=!0}else if(d)a++,c.set(String(a),x.lastIndex),c.set(d,x.lastIndex),i.push({num:a,name:d});else if(g[0]==="("){let C=g==="(";C&&(a++,c.set(String(a),x.lastIndex)),i.push(C?{num:a}:{})}else g===")"&&i.pop()}return n.push(...u),{pattern:e,captureTransfers:o,hiddenCaptures:n}}function B(e){let t=`Max depth must be integer between 2 and 100; used ${e}`;if(!/^[1-9]\d*$/.test(e))throw new Error(t);if(e=+e,e<2||e>100)throw new Error(t)}function H(e,t,n,r,o,u,s){let c=new Set;r&&F(e+t,L,({groups:{captureName:p}})=>{c.add(p)},m.DEFAULT);let 
i=[n,r?c:null,o,u,s];return`${e}${P(`(?:${e}`,"forward",...i)}(?:)${P(`${t})`,"backward",...i)}${t}`}function P(e,t,n,r,o,u,s){let i=f=>t==="forward"?f+2:n-f+2-1,p="";for(let f=0;f<n;f++){let a=i(f);p+=T(e,w`${_}|\\k<(?<backref>[^>]+)>`,({0:$,groups:{captureName:g,unnamed:d,backref:h}})=>{if(h&&r&&!r.has(h))return $;let l=`_$${a}`;if(d||g){let R=s+u.length+1;return u.push(R),ee(o,R),d?$:`(?<${g}${l}>`}return w`\k<${h}${l}>`},m.DEFAULT)}return p}function ee(e,t){for(let n=0;n<e.length;n++)e[n]>=t&&e[n]++}function W(e,t,n,r,o,u){if(e.size&&r){let s=0;F(t,_,()=>s++,m.DEFAULT);let c=u-s+o,i=new Map;return e.forEach((p,f)=>{let a=(r-s*n)/n,$=s*n,g=f>c+s?f+r:f,d=[];for(let h of p)if(h<=c)d.push(h);else if(h>c+s+a)d.push(h+r);else if(h<=c+s)for(let l=0;l<=n;l++)d.push(h+s*l);else for(let l=0;l<=n;l++)d.push(h+$+a*l);i.set(g,d)}),i}return e}return V(ne);})();
//# sourceMappingURL=regex-recursion.min.js.map

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "regex-recursion",
"version": "5.1.1",
"version": "6.0.2",
"description": "Recursive matching plugin for Regex+",
"author": "Steven Levithan",
"license": "MIT",
@@ -13,15 +13,6 @@
},
"browser": "./dist/regex-recursion.min.js",
"types": "./types/index.d.ts",
"scripts": {
"bundle:global": "esbuild src/index.js --global-name=Regex.plugins --bundle --minify --sourcemap --outfile=dist/regex-recursion.min.js",
"types": "tsc src/index.js --rootDir src --declaration --allowJs --emitDeclarationOnly --outDir types",
"prebuild": "rm -rf dist/* types/*",
"build": "pnpm run bundle:global && pnpm run types",
"pretest": "pnpm run build",
"test": "jasmine",
"prepare": "pnpm test"
},
"files": [
"dist",
"src",
@@ -37,12 +28,20 @@
"regexp"
],
"dependencies": {
"regex": "^5.1.1",
"regex-utilities": "^2.3.0"
},
"devDependencies": {
"esbuild": "^0.24.2",
"jasmine": "^5.5.0",
"typescript": "~5.7.2"
"regex": "^6.0.1",
"typescript": "^5.7.3"
},
"scripts": {
"bundle:global": "esbuild src/index.js --global-name=Regex.plugins --bundle --minify --sourcemap --outfile=dist/regex-recursion.min.js",
"types": "tsc src/index.js --rootDir src --declaration --allowJs --emitDeclarationOnly --outDir types",
"prebuild": "rm -rf dist/* types/*",
"build": "pnpm run bundle:global && pnpm run types",
"pretest": "pnpm run build",
"test": "jasmine"
}
}
}

View File

@@ -1,42 +1,58 @@
import {Context, forEachUnescaped, getGroupContents, hasUnescaped, replaceUnescaped} from 'regex-utilities';
import {emulationGroupMarker} from 'regex/internals';
const r = String.raw;
const gRToken = r`\\g<(?<gRNameOrNum>[^>&]+)&R=(?<gRDepth>[^>]+)>`;
const recursiveToken = r`\(\?R=(?<rDepth>[^\)]+)\)|${gRToken}`;
const namedCapturingDelim = r`\(\?<(?![=!])(?<captureName>[^>]+)>`;
const token = new RegExp(r`${namedCapturingDelim}|${recursiveToken}|\(\?|\\?.`, 'gsu');
const namedCaptureDelim = r`\(\?<(?![=!])(?<captureName>[^>]+)>`;
const captureDelim = r`${namedCaptureDelim}|(?<unnamed>\()(?!\?)`;
const token = new RegExp(r`${namedCaptureDelim}|${recursiveToken}|\(\?|\\?.`, 'gsu');
const overlappingRecursionMsg = 'Cannot use multiple overlapping recursions';
// Support emulation groups with transfer marker prefix
const emulationGroupMarkerRe = new RegExp(r`(?:\$[1-9]\d*)?${emulationGroupMarker.replace(/\$/g, r`\$`)}`, 'y');
/**
@param {string} expression
@param {string} pattern
@param {{
flags?: string;
useEmulationGroups?: boolean;
captureTransfers?: Map<number, Array<number>>;
hiddenCaptures?: Array<number>;
mode?: 'plugin' | 'external';
}} [data]
@returns {string}
@returns {{
pattern: string;
captureTransfers: Map<number, Array<number>>;
hiddenCaptures: Array<number>;
}}
*/
export function recursion(expression, data) {
function recursion(pattern, data) {
const {hiddenCaptures, mode} = {
hiddenCaptures: [],
mode: 'plugin',
...data,
};
// Capture transfer is used by <github.com/slevithan/oniguruma-to-es>
let captureTransfers = data?.captureTransfers ?? new Map();
// Keep the initial fail-check (which avoids unneeded processing) as fast as possible by testing
// without the accuracy improvement of using `hasUnescaped` with default `Context`
if (!(new RegExp(recursiveToken, 'su').test(expression))) {
return expression;
// without the accuracy improvement of using `hasUnescaped` with `Context.DEFAULT`
if (!(new RegExp(recursiveToken, 'su').test(pattern))) {
return {
pattern,
captureTransfers,
hiddenCaptures,
};
}
if (hasUnescaped(expression, r`\(\?\(DEFINE\)`, Context.DEFAULT)) {
if (mode === 'plugin' && hasUnescaped(pattern, r`\(\?\(DEFINE\)`, Context.DEFAULT)) {
throw new Error('DEFINE groups cannot be used with recursion');
}
const useEmulationGroups = !!data?.useEmulationGroups;
const hasNumberedBackref = hasUnescaped(expression, r`\\[1-9]`, Context.DEFAULT);
const addedHiddenCaptures = [];
const hasNumberedBackref = hasUnescaped(pattern, r`\\[1-9]`, Context.DEFAULT);
const groupContentsStartPos = new Map();
const openGroups = [];
let hasRecursed = false;
let numCharClassesOpen = 0;
let numCaptures = 0;
let numCapturesPassed = 0;
let match;
token.lastIndex = 0;
while ((match = token.exec(expression))) {
while ((match = token.exec(pattern))) {
const {0: m, groups: {captureName, rDepth, gRNameOrNum, gRDepth}} = match;
if (m === '[') {
numCharClassesOpen++;
@@ -57,15 +73,37 @@ export function recursion(expression, data) {
// Note that Regex+'s extended syntax (atomic groups and sometimes subroutines) can also
// add numbered backrefs, but those work fine because external plugins like this one run
// *before* the transformation of built-in syntax extensions
throw new Error('Numbered backrefs cannot be used with global recursion');
throw new Error(
// When used in `external` mode by transpilers other than Regex+, backrefs might have
// gone through conversion from named to numbered, so avoid a misleading error
`${mode === 'external' ? 'Backrefs' : 'Numbered backrefs'} cannot be used with global recursion`
);
}
const pre = expression.slice(0, match.index);
const post = expression.slice(token.lastIndex);
if (hasUnescaped(post, recursiveToken, Context.DEFAULT)) {
const left = pattern.slice(0, match.index);
const right = pattern.slice(token.lastIndex);
if (hasUnescaped(right, recursiveToken, Context.DEFAULT)) {
throw new Error(overlappingRecursionMsg);
}
const reps = +rDepth - 1;
pattern = makeRecursive(
left,
right,
reps,
false,
hiddenCaptures,
addedHiddenCaptures,
numCapturesPassed
);
captureTransfers = mapCaptureTransfers(
captureTransfers,
left,
reps,
addedHiddenCaptures.length,
0,
numCapturesPassed
);
// No need to parse further
return makeRecursive(pre, post, +rDepth, false, useEmulationGroups);
break;
// `\g<name&R=N>`, `\g<number&R=N>`
} else if (gRNameOrNum) {
assertMaxInBounds(gRDepth);
@@ -80,46 +118,66 @@ export function recursion(expression, data) {
}
}
if (!isWithinReffedGroup) {
throw new Error(r`Recursive \g cannot be used outside the referenced group "\g<${gRNameOrNum}&R=${gRDepth}>"`);
throw new Error(r`Recursive \g cannot be used outside the referenced group "${
mode === 'external' ? gRNameOrNum : r`\g<${gRNameOrNum}&R=${gRDepth}>`
}"`);
}
const startPos = groupContentsStartPos.get(gRNameOrNum);
const groupContents = getGroupContents(expression, startPos);
const groupContents = getGroupContents(pattern, startPos);
if (
hasNumberedBackref &&
hasUnescaped(groupContents, r`${namedCapturingDelim}|\((?!\?)`, Context.DEFAULT)
hasUnescaped(groupContents, r`${namedCaptureDelim}|\((?!\?)`, Context.DEFAULT)
) {
throw new Error('Numbered backrefs cannot be used with recursion of capturing groups');
throw new Error(
// When used in `external` mode by transpilers other than Regex+, backrefs might have
// gone through conversion from named to numbered, so avoid a misleading error
`${mode === 'external' ? 'Backrefs' : 'Numbered backrefs'} cannot be used with recursion of capturing groups`
);
}
const groupContentsPre = expression.slice(startPos, match.index);
const groupContentsPost = groupContents.slice(groupContentsPre.length + m.length);
const expansion = makeRecursive(groupContentsPre, groupContentsPost, +gRDepth, true, useEmulationGroups);
const pre = expression.slice(0, startPos);
const post = expression.slice(startPos + groupContents.length);
const groupContentsLeft = pattern.slice(startPos, match.index);
const groupContentsRight = groupContents.slice(groupContentsLeft.length + m.length);
const numAddedHiddenCapturesPreExpansion = addedHiddenCaptures.length;
const reps = +gRDepth - 1;
const expansion = makeRecursive(
groupContentsLeft,
groupContentsRight,
reps,
true,
hiddenCaptures,
addedHiddenCaptures,
numCapturesPassed
);
captureTransfers = mapCaptureTransfers(
captureTransfers,
groupContentsLeft,
reps,
addedHiddenCaptures.length - numAddedHiddenCapturesPreExpansion,
numAddedHiddenCapturesPreExpansion,
numCapturesPassed
);
const pre = pattern.slice(0, startPos);
const post = pattern.slice(startPos + groupContents.length);
// Modify the string we're looping over
expression = `${pre}${expansion}${post}`;
pattern = `${pre}${expansion}${post}`;
// Step forward for the next loop iteration
token.lastIndex += expansion.length - m.length - groupContentsPre.length - groupContentsPost.length;
token.lastIndex += expansion.length - m.length - groupContentsLeft.length - groupContentsRight.length;
openGroups.forEach(g => g.hasRecursedWithin = true);
hasRecursed = true;
} else if (captureName) {
numCaptures++;
// NOTE: Not currently handling *named* emulation groups that already exist in the pattern
groupContentsStartPos.set(String(numCaptures), token.lastIndex);
numCapturesPassed++;
groupContentsStartPos.set(String(numCapturesPassed), token.lastIndex);
groupContentsStartPos.set(captureName, token.lastIndex);
openGroups.push({
num: numCaptures,
num: numCapturesPassed,
name: captureName,
});
} else if (m.startsWith('(')) {
} else if (m[0] === '(') {
const isUnnamedCapture = m === '(';
if (isUnnamedCapture) {
numCaptures++;
groupContentsStartPos.set(
String(numCaptures),
token.lastIndex + (useEmulationGroups ? emulationGroupMarkerLength(expression, token.lastIndex) : 0)
);
numCapturesPassed++;
groupContentsStartPos.set(String(numCapturesPassed), token.lastIndex);
}
openGroups.push(isUnnamedCapture ? {num: numCaptures} : {});
openGroups.push(isUnnamedCapture ? {num: numCapturesPassed} : {});
} else if (m === ')') {
openGroups.pop();
}
@@ -129,7 +187,13 @@ export function recursion(expression, data) {
}
}
return expression;
hiddenCaptures.push(...addedHiddenCaptures);
return {
pattern,
captureTransfers,
hiddenCaptures,
};
}
/**
@@ -147,66 +211,88 @@ function assertMaxInBounds(max) {
}
/**
@param {string} pre
@param {string} post
@param {number} maxDepth
@param {string} left
@param {string} right
@param {number} reps
@param {boolean} isSubpattern
@param {boolean} useEmulationGroups
@param {Array<number>} hiddenCaptures
@param {Array<number>} addedHiddenCaptures
@param {number} numCapturesPassed
@returns {string}
*/
function makeRecursive(pre, post, maxDepth, isSubpattern, useEmulationGroups) {
function makeRecursive(
left,
right,
reps,
isSubpattern,
hiddenCaptures,
addedHiddenCaptures,
numCapturesPassed
) {
const namesInRecursed = new Set();
// Avoid this work if not needed
// Can skip this work if not needed
if (isSubpattern) {
forEachUnescaped(pre + post, namedCapturingDelim, ({groups: {captureName}}) => {
forEachUnescaped(left + right, namedCaptureDelim, ({groups: {captureName}}) => {
namesInRecursed.add(captureName);
}, Context.DEFAULT);
}
const reps = maxDepth - 1;
// Depth 2: 'pre(?:pre(?:)post)post'
// Depth 3: 'pre(?:pre(?:pre(?:)post)post)post'
return `${pre}${
repeatWithDepth(`(?:${pre}`, reps, (isSubpattern ? namesInRecursed : null), 'forward', useEmulationGroups)
const rest = [
reps,
isSubpattern ? namesInRecursed : null,
hiddenCaptures,
addedHiddenCaptures,
numCapturesPassed,
];
// Depth 2: 'left(?:left(?:)right)right'
// Depth 3: 'left(?:left(?:left(?:)right)right)right'
// Empty group in the middle separates tokens and absorbs a following quantifier if present
return `${left}${
repeatWithDepth(`(?:${left}`, 'forward', ...rest)
}(?:)${
repeatWithDepth(`${post})`, reps, (isSubpattern ? namesInRecursed : null), 'backward', useEmulationGroups)
}${post}`;
repeatWithDepth(`${right})`, 'backward', ...rest)
}${right}`;
}
/**
@param {string} expression
@param {string} pattern
@param {'forward' | 'backward'} direction
@param {number} reps
@param {Set<string> | null} namesInRecursed
@param {'forward' | 'backward'} direction
@param {boolean} useEmulationGroups
@param {Array<number>} hiddenCaptures
@param {Array<number>} addedHiddenCaptures
@param {number} numCapturesPassed
@returns {string}
*/
function repeatWithDepth(expression, reps, namesInRecursed, direction, useEmulationGroups) {
function repeatWithDepth(
pattern,
direction,
reps,
namesInRecursed,
hiddenCaptures,
addedHiddenCaptures,
numCapturesPassed
) {
const startNum = 2;
const depthNum = i => direction === 'backward' ? reps - i + startNum - 1 : i + startNum;
const getDepthNum = i => direction === 'forward' ? (i + startNum) : (reps - i + startNum - 1);
let result = '';
for (let i = 0; i < reps; i++) {
const captureNum = depthNum(i);
const depthNum = getDepthNum(i);
result += replaceUnescaped(
expression,
// NOTE: Not currently handling *named* emulation groups that already exist in the pattern
r`${namedCapturingDelim}|\\k<(?<backref>[^>]+)>${
useEmulationGroups ? r`|(?<unnamed>\()(?!\?)(?:${emulationGroupMarkerRe.source})?` : ''
}`,
({0: m, index, groups: {captureName, backref, unnamed}}) => {
pattern,
r`${captureDelim}|\\k<(?<backref>[^>]+)>`,
({0: m, groups: {captureName, unnamed, backref}}) => {
if (backref && namesInRecursed && !namesInRecursed.has(backref)) {
// Don't alter backrefs to groups outside the recursed subpattern
return m;
}
// Only matches unnamed capture delim if `useEmulationGroups`
if (unnamed) {
// Add an emulation group marker, possibly replacing an existing marker (removes any
// transfer prefix)
return `(${emulationGroupMarker}`;
const suffix = `_$${depthNum}`;
if (unnamed || captureName) {
const addedCaptureNum = numCapturesPassed + addedHiddenCaptures.length + 1;
addedHiddenCaptures.push(addedCaptureNum);
incrementIfAtLeast(hiddenCaptures, addedCaptureNum);
return unnamed ? m : `(?<${captureName}${suffix}>`;
}
const suffix = `_$${captureNum}`;
return captureName ?
`(?<${captureName}${suffix}>${useEmulationGroups ? emulationGroupMarker : ''}` :
r`\k<${backref}${suffix}>`;
return r`\k<${backref}${suffix}>`;
},
Context.DEFAULT
);
@@ -214,8 +300,66 @@ function repeatWithDepth(expression, reps, namesInRecursed, direction, useEmulat
return result;
}
function emulationGroupMarkerLength(expression, index) {
emulationGroupMarkerRe.lastIndex = index;
const match = emulationGroupMarkerRe.exec(expression);
return match ? match[0].length : 0;
/**
Updates the array in place by incrementing each value greater than or equal to the threshold.
@param {Array<number>} arr
@param {number} threshold
*/
function incrementIfAtLeast(arr, threshold) {
for (let i = 0; i < arr.length; i++) {
if (arr[i] >= threshold) {
arr[i]++;
}
}
}
/**
@param {Map<number, Array<number>>} captureTransfers
@param {string} left
@param {number} reps
@param {number} numCapturesAddedInExpansion
@param {number} numAddedHiddenCapturesPreExpansion
@param {number} numCapturesPassed
@returns {Map<number, Array<number>>}
*/
function mapCaptureTransfers(captureTransfers, left, reps, numCapturesAddedInExpansion, numAddedHiddenCapturesPreExpansion, numCapturesPassed) {
if (captureTransfers.size && numCapturesAddedInExpansion) {
let numCapturesInLeft = 0;
forEachUnescaped(left, captureDelim, () => numCapturesInLeft++, Context.DEFAULT);
// Is 0 for global recursion
const recursionDelimCaptureNum = numCapturesPassed - numCapturesInLeft + numAddedHiddenCapturesPreExpansion;
const newCaptureTransfers = new Map();
captureTransfers.forEach((from, to) => {
const numCapturesInRight = (numCapturesAddedInExpansion - (numCapturesInLeft * reps)) / reps;
const numCapturesAddedInLeft = numCapturesInLeft * reps;
const newTo = to > (recursionDelimCaptureNum + numCapturesInLeft) ? to + numCapturesAddedInExpansion : to;
const newFrom = [];
for (const f of from) {
// Before the recursed subpattern
if (f <= recursionDelimCaptureNum) {
newFrom.push(f);
// After the recursed subpattern
} else if (f > (recursionDelimCaptureNum + numCapturesInLeft + numCapturesInRight)) {
newFrom.push(f + numCapturesAddedInExpansion);
// Within the recursed subpattern, on the left of the recursion token
} else if (f <= (recursionDelimCaptureNum + numCapturesInLeft)) {
for (let i = 0; i <= reps; i++) {
newFrom.push(f + (numCapturesInLeft * i));
}
// Within the recursed subpattern, on the right of the recursion token
} else {
for (let i = 0; i <= reps; i++) {
newFrom.push(f + numCapturesAddedInLeft + (numCapturesInRight * i));
}
}
}
newCaptureTransfers.set(newTo, newFrom);
});
return newCaptureTransfers;
}
return captureTransfers;
}
export {
recursion,
};

View File

@@ -1,12 +1,24 @@
/**
@param {string} expression
@param {string} pattern
@param {{
flags?: string;
useEmulationGroups?: boolean;
captureTransfers?: Map<number, Array<number>>;
hiddenCaptures?: Array<number>;
mode?: 'plugin' | 'external';
}} [data]
@returns {string}
@returns {{
pattern: string;
captureTransfers: Map<number, Array<number>>;
hiddenCaptures: Array<number>;
}}
*/
export function recursion(expression: string, data?: {
export function recursion(pattern: string, data?: {
flags?: string;
useEmulationGroups?: boolean;
}): string;
captureTransfers?: Map<number, Array<number>>;
hiddenCaptures?: Array<number>;
mode?: "plugin" | "external";
}): {
pattern: string;
captureTransfers: Map<number, Array<number>>;
hiddenCaptures: Array<number>;
};