<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Artful Code &#187; regexp</title>
	<atom:link href="http://www.artfulcode.net/tags/regexp/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.artfulcode.net</link>
	<description>Resources and tips for dynamic, interactive languages.</description>
	<lastBuildDate>Fri, 09 Sep 2011 02:15:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Using newLISP&#8217;s find-all</title>
		<link>http://www.artfulcode.net/articles/using-newlisps-find-all/</link>
		<comments>http://www.artfulcode.net/articles/using-newlisps-find-all/#comments</comments>
		<pubDate>Tue, 12 Aug 2008 15:29:47 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[functional]]></category>
		<category><![CDATA[newlisp]]></category>
		<category><![CDATA[regexp]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.artfulcode.net/articles/using-newlisps-find-all/</guid>
		<description><![CDATA[newLISP&#8217;s find-all utility is exceptionally powerful, especially when coupled with rich matching functions like match, regex, and unify. find-all combines search and substitution into a fast, comprehensive function for extracting data from lists and strings. Basic syntax Like many newLISP sequence functions, find-all is defined for both strings and lists. The basic syntax is: Strings: [...]]]></description>
			<content:encoded><![CDATA[<p>newLISP&#8217;s <code>find-all</code> utility is exceptionally powerful, especially when coupled with rich matching functions like <code>match</code>, <code>regex</code>, and <code>unify</code>. <code>find-all</code> combines search and substitution into a fast, comprehensive function for extracting data from lists and strings.<span id="more-15"></span></p>
<h4>Basic syntax</h4>
<p>Like many newLISP sequence functions, <code>find-all</code> is defined for both strings and lists.  The basic syntax is:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;">Strings: <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> pattern target <span style="color: #AF0500;">&#91;</span>expression <span style="color: #AF0500;">&#91;</span>option<span style="color: #AF0500;">&#93;</span><span style="color: #AF0500;">&#93;</span><span style="color: #AF0500;">&#41;</span>
Lists:   <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> pattern target <span style="color: #AF0500;">&#91;</span>expression <span style="color: #AF0500;">&#91;</span>comparison-function<span style="color: #AF0500;">&#93;</span><span style="color: #AF0500;">&#93;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<h4>String searches and regular expressions</h4>
<p>The most basic string search finds occurrences of one string inside of another.</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'str <span style="color: #3AA43E;">&quot;Now is the time for all good men to come to the aid of their country.&quot;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> <span style="color: #3AA43E;">&quot;the&quot;</span> str<span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">; (&quot;the&quot; &quot;the&quot; &quot;the&quot;)</span></pre></div></div>

<p>There are three occurrences (including one within the word, &#8216;their&#8217;).  To find only occurrences of &#8216;the&#8217; as a complete word, a regular expression is used:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> <span style="color: #3AA43E;"><span style="color: #AF0500;">&#123;</span>\bthe\b<span style="color: #AF0500;">&#125;</span></span> str <span style="color: #2028B8;">$0</span> <span style="color: #675400;">0</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">; (&quot;the&quot; &quot;the&quot;)</span></pre></div></div>

<p>The first parameter is a regular expression, written in curly braces to avoid double-escaping backslash sequences (like <code>\b</code>).  The final parameter, <code>0</code>, is the PCRE option (a full list of options is available in the newLISP documentation for <a href="http://www.newlisp.org/downloads/newlisp_manual.html#regex">regex</a>).</p>
<p>The third parameter is key to one of the more powerful features of <code>find-all</code> &#8211; substitution.  The <code>expression</code> parameter is applied to each element found.  Inside of this expression, the global variable <code>$0</code> represents the entire matched element.  In a regular expression search, captured matches are available via subsequently enumerated variables: <code>$1</code>, <code>$2</code>, etc.</p>
<p>For example, to convert all occurrences of &#8216;the&#8217; to upper case in the resulting list:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> <span style="color: #3AA43E;"><span style="color: #AF0500;">&#123;</span>\bthe\b<span style="color: #AF0500;">&#125;</span></span> str <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">upper-case</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span> <span style="color: #675400;">0</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">;(&quot;THE&quot; &quot;THE&quot;)</span></pre></div></div>

<p>Here is a short program to count the number of occurrences of some common words in the text of War and Peace:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'text <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">read-file</span> <span style="color: #3AA43E;">&quot;war_and_peace.txt&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'words <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> <span style="color: #3AA43E;"><span style="color: #AF0500;">&#123;</span>\b\S+\b<span style="color: #AF0500;">&#125;</span></span> text <span style="color: #2028B8;">$0</span> <span style="color: #675400;">1</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span> <span style="color: #808080; font-style: italic;">; split into words</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">println</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">format</span> <span style="color: #3AA43E;">&quot;%12s: %6d&quot;</span> <span style="color: #3AA43E;">&quot;Total words&quot;</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">length</span> words<span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">letn</span> <span style="color: #AF0500;">&#40;</span><span style="color: #AF0500;">&#40;</span>common-words '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;and&quot;</span> <span style="color: #3AA43E;">&quot;or&quot;</span> <span style="color: #3AA43E;">&quot;the&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
       <span style="color: #AF0500;">&#40;</span>counts <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">map</span> '<span style="color: #2028B8;">list</span> common-words <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">count</span> common-words words<span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
  <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">dolist</span> <span style="color: #AF0500;">&#40;</span>word counts<span style="color: #AF0500;">&#41;</span>
    <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">println</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">format</span> <span style="color: #3AA43E;">&quot;%12s: %6d&quot;</span> <span style="color: #AF0500;">&#40;</span>word <span style="color: #675400;">0</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span>word <span style="color: #675400;">1</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<p>With the results:</p>
<pre><code> Total words: 564345
         and:  21023
          or:   1542
         the:  31702
</code></pre>
<p>This isn&#8217;t necessarily efficient; finding all words and then counting occurrences of three specific words in a list of more than half a million elements wastes a lot of time.  The regular expression used also treats any run of non-whitespace characters as a single word, so words joined by punctuation are captured together (for instance, <em>peace--and</em> would become one element, and that occurrence of &#8216;and&#8217; would be missed).  It also does not account for case when counting occurrences, so <em>The</em> is not counted as &#8216;the&#8217;.</p>
<p>Here is a more efficient version:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'common-words '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;and&quot;</span> <span style="color: #3AA43E;">&quot;or&quot;</span> <span style="color: #3AA43E;">&quot;the&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 're <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">format</span> <span style="color: #3AA43E;"><span style="color: #AF0500;">&#123;</span>\b<span style="color: #AF0500;">&#40;</span>%s<span style="color: #AF0500;">&#41;</span>\b<span style="color: #AF0500;">&#125;</span></span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">join</span> common-words <span style="color: #3AA43E;">&quot;|&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'occurrences <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> re text <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">lower-case</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span> <span style="color: #675400;">1</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">println</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">map</span> '<span style="color: #2028B8;">list</span> common-words <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">count</span> common-words occurrences<span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">; ((&quot;and&quot; 22267) (&quot;or&quot; 1582) (&quot;the&quot; 34619))</span></pre></div></div>

<p>Note that <code>find-all</code> is recursive and newLISP has a limited stack size, which can be set at startup with the <code>-s</code> switch:</p>
<pre><code>newlisp -s 500000 common_counts.lsp
</code></pre>
<p>The previous example would hit the default stack limit of 2,048.  This can be solved by using <code>map</code> instead of <code>find-all</code>&#8217;s substitution expression:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'common-words '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;and&quot;</span> <span style="color: #3AA43E;">&quot;or&quot;</span> <span style="color: #3AA43E;">&quot;the&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 're <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">format</span> <span style="color: #3AA43E;"><span style="color: #AF0500;">&#123;</span>\b<span style="color: #AF0500;">&#40;</span>%s<span style="color: #AF0500;">&#41;</span>\b<span style="color: #AF0500;">&#125;</span></span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">join</span> common-words <span style="color: #3AA43E;">&quot;|&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'occurrences <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">map</span> '<span style="color: #2028B8;">lower-case</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> re text <span style="color: #2028B8;">$0</span> <span style="color: #675400;">1</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">println</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">map</span> '<span style="color: #2028B8;">list</span> common-words <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">count</span> common-words occurrences<span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">; ((&quot;and&quot; 22267) (&quot;or&quot; 1582) (&quot;the&quot; 34619))</span></pre></div></div>

<p>The substitution expression is also useful to cause side effects, squeezing even more efficiency out of the algorithm.  Here, <code>find-all</code> increments a count of each word and stores it in a dictionary.</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">define</span> counts:<span style="color: #2028B8;">counts</span><span style="color: #AF0500;">&#41;</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'common-words '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;and&quot;</span> <span style="color: #3AA43E;">&quot;or&quot;</span> <span style="color: #3AA43E;">&quot;the&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 're <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">format</span> <span style="color: #3AA43E;"><span style="color: #AF0500;">&#123;</span>\b<span style="color: #AF0500;">&#40;</span>%s<span style="color: #AF0500;">&#41;</span>\b<span style="color: #AF0500;">&#125;</span></span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">join</span> common-words <span style="color: #3AA43E;">&quot;|&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> re text <span style="color: #AF0500;">&#40;</span>counts <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">lower-case</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">+</span> <span style="color: #675400;">1</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">or</span> <span style="color: #AF0500;">&#40;</span>counts <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">lower-case</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span> <span style="color: #675400;">0</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span> <span style="color: #675400;">1</span><span style="color: #AF0500;">&#41;</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">dolist</span> <span style="color: #AF0500;">&#40;</span>w common-words<span style="color: #AF0500;">&#41;</span>
  <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">println</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">format</span> <span style="color: #3AA43E;">&quot;%4s: %6d&quot;</span> w <span style="color: #AF0500;">&#40;</span>counts w<span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
&nbsp;
 <span style="color: #808080; font-style: italic;">; and:  22267</span>
 <span style="color: #808080; font-style: italic;">;  or:   1582</span>
 <span style="color: #808080; font-style: italic;">; the:  34619</span></pre></div></div>

<p>This algorithm is quite fast, although still somewhat weighty in RAM, since it stores the entire text before doing its work.</p>
<h4>List searches</h4>
<p><code>find-all</code> list searches are (arguably) nearly as powerful as regular expressions.  By default, <code>find-all</code> compares elements of target with pattern using <code>match</code>, and the default expression is the entire matched element (<code>$0</code>):</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'target '<span style="color: #AF0500;">&#40;</span><span style="color: #AF0500;">&#40;</span>a <span style="color: #675400;">1</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span>b <span style="color: #675400;">2</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span>c <span style="color: #675400;">3</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span>d <span style="color: #675400;">4</span> <span style="color: #675400;">5</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">?</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span> target<span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">; ((a 1) (b 2) (c 3))</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> '<span style="color: #AF0500;">&#40;</span>d <span style="color: #2028B8;">*</span><span style="color: #AF0500;">&#41;</span> target<span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">; ((d 4 5))</span></pre></div></div>
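<p>It can help to see what <code>match</code> itself returns for these patterns, since <code>find-all</code> keeps an element whenever <code>match</code> succeeds.  A brief sketch:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;">(match '(? ?) '(a 1))     ; (a 1)    two elements, one per ?
(match '(? ?) '(d 4 5))   ; nil      three elements do not fit two ?
(match '(d *) '(d 4 5))   ; ((4 5))  * collects the remaining elements</pre></div></div>

<p><code>?</code> matches exactly one element and <code>*</code> matches any number of elements, which is why <code>(d 4 5)</code> is skipped by the first pattern but found by the second.</p>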

<p>Using the substitution expression, lists may be unified or matched further:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">?</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span> target <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">unify</span> '<span style="color: #AF0500;">&#40;</span>Letter Number<span style="color: #AF0500;">&#41;</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #808080; font-style: italic;">; (((Letter a) (Number 1)) ((Letter b) (Number 2)) ((Letter c) (Number 3)))</span></pre></div></div>

<p>Simple list searches using comparators other than <code>match</code> are straightforward:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> <span style="color: #675400;">4</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #675400;">1</span> <span style="color: #675400;">2</span> <span style="color: #675400;">3</span> <span style="color: #675400;">4</span> <span style="color: #675400;">5</span><span style="color: #AF0500;">&#41;</span> <span style="color: #2028B8;">$0</span> &amp;gt<span style="color: #808080; font-style: italic;">;)</span>
<span style="color: #808080; font-style: italic;">; (1 2 3)</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> <span style="color: #675400;">4</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #675400;">1</span> <span style="color: #675400;">2</span> <span style="color: #675400;">3</span> <span style="color: #675400;">4</span> <span style="color: #675400;">5</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">+</span> <span style="color: #675400;">1</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span> &amp;gt<span style="color: #808080; font-style: italic;">;)</span>
<span style="color: #808080; font-style: italic;">; (2 3 4)</span></pre></div></div>
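<p>Note the argument order: judging from the results above, the comparison function is called with the pattern first and the list element second, so <code>&gt;</code> keeps the elements that 4 is greater than.  As a sketch, swapping in <code>&lt;</code> keeps the larger elements instead:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;">; keeps elements x for which (&lt; 4 x) is true
(find-all 4 '(1 2 3 4 5 6) $0 &lt;)
; (5 6)
&nbsp;
; the expression is applied only to the elements that passed the comparison
(find-all 4 '(1 2 3 4 5 6) (* $0 $0) &lt;)
; (25 36)</pre></div></div>
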

<p><code>find-all</code> is convenient for searching XML.  newLISP parses XML into a tree.  For example:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">xml-type-tags</span> <span style="color: #2028B8;">nil</span> <span style="color: #2028B8;">nil</span> <span style="color: #2028B8;">nil</span> <span style="color: #2028B8;">nil</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'xml <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">xml-parse</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">get-url</span> <span style="color: #3AA43E;">&quot;http://www.weather.gov/xml/current_obs/index.xml&quot;</span><span style="color: #AF0500;">&#41;</span>
    <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">+</span> <span style="color: #675400;">1</span> <span style="color: #675400;">2</span> <span style="color: #675400;">4</span> <span style="color: #675400;">16</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<p>For more info on parsing XML in newLISP, see <a href="http://www.artfulcode.net/articles/working-xml-newlisp/">this article</a> and the <a href="http://www.newlisp.org/downloads/newlisp_manual.html#XML">newLISP documentation</a>.</p>
<p>The XML at the url above lists weather stations.  Here is an example station entry, under the root element, &#8220;wx_station_index&#8221;:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;station<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;station_id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>TAPA<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/station_id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;state<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>AG<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/state<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;station_name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Vc Bird Intl Airport Antigua<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/station_name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;latitude<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>17.117<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/latitude<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;longitude<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>-61.783<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/longitude<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;html_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://weather.noaa.gov/weather/current/TAPA.html<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/html_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;rss_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://weather.gov/xml/current_obs/TAPA.rss<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/rss_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;xml_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://weather.gov/xml/current_obs/TAPA.xml<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/xml_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/station<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></div></div>

<p><code>assoc</code> is useful for finding a path in a parsed XML list:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'stations-xml <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">assoc</span> <span style="color: #AF0500;">&#40;</span>xml <span style="color: #3AA43E;">&quot;wx_station_index&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<p>However, <code>assoc</code> only returns the first element of the list that matches.  In the XML article linked above, <code>pop-assoc</code> was used to collect elements iteratively:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'stations '<span style="color: #AF0500;">&#40;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">while</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">assoc</span> <span style="color: #AF0500;">&#40;</span>xml <span style="color: #3AA43E;">&quot;wx_station_index&quot;</span> <span style="color: #3AA43E;">&quot;station&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<p>That is a handy way of collecting all elements, especially in an irregular document.  With <code>find-all</code>, elements may be found just as easily and without modifying the original list:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;station&quot;</span> <span style="color: #2028B8;">*</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">assoc</span> <span style="color: #AF0500;">&#40;</span>xml <span style="color: #3AA43E;">&quot;wx_station_index&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<p>From there, it is just as simple to aggregate the values from each element:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'station-pattern
    '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;station&quot;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;station_id&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;state&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;station_name&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;latitude&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;longitude&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;html_url&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;rss_url&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;xml_url&quot;</span> <span style="color: #2028B8;">?</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;station&quot;</span> <span style="color: #2028B8;">*</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">assoc</span> <span style="color: #AF0500;">&#40;</span>xml <span style="color: #3AA43E;">&quot;wx_station_index&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
    <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">match</span> station-pattern <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<p>This returns a list of the node values for each station.  The data can be applied to a different associative structure easily using <code>unify</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;station&quot;</span> <span style="color: #2028B8;">*</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">assoc</span> <span style="color: #AF0500;">&#40;</span>xml <span style="color: #3AA43E;">&quot;wx_station_index&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
    <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">unify</span> '<span style="color: #AF0500;">&#40;</span>Id State <span style="color: #2028B8;">Name</span> Lat Lon Html Rss Xml<span style="color: #AF0500;">&#41;</span>
        <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">match</span> station-pattern <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<p>Each element of the resulting list looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #AF0500;">&#40;</span>Id <span style="color: #3AA43E;">&quot;TAPA&quot;</span><span style="color: #AF0500;">&#41;</span>
 <span style="color: #AF0500;">&#40;</span>State <span style="color: #3AA43E;">&quot;AG&quot;</span><span style="color: #AF0500;">&#41;</span>
 <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">Name</span> <span style="color: #3AA43E;">&quot;Vc Bird Intl Airport Antigua&quot;</span><span style="color: #AF0500;">&#41;</span>
 <span style="color: #AF0500;">&#40;</span>Lat <span style="color: #3AA43E;">&quot;17.117&quot;</span><span style="color: #AF0500;">&#41;</span>
 <span style="color: #AF0500;">&#40;</span>Lon <span style="color: #3AA43E;">&quot;-61.783&quot;</span><span style="color: #AF0500;">&#41;</span>
 <span style="color: #AF0500;">&#40;</span>Html <span style="color: #3AA43E;">&quot;http://weather.noaa.gov/weather/current/TAPA.html&quot;</span><span style="color: #AF0500;">&#41;</span>
 <span style="color: #AF0500;">&#40;</span>Rss <span style="color: #3AA43E;">&quot;http://weather.gov/xml/current_obs/TAPA.rss&quot;</span><span style="color: #AF0500;">&#41;</span>
 <span style="color: #AF0500;">&#40;</span>Xml <span style="color: #3AA43E;">&quot;http://weather.gov/xml/current_obs/TAPA.xml&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span></pre></div></div>

<h4>A completely impractical example</h4>
<p>There are further implications of <code>find-all</code>&#8217;s ability to cause side effects in the substitution expression.  To download the XML for each individual station, something like this could be done (<em>don&#8217;t really do this &#8211; it will attempt to spawn more than 2,000 processes</em>):</p>

<div class="wp_syntax"><div class="code"><pre class="newlisp" style="font-family:monospace;"><span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">set</span> 'processes
    <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">find-all</span> '<span style="color: #AF0500;">&#40;</span><span style="color: #3AA43E;">&quot;station&quot;</span> <span style="color: #2028B8;">*</span><span style="color: #AF0500;">&#41;</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">assoc</span> <span style="color: #AF0500;">&#40;</span>xml <span style="color: #3AA43E;">&quot;wx_station_index&quot;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
      <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">spawn</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">sym</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">lookup</span> <span style="color: #3AA43E;">&quot;station_name&quot;</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span> 'WEATHER<span style="color: #AF0500;">&#41;</span>
             <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">get-url</span> <span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">lookup</span> <span style="color: #3AA43E;">&quot;xml_url&quot;</span> <span style="color: #2028B8;">$0</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span><span style="color: #AF0500;">&#41;</span>
&nbsp;
<span style="color: #AF0500;">&#40;</span><span style="color: #2028B8;">sync</span> <span style="color: #675400;">30000</span><span style="color: #AF0500;">&#41;</span> <span style="color: #808080; font-style: italic;">; wait 30 seconds for all downloads to finish</span></pre></div></div>

<p>Assuming that the application is able to download the XML files for more than 2,000 stations in 30 seconds (and that there is no hard limit on forked processes, as there is on OS X), the context <code>WEATHER</code> will contain the XML for all stations.</p>
<p>Its infeasibility aside, this example demonstrates how easily a user-supplied XML file could be used to script a newLISP application.</p>
<h4>Further documentation</h4>
<ul>
<li> <a href="http://www.newlisp.org/downloads/newlisp_manual.html#find-all">find-all</a></li>
<li> <a href="http://www.newlisp.org/downloads/newlisp_manual.html#unify">unify</a></li>
<li> <a href="http://www.newlisp.org/downloads/newlisp_manual.html#match">match</a></li>
<li> <a href="http://www.newlisp.org/downloads/newlisp_manual.html#regex">regex</a></li>
<li> <a href="http://www.newlisp.org/downloads/newlisp_manual.html#xml-parse">xml-parse</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.artfulcode.net/articles/using-newlisps-find-all/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Optimizing regular expressions</title>
		<link>http://www.artfulcode.net/articles/optimizing-regular-expressions/</link>
		<comments>http://www.artfulcode.net/articles/optimizing-regular-expressions/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 19:43:17 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[regexp]]></category>

		<guid isPermaLink="false">http://www.artfulcode.net/articles/optimizing-regular-expressions/</guid>
		<description><![CDATA[We lispers generally look down our noses at regular expressions. Regular expressions are ugly. They are not expressive. However, they are a reality of programming. When used with care, they can express complex text patterns concisely. Writing software almost always means processing some form of user input. When the format of that input is pre-determined [...]]]></description>
			<content:encoded><![CDATA[<p>We lispers generally look down our noses at regular expressions.  Regular expressions are ugly.  They are not expressive.  However, they are a reality of programming.  When used with care, they can express complex text patterns concisely.<span id="more-27"></span></p>
<p>Writing software almost always means processing some form of user input.  When the format of that input is pre-determined and (mostly) guaranteed to be valid, regular expressions are just so much overhead.  But when the validity of user input can vary widely, such as with form validation, regular expressions are much more concise than writing a custom parser.</p>
<h4>Regular expressions are slower for simple matching</h4>
<p>A regular expression must be compiled.  Incautiously crafted patterns can take up quite a bit of memory or result in poorly performing matches.  Routines that make heavy use of regular expressions can quickly become a bottleneck.</p>
<p>If a match operation doesn&#8217;t need features of regular expressions, such as alternating patterns or backreferences, it is faster to use string matching.  It takes much less time to see if a string is terminated by a semi-colon in Python using <code>string.endswith(';')</code> than <code>re.compile(r';$').search(string)</code>, especially over many iterations.</p>
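<p>A rough way to check this on a given machine is to time both approaches.  The sketch below is only an illustration: the function names and the sample line are made up for this example, and the two tests agree only for strings without a trailing newline (<code>$</code> also matches just before a final newline).</p>

```python
import re
import timeit

# Compile once, outside the timed code, so only matching is measured.
SEMICOLON_AT_END = re.compile(r";$")


def ends_with_semicolon_str(line):
    """Plain string test: no regular expression engine involved."""
    return line.endswith(";")


def ends_with_semicolon_re(line):
    """The same test using the precompiled regular expression."""
    return SEMICOLON_AT_END.search(line) is not None


def compare(line, number=100_000):
    """Return (string_seconds, regex_seconds) for `number` calls of each test."""
    t_str = timeit.timeit(lambda: ends_with_semicolon_str(line), number=number)
    t_re = timeit.timeit(lambda: ends_with_semicolon_re(line), number=number)
    return t_str, t_re


if __name__ == "__main__":
    t_str, t_re = compare("x = compute(a, b);")
    print(f"str.endswith: {t_str:.4f}s   compiled regex: {t_re:.4f}s")
```

<p>The absolute numbers depend on the interpreter and the hardware, so treat the output as a relative comparison only.</p>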
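<p>The exception is when the same pattern is applied many times.  Compilation is a one-time cost, so a precompiled pattern spreads that cost over every later match.  Matching itself is not asymptotically cheaper than plain string matching: both have to examine the target, which is O(n) in its length at best, and a backtracking engine can do far worse on some patterns.  In practice, it generally takes an extraordinarily large number of iterations before a regular expression becomes competitive with string matching for simple tests.</p>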
<h4>Optimizing for speed</h4>
<p><a href="http://www.oreilly.com/catalog/regex3/">Entire books</a> have been written on this subject.  Here are a few tips.</p>
<p>Alternating patterns (such as <code>(abc|def)</code>) can be expensive.  Always try to put the pattern most likely to match first (<code>(John|Rumplestiltskin)</code>, rather than <code>(Rumplestiltskin|John)</code>).</p>
<p>Avoid nesting repeating patterns when possible.  They grow quickly in memory and increase the number of possible matches exponentially (thereby slowing a match down noticeably).</p>
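<p>The effect of nesting can be sketched with Python&#8217;s backtracking <code>re</code> module.  The two patterns below accept the same strings; the names and the subject lengths are chosen only for this illustration.  On a failing match, the nested version is expected to slow down sharply as the subject grows, while the flat version should stay fast.</p>

```python
import re
import time

NESTED = re.compile(r"(a+)+b")  # a repeated group inside another repetition
FLAT = re.compile(r"a+b")       # accepts the same strings without nesting


def time_failed_match(pattern, length):
    """Match against `length` letters with no trailing 'b'; return (result, seconds)."""
    subject = "a" * length
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Lengths are kept small on purpose; larger values can take a very long time.
    for length in (10, 14, 18):
        _, nested_time = time_failed_match(NESTED, length)
        _, flat_time = time_failed_match(FLAT, length)
        print(f"{length:2d} chars: nested {nested_time:.5f}s, flat {flat_time:.5f}s")
```

<p>Both patterns return the same result for every input; only the amount of work done before giving up differs.</p>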
<p>As the target string gets longer, matching slows drastically when using repeating patterns.  Where possible, replace indefinite quantifiers with bounded ones (<code>{min,max}</code>), or follow them with a literal or an atomic group.</p>
<p>When possible, use anchors (<code>^</code> and <code>$</code>) or lookahead/lookbehind to limit the scope of a pattern and make failures occur faster.</p>
<h4>Optimizing for memory usage</h4>
<p>Some types of expressions can grow to quite large sizes in memory.  A <a href="http://regexkit.sourceforge.net/Documentation/pcre/pcreperform.html">little reading</a> shows how apparently simple expressions can grow when compiled:</p>
<pre><code>(abc|def){2,4}
is compiled as if it were
(abc|def)(abc|def)((abc|def)(abc|def)?)?
</code></pre>
<p>Indefinite and large quantifiers (<code>*</code>, <code>+</code>, and <code>{min,max}</code>) are expanded.  Imagine if the example above were <code>(abc|def){2,1000}</code>, or if the pattern matched were more complex and contained internal quantifiers, such as <code>(abc+|def){2,1000}</code>.  A little care is needed to prevent such exponential expansion.</p>
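<p>To get a feel for the growth, the expansion quoted above can be imitated as plain text.  The helper <code>expand_bounded</code> is hypothetical and written only for this illustration; it mimics the shape of the expansion and is not how PCRE (or Python) compiles a pattern internally.</p>

```python
import re


def expand_bounded(group, lo, hi):
    """Write group{lo,hi} as explicit copies of `group`.

    `group` must already be parenthesized, e.g. "(abc|def)", and hi >= lo.
    """
    required = group * lo        # the mandatory repetitions
    optional = ""
    for _ in range(hi - lo):     # each optional repetition wraps the previous ones
        if optional:
            optional = "(" + group + optional + ")?"
        else:
            optional = group + "?"
    return required + optional


small = expand_bounded("(abc|def)", 2, 4)
large = expand_bounded("(abc|def)", 2, 1000)

if __name__ == "__main__":
    print(small)
    print(len("(abc|def){2,1000}"), "characters written,", len(large), "expanded")
    # The expanded form accepts the same strings as the bounded quantifier.
    for count in range(7):
        subject = "abc" * count
        same = bool(re.fullmatch(small, subject)) == bool(
            re.fullmatch(r"(abc|def){2,4}", subject)
        )
        print(count, "repetitions, same result:", same)
```

<p>The large form is only measured here, not compiled; a pattern nested that deeply may be rejected by a regex parser.</p>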
<p>One solution is to use a subroutine call.  Although slower, a subroutine call re-applies the pattern of an earlier group without the memory-eating expansion:</p>
<pre><code>(abc+|def)(?1){1,999}
</code></pre>
<p>The subroutine call <code>(?1)</code> re-applies the pattern of the first group and repeats it.  However, subroutine matches are treated as atomic groups.  When an indefinite match fails, the engine will typically backtrack and see if a smaller substring of the target matches; an atomic group does not give back what it has already matched.</p>
<p>For example, <code>\w+0</code> matches one or more word characters followed by a zero.  If it is matched against &#8220;abcdefg0&#8221;, it matches all of the letters and the zero.  If it is matched against &#8220;abcdefg1&#8221;, it will backtrack and attempt to match against substrings of the target (&#8220;abcdefg&#8221;, &#8220;abcdef&#8221;, &#8220;abcde&#8221;, &#8230;).</p>
<p>This may or may not be the desired behavior, but it will keep the pattern&#8217;s footprint down.</p>
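<p>Ordinary backtracking with <code>\w+0</code> can be observed in Python&#8217;s <code>re</code> module.  This is a minimal sketch; the helper name and the third subject string are assumptions added for illustration.  Because <code>\w</code> also matches digits, the greedy <code>\w+</code> first swallows the zero and has to give characters back before the literal <code>0</code> can match.</p>

```python
import re

WORD_THEN_ZERO = re.compile(r"\w+0")


def matched_text(subject):
    """Return the text the pattern matches in `subject`, or None if nothing matches."""
    found = WORD_THEN_ZERO.search(subject)
    return found.group() if found else None


if __name__ == "__main__":
    for subject in ("abcdefg0", "abcdefg1", "abc0def"):
        print(subject, "->", matched_text(subject))
```

<p>In the third subject the engine backs off from the end of the string until the literal zero lines up, so the match stops at &#8220;abc0&#8221;.  The second subject contains no zero at all, so every backtracking attempt fails.</p>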
<h4>Links</h4>
<ul>
<li> <a href="http://swtch.com/~rsc/regexp/regexp1.html">Regular Expression Matching Can Be Simple And Fast</a></li>
<li> <a href="http://regexkit.sourceforge.net/Documentation/pcre/pcreperform.html">RegexKit docs</a></li>
<li> <a href="http://blog.stevenlevithan.com/archives/faster-trim-javascript">Faster JavaScript Trim</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.artfulcode.net/articles/optimizing-regular-expressions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

