<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" 
    xmlns:dc="http://purl.org/dc/elements/1.1/" 
    xmlns:media="http://search.yahoo.com/mrss/" 
    xmlns:atom="http://www.w3.org/2005/Atom"
    >
<channel>
    <title>I/O Reader</title>
    <atom:link href="http://www.ioreader.com/feed" rel="self" type="application/rss+xml" />
    <link>http://www.ioreader.com/</link>
    <description>Peter Goodman's blog about computer programming.</description>
    <pubDate>Tue, 12 Feb 2013 22:44:52 GMT</pubDate>
    <language>en</language>
        <item>
                <title><![CDATA[Python GNU C99 Parser]]></title>
        <link>http://www.ioreader.com/2013/02/12/python-gnu-c99-parser</link>
        <comments>http://www.ioreader.com/2013/02/12/python-gnu-c99-parser#comments</comments>
        <pubDate>Tue, 12 Feb 2013 22:44:52 GMT</pubDate>
        <dc:creator>Peter Goodman</dc:creator>
                        <category><![CDATA[Python]]></category>
                                <category><![CDATA[C]]></category>
                                <category><![CDATA[Parsing Theory]]></category>
                                <category><![CDATA[Compilers]]></category>
                                <category><![CDATA[Granary]]></category>
                        <guid isPermaLink="false">2u</guid>
        <description><![CDATA[<p>
    As part of <a href="http://www.ioreader.com/tags/granary">Granary</a>, I have developed a sort-of GNU C99 type and function declaration parser, which is now <a href="https://github.com/pgoodman/cparser" title="GNU C99 Declaration Parser Project on GitHub">hosted on GitHub</a>. I am releasing this code because others might find it useful. There are a number of <a href="https://bitbucket.org/eliben/pycparser">other</a> <a href="https://github.com/albertz/PyCParser">implementations</a> out there; however, when I last tried them, none met my exact needs (parsing glibc headers, Darwin libc headers, and Linux kernel headers).
</p>
<p>
    This parser is not particularly novel, and likely contains bugs (I am 100% sure that the <tt>cprinter.py</tt> file has bugs). However, it has also been very useful.
</p>
<p>
    I welcome feedback, bug fixes, feature requests, or feature additions to this parser from interested third parties.
</p>]]></description>
    </item>
        <item>
                <title><![CDATA[Tracking Data with Function Pointers]]></title>
        <link>http://www.ioreader.com/2012/10/14/tracking-data-with-function-pointers</link>
        <comments>http://www.ioreader.com/2012/10/14/tracking-data-with-function-pointers#comments</comments>
        <pubDate>Sun, 14 Oct 2012 22:03:02 GMT</pubDate>
        <dc:creator>Peter Goodman</dc:creator>
                        <category><![CDATA[C]]></category>
                                <category><![CDATA[C++]]></category>
                                <category><![CDATA[Compilers]]></category>
                                <category><![CDATA[Granary]]></category>
                        <guid isPermaLink="false">2s</guid>
        <description><![CDATA[<p>
    Recently I presented <a href="http://www.petergoodman.me/docs/osdi-2012-poster.pdf" title="Granary: Granary: Comprehensive Kernel Module Instrumentation" target="_blank">a poster</a> at <a href="https://www.usenix.org/conference/osdi12/poster-sessions" title="10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2012)">OSDI&#39;12</a>. The poster outlined our use of <a href="http://en.wikipedia.org/wiki/Binary_translation#Dynamic_binary_translation" title="Dynamic Binary Translation">dynamic binary translation</a> (DBT) for analysing operating system (OS) kernel modules. One novelty of our approach is that we ensure that only module code is analysed; non-module kernel code is never translated. This restriction entails taking control when module code executes (so that it can be translated) and relinquishing control when non-module kernel code executes. To regain control when kernel code invokes module code, we proactively search for and change function pointers in shared data structures.
</p>
<p>
    Proactively changing function pointers that potentially point into module code is achieved by interposing on the interface between modules and the kernel. Modules and the kernel share data structures, and those data structures can contain function pointers. Finding and changing function pointers requires recursively applying a replacement function to the fields of data structures, starting from the &quot;root&quot; function arguments. Without guards in place, this recursive process might not terminate (e.g. a cyclic data structure). In the case of deeply linked data structures (e.g. trees), this recursive process might be expensive. To avoid this expense, we apply the replacement function only to those data structures that have changed.
</p>
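<p>
    As a rough illustration of the guard mentioned above, here is a minimal sketch of a recursive replacement that stops when it revisits an object. Everything in it is hypothetical: <tt>struct node</tt>, <tt>replace_pointers</tt>, <tt>to_wrapper</tt>, and the fixed-size <tt>seen</tt> table are stand-ins for illustration, not Granary&apos;s data structures or API.
</p>

```c
#include <stddef.h>

typedef void (*callback_t)(void);

/* Hypothetical shared structure: one function pointer and one link. */
struct node {
    callback_t callback;
    struct node *next;
};

/* Guard: a small table of objects that have already been visited. */
#define MAX_SEEN 64
static const struct node *seen[MAX_SEEN];
static size_t num_seen;

static int already_seen(const struct node *n) {
    for (size_t i = 0; i < num_seen; i++) {
        if (seen[i] == n) return 1;
    }
    return 0;
}

/* Hypothetical stand-ins for module code and its replacement. */
void original(void) {}
void wrapper(void) { original(); }
callback_t to_wrapper(callback_t f) { return f == original ? wrapper : f; }

/* Recursively apply `replace` to every reachable function pointer.
 * Returns how many pointers were changed. The `seen` table makes the
 * recursion terminate on cyclic structures. */
size_t replace_pointers(struct node *n, callback_t (*replace)(callback_t)) {
    if (n == NULL || already_seen(n) || num_seen == MAX_SEEN) return 0;
    seen[num_seen++] = n;

    size_t changed = 0;
    if (n->callback != NULL) {
        callback_t wrapped = replace(n->callback);
        changed = (wrapped != n->callback);
        n->callback = wrapped;
    }
    return changed + replace_pointers(n->next, replace);
}
```

<p>
    With two nodes that point at each other, the first call changes both function pointers and returns; a second call finds both nodes in the <tt>seen</tt> table and changes nothing.
</p>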

<p>
    Suppose we have the following code that defines a function called <tt>func_name</tt> and a function pointer field called <tt>func_ptr</tt> in a <tt>struct foo</tt>. 
</p>
<pre class="code">
#include &lt;stdio.h&gt;

void func_name(void) {
    printf(&quot;hello world!\n&quot;);
}

struct foo {
    void (*func_ptr)(void);
    &hellip;
};

int main(void) {
    struct foo bar;
    bar.func_ptr = &func_name;
    bar.func_ptr();
    return 0;
}
</pre>
<p>
    The code in the <tt>main</tt> function roughly corresponds to the following objects in memory:
</p>
<p align="center">
    <img src="http://www.ioreader.com/images/function-pointer.svg">
</p>
<p>
    On the left, we have <tt>func_ptr</tt>, which contains the address (<tt>0xBEEF</tt>) of the <tt>func_name</tt> function. When <tt>func_ptr</tt> is invoked, control transfers into the <tt>func_name</tt> function. This is signalled by the instruction pointer (<tt>%rip</tt>) changing to <tt>0xBEEF</tt>.
</p>
<p>
    Recall that the goal is to detect changes to data structures. Suppose that we have no control over the allocation (static, stack, or heap) or layout/structure/semantics of the data structures that we want to track. Given these constraints, there does not appear to be a convenient way to embed information inside of a structure.
</p>
<p>
    Two immediate approaches come to mind: (i) embed some information in pointers to the data structures that we want to track; or (ii) use a map to associate the addresses of data structures with their tracking meta-information.
</p>
<p>
    The first solution is undesirable for four reasons:
    <ol>
        <li>One must ensure that all instances of the original pointer are altered.</li>
        <li>One must ensure that the altered data structure pointer is correctly used.</li>
        <li>The meta-information of distinct instances of the altered pointer might get out of sync.</li>
        <li>There is a limited number of useful bits for meta-information in pointers.</li>
    </ol>
</p>
<p>
    The second solution has many desirable properties and would work. However, if one is tracking many data structures, then the cost of maintaining the map might be undesirable.
</p>
<p>
    A third solution exists if we know the types of the fields of the data structures to track. As previously stated, we have no control over the allocation or semantics of a data structure. This implies that we cannot extend the structure to contain meta-information (or a pointer to it), and that we cannot arbitrarily change values in the structure (lest we break the program semantics). Suppose, however, that we <em>knew</em> that a structure contained a function pointer. Then changing that function pointer is fine, so long as the control-flow behaviour when invoking it remains the same. Consider the following:
</p>
<p align="center">
    <img src="http://www.ioreader.com/images/trampoline-pointer.svg">
</p>
<p>
    Above, we introduced extra indirection in the form of a <tt>jmp</tt> instruction at address <tt>0xFEED</tt>. When <tt>func_ptr</tt> is invoked, control transfers to <tt>0xFEED</tt>, which then <tt>jmp</tt>s to <tt>0xBEEF</tt>.
</p>
<p>
    But instructions are just another form of data. There is no reason (at least on x86) that we can&apos;t just put some meta-information beside the newly inserted <tt>jmp</tt>. For example:
</p>
<pre class="code">
struct meta {
    unsigned char jmp_code[5] __attribute__((aligned (8)));
    &hellip;
} __attribute__((packed));

struct meta *func_name_meta = &hellip;;
</pre>
<p>
    Suppose that <tt>func_name_meta</tt> is initialized to a pointer to a <tt>struct meta</tt> object, and that object is located in executable memory. Further, suppose that the <tt>jmp_code</tt> of <tt>func_name_meta</tt> is initialized to a 5-byte <tt>jmp</tt> instruction that transfers control to <tt>func_name</tt> (<tt>0xBEEF</tt>). Then we can swap <tt>func_ptr</tt> with <tt>func_name_meta</tt> and still expect the same control-flow behaviour. Why?
</p>
<p>
    The first five bytes of <tt>*func_name_meta</tt> are machine code, and the entire structure lives in executable memory. The next N bytes of the <tt>struct meta</tt> object contain meta-information (used for detecting changes). The address of the <tt>struct meta</tt> object (<tt>func_name_meta</tt>) is also the address of the first field (<tt>jmp_code</tt>) within the object. As a result, replacing a pointer to <tt>func_name</tt> with <tt>func_name_meta</tt> is valid insofar as we are exchanging one code pointer for another. When control transfers to <tt>func_name_meta</tt>, the <tt>jmp</tt> instruction in the <tt>jmp_code</tt> field transfers control to <tt>func_name</tt>.
</p>
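<p>
    To make the 5-byte <tt>jmp</tt> concrete, here is a minimal sketch of how such an instruction could be encoded, assuming the near-relative form of <tt>jmp</tt> on x86 (opcode <tt>0xE9</tt> followed by a 32-bit little-endian displacement measured from the end of the instruction). The helper name <tt>encode_jmp</tt> is hypothetical, and the sketch only fills in bytes; it does not allocate executable memory or check that the target is within range.
</p>

```c
#include <stdint.h>

/* Write a 5-byte x86 `jmp rel32` into `code`.
 *
 * `at` is the address where the instruction will live, and `target` is
 * where it should jump. The displacement is relative to the address of
 * the *next* instruction, i.e. `at + 5`. Assumption: the target is
 * within +/-2GB of the instruction, so the displacement fits in 32 bits. */
void encode_jmp(unsigned char code[5], uint64_t at, uint64_t target) {
    uint32_t rel = (uint32_t) (target - (at + 5u));

    code[0] = 0xE9;                         /* opcode: jmp rel32 */
    code[1] = (unsigned char) (rel);        /* displacement, little-endian */
    code[2] = (unsigned char) (rel >> 8);
    code[3] = (unsigned char) (rel >> 16);
    code[4] = (unsigned char) (rel >> 24);
}
```

<p>
    Using the toy addresses from the diagram, a <tt>jmp</tt> placed at <tt>0xFEED</tt> that targets <tt>0xBEEF</tt> needs a displacement of <tt>0xBEEF - (0xFEED + 5) = -0x4003</tt>, so the bytes are <tt>E9 FD BF FF FF</tt>.
</p>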
<p>
    Storing information in the meta-information, and extracting it again, is convenient:
</p>
<pre class="code">
struct meta *meta_of_bar = (struct meta *) bar.func_ptr;
</pre>
<p>
    Again, taking advantage of the layout of <tt>struct meta</tt>, we can now cast function pointers into <tt>struct meta</tt> pointers and operate on function pointers as if they were pointers to objects (because they are!).
</p>
<p>
    There is a bit more going on behind the scenes in Granary, in particular: the allocation of the executable code, what meta-information is kept, how untracked objects are detected and handled, and garbage collection of object trackers. However, I think this article has outlined the salient points of the approach, which I believe is more general than simple object tracking. Hopefully you will find this technique as fun/evil as I do!
</p>]]></description>
    </item>
        <item>
                <title><![CDATA[Traditional Parsing Methods]]></title>
        <link>http://www.ioreader.com/2012/05/09/traditional-parsing-methods</link>
        <comments>http://www.ioreader.com/2012/05/09/traditional-parsing-methods#comments</comments>
        <pubDate>Wed, 09 May 2012 16:13:04 GMT</pubDate>
        <dc:creator>Peter Goodman</dc:creator>
                        <category><![CDATA[Parsing Theory]]></category>
                                <category><![CDATA[TDOP]]></category>
                        <guid isPermaLink="false">2r</guid>
        <description><![CDATA[<p>
    One parsing technique that I sometimes use is <a href="http://dl.acm.org/citation.cfm?id=512931" title="Top Down Operator Precedence Parsing">Top Down Operator Precedence Parsing</a> (<abbr title="Top Down Operator Precedence Parsing">TDOP</abbr>). TDOP parsers have been discussed <a href="http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/">in</a> <a href="http://effbot.org/zone/tdop-index.htm">many</a> <a href="http://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/">other</a> <a href="http://javascript.crockford.com/tdop/tdop.html">places</a> as well. Unfortunately, I have not seen TDOP described in terms of left-corner parsing (except for a passing comment in <a href="http://publications.csail.mit.edu/lcs/specpub.php?id=715">this thesis</a>). 
</p>
<p>
    The purpose of this post is to set the stage for a later discussion about TDOP parsing. This post will introduce top-down and bottom-up parsing, then combine the two methods to introduce left-corner parsing. Also, the top-down parsing language (TDPL) will be briefly mentioned as its semantics relate to TDOP.
</p>

<h3>Traditional Parsing Methods</h3>
<p>
    Before getting into TDOP, it&apos;s important to have at least some background in non-TDOP parsing methods. This is because TDOP can be understood as a combination of several different parsing methods.
</p>
<p>
    Parsing is a language acceptance problem. That is, a parser is a function that accepts or rejects a string. If a parser accepts a string then we say that string is in some language. The opposite is said of rejection. A string in this case means a sequence of zero or more symbols. In the English language, symbols are Latin/alphabetic characters. In the <a href="http://en.wikipedia.org/wiki/C_(programming_language)">C programming language</a>, symbols are reserved words, variables, literals, and punctuation (e.g. <tt>void</tt>, <tt>&quot;foo&quot;</tt>, <tt>&gt;</tt>, etc.).
</p>
<p>
    Typically, a parser accepts the language generated by a <a href="http://en.wikipedia.org/wiki/Context-free_grammar" title="Context-free grammar">context-free grammar</a> (<abbr title="Context-free grammar">CFG</abbr>). CFGs are a formalism for describing <a href="http://en.wikipedia.org/wiki/Context-free_language">some</a> languages. The following is an example CFG that generates simple arithmetic expressions:
</p>
<pre class="code">
E &rarr; &quot;(&quot; E &quot;)&quot;
E &rarr; A

A &rarr; M &quot;+&quot; A
A &rarr; M &quot;-&quot; A
A &rarr; &quot;-&quot; A
A &rarr; M

M &rarr; N &quot;&times;&quot; M
M &rarr; N &quot;&divide;&quot; M
M &rarr; N

N &rarr; &quot;0&quot;
N &rarr; &quot;1&quot;
  &#8942;
N &rarr; &quot;10&quot;
</pre>
<p>
    <small style="font-style:italic;">
        Note: ignore the unusual placement of the parentheses and the <a href="http://en.wikipedia.org/wiki/Associative_property">right-associativity</a> of the operators described by the grammar.
    </small>
</p>
<p>
    The name to the left of the <tt>&rarr;</tt> is called a variable or a non-terminal. Something in quotes is called a token, or terminal. Both terminals and non-terminals are considered symbols. Terminals can be thought of as the letters of one&apos;s language.
</p>
<p>
    The <tt>&rarr;</tt> itself is a relation which says that the non-terminal on the left-hand side can generate the language on the right-hand side. This combination is called a production.
</p>
<p>
    <em>Note</em>: the rest of this article will focus on parsing strings from left-to-right. The following examples detailing various parsing methods assume that our parsers always guess correctly. Finally, we assume that our grammars are &epsilon;-free. That is, the right-hand side of a production is never empty (with one exception).
</p>
<h3>Top-Down Parsing</h3>
<p>
    As its name implies, <a href="http://en.wikipedia.org/wiki/Top-down_parsing" title="Top-down parsing">top-down parsing</a> proceeds top-down. In the case of the above expression grammar, the &quot;top&quot; starts off as <tt>E</tt>. The action of going &quot;down&quot; involves one of two things:
</p>
<ol>
    <li>Replacing a non-terminal with something that it is related to (the right-hand side of <tt>&rarr;</tt>).</li>
    <li>Consuming a terminal.</li>
</ol>
<p>
    Right-hand sides of productions contain both terminals and non-terminals. In replacing a non-terminal with one of its right-hand sides, we set up expectations about the structure of later parts of the string. For example, suppose we want to parse &quot;<tt>(2 &times; 3)</tt>&quot;. Parsing will proceed as follows:
</p>
<div align="center">
    <script type="text/javascript" src="http://www.ioreader.com/js/image-slide.js"></script>
    <table cellpadding="0" cellspacing="0" border="1">
        <thead>
            <tr>
                <th>Step</th>
                <th colspan="2">Action</th>
                <th>Expectations</th>
                <th>Remainder of string</th>
            </tr>
        </thead>
        <tbody>
        <tr>
            <td>1</td>
            <td colspan="2">start</td>
            <td><tt>E</tt></td>
            <td><tt>(2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>2</td>
            <td>replace</td>
            <td><tt>E &rarr; &quot;(&quot; E &quot;)&quot;</tt></td>
            <td><tt>&quot;(&quot; E &quot;)&quot;</tt></td>
            <td><tt>(2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>3</td>
            <td>consume</td>
            <td><tt>&quot;(&quot;</tt></td>
            <td><tt>E &quot;)&quot;</tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>4</td>
            <td>replace</td>
            <td><tt>E &rarr; A</tt></td>
            <td><tt><font color="blue">A</font> &quot;)&quot;</tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>5</td>
            <td>replace</td>
            <td><tt>A &rarr; M</tt></td>
            <td><tt><font color="blue">M</font> &quot;)&quot;</tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>6</td>
            <td>replace</td>
            <td><tt>M &rarr; N &quot;&times;&quot; M</tt></td>
            <td><tt><font color="blue">N &quot;&times;&quot; M</font> &quot;)&quot;</tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>7</td>
            <td>replace</td>
            <td><tt>N &rarr; &quot;2&quot;</tt></td>
            <td><tt><font color="blue">&quot;2&quot;</font> &quot;&times;&quot; M &quot;)&quot;</tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>8</td>
            <td>consume</td>
            <td><tt>&quot;2&quot;</tt></td>
            <td><tt>&quot;&times;&quot; M &quot;)&quot;</tt></td>
            <td><tt>&times; 3)</tt></td>
        </tr>
        <tr>
            <td>9</td>
            <td>consume</td>
            <td><tt>&quot;&times;&quot;</tt></td>
            <td><tt>M &quot;)&quot;</tt></td>
            <td><tt>3)</tt></td>
        </tr>
        <tr>
            <td>10</td>
            <td>replace</td>
            <td><tt>M &rarr; N</tt></td>
            <td><tt><font color="blue">N</font> &quot;)&quot;</tt></td>
            <td><tt>3)</tt></td>
        </tr>
        <tr>
            <td>11</td>
            <td>replace</td>
            <td><tt>N &rarr; &quot;3&quot;</tt></td>
            <td><tt><font color="blue">&quot;3&quot;</font> &quot;)&quot;</tt></td>
            <td><tt>3)</tt></td>
        </tr>
        <tr>
            <td>12</td>
            <td>consume</td>
            <td><tt>&quot;3&quot;</tt></td>
            <td><tt>&quot;)&quot;</tt></td>
            <td><tt>)</tt></td>
        </tr>
        <tr>
            <td rowspan="2">13</td>
            <td>consume</td>
            <td><tt>&quot;)&quot;</tt></td>
            <td></td>
            <td></td>
        </tr>
        <tr>
            <td colspan="2">accept</td>
            <td colspan="2"></td>
        </tr>
        </tbody>
    </table>
</div>
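<p>
    The replace/consume steps in the table above map naturally onto a recursive-descent recognizer, with one function per non-terminal. The following is a minimal sketch under a few assumptions: ASCII <tt>*</tt> and <tt>/</tt> stand in for &times; and &divide;, <tt>N</tt> is limited to single digits, and one symbol of lookahead replaces the &quot;always guesses correctly&quot; assumption. All function names are hypothetical.
</p>

```c
#include <stdbool.h>
#include <ctype.h>

/* Cursor into the remainder of the input string. */
static const char *rest;

static bool parse_E(void);
static bool parse_A(void);
static bool parse_M(void);
static bool parse_N(void);

/* Look at the next terminal without consuming it (spaces are skipped). */
static char peek(void) {
    while (*rest == ' ') rest++;
    return *rest;
}

/* Consume one expected terminal, or fail. */
static bool consume(char terminal) {
    if (peek() != terminal) return false;
    rest++;
    return true;
}

/* N -> "0" | "1" | ... (assumption: a single digit) */
static bool parse_N(void) {
    if (!isdigit((unsigned char) peek())) return false;
    rest++;
    return true;
}

/* M -> N "*" M | N "/" M | N */
static bool parse_M(void) {
    if (!parse_N()) return false;
    char c = peek();
    if (c == '*' || c == '/') { rest++; return parse_M(); }
    return true;
}

/* A -> M "+" A | M "-" A | "-" A | M */
static bool parse_A(void) {
    if (peek() == '-') { rest++; return parse_A(); }
    if (!parse_M()) return false;
    char c = peek();
    if (c == '+' || c == '-') { rest++; return parse_A(); }
    return true;
}

/* E -> "(" E ")" | A */
static bool parse_E(void) {
    if (peek() == '(') {
        rest++;
        return parse_E() && consume(')');
    }
    return parse_A();
}

/* Accept iff the whole input can be derived from E. */
bool accepts_top_down(const char *input) {
    rest = input;
    return parse_E() && peek() == '\0';
}
```

<p>
    Each call to a <tt>parse_</tt> function corresponds to a &quot;replace&quot; row, and each <tt>rest++</tt> to a &quot;consume&quot; row. Because the grammar only allows parentheses at the level of <tt>E</tt>, this sketch accepts <tt>(2 * 3)</tt> but rejects <tt>2 * (3)</tt>.
</p>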
<p>
    If&mdash;as a side-effect of parsing a string&mdash;one wanted to build a parse tree, then the order of constructing nodes in the parse tree would be as follows:
</p>
<div align="center">
    <table border="1" cellpadding="0" cellspacing="0">
        <tr>
            <td colspan="3">
                <img src="http://www.ioreader.com/images/tdop/top-down-step0.png" id="tdop_top_down">
            </td>
        </tr>
        <tr>
            <td align="center"><button id="tdop_prev_top_down">prev</button></td>
            <td align="center"><button id="tdop_reset_top_down">reset</button></td>
            <td align="center"><button id="tdop_next_top_down">next</button></td>
        </tr>
    </table>
    <script>
    $("#tdop_top_down").imageSlider(11, "tdop_prev_top_down", "tdop_reset_top_down", "tdop_next_top_down");
    </script>
</div>
<h4>Top-Down Parsing Language</h4>
<p>
    Brief mention needs to be given to the <a href="http://en.wikipedia.org/wiki/Top-down_parsing_language" title="Top-Down Parsing Language">top-down parsing language</a> (TDPL). The TDPL formalizes the behavior of many top-down parsers. A key difference between a TDPL grammar and a CFG is that productions are totally ordered in a TDPL grammar.
</p>
<p>
    For example, if the productions of the above CFG were totally ordered according to their text order, then a parser cannot try the second production (<tt>E &rarr; A</tt>) without first failing to parse according to the first production (<tt>E &rarr; &quot;(&quot; E &quot;)&quot;</tt>).
</p>
<h3>Bottom-Up Parsing</h3>
<p>
    We can characterize top-down parsers as making &quot;global&quot; decisions. Their expectations about the future structure of the as-yet-unseen parts of the string are evidence of this. On the other hand, <a href="http://en.wikipedia.org/wiki/Bottom-up_parsing" title="Bottom-up parsing">bottom-up parsers</a> operate &quot;locally&quot;. That is, they make decisions based only on the structure of the part of the string that they have already seen. 
</p>
<p>
    The consequence of local decision making is that bottom-up parsers discover sub-structures of the parsed string before they discover super-structures. In theory, a bottom-up parser has no expectations about the remainder of the string to be parsed. In practice, <a href="http://en.wikipedia.org/wiki/LALR_parser" title="LALR parser">common</a> bottom-up parsers implicitly make use of top-down information.
</p>
<p>
    Bottom-up parsers typically perform two main actions: shift and reduce. 
</p>
<ol>
    <li>
        Shifting is similar to consuming to the extent that our cursor into the string being parsed moves forward by one symbol. This is equivalent to removing the first symbol of the input string.
        <br><br>
        Unlike top-down parsers, bottom-up parsers do not maintain a sequence of expectations. Instead, they operate on a partially parsed substring of the input string.
        <br><br>
        Shifting involves taking the first symbol from the remainder of the input string and appending it to the end of the partially parsed string.
    </li>
    <li>
        Reducing operates on a suffix of the partially parsed string. A reduction involves taking a suffix of the partially parsed string, matching it against the right-hand side of a production, and then replacing it with the left-hand side of that production (a non-terminal).
    </li>
</ol>
<p>
    For example, suppose we want to parse &quot;<tt>(2 &times; 3)</tt>&quot;. Parsing will proceed as follows:
</p>
<div align="center">
    <table border="1" cellpadding="0" cellspacing="0">
        <thead>
            <tr>
                <th>Step</th>
                <th colspan="2">Action</th>
                <th>Partial parse</th>
                <th>Remainder of string</th>
            </tr>
        </thead>
        <tbody>
        <tr>
            <td>1</td>
            <td colspan="2">start</td>
            <td><tt></tt></td>
            <td><tt>(2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>2</td>
            <td>shift</td>
            <td><tt>&quot;(&quot;</tt></td>
            <td><tt>&quot;(&quot;</tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        <tr>
            <td>3</td>
            <td>shift</td>
            <td><tt>&quot;2&quot;</tt></td>
            <td><tt>&quot;(&quot; &quot;2&quot;</tt></td>
            <td><tt>&times; 3)</tt></td>
        </tr>
        <tr>
            <td>4</td>
            <td>reduce</td>
            <td><tt>N &rarr; &quot;2&quot;</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">N</font></tt></td>
            <td><tt>&times; 3)</tt></td>
        </tr>
        <tr>
            <td>5</td>
            <td>shift</td>
            <td><tt>&quot;&times;&quot;</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot;</tt></td>
            <td><tt>3)</tt></td>
        </tr>
        <tr>
            <td>6</td>
            <td>shift</td>
            <td><tt>&quot;3&quot;</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot; &quot;3&quot;</tt></td>
            <td><tt>)</tt></td>
        </tr>
        <tr>
            <td>7</td>
            <td>reduce</td>
            <td><tt>N &rarr; &quot;3&quot;</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot; <font color="blue">N</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        <tr>
            <td>8</td>
            <td>reduce</td>
            <td><tt>M &rarr; N</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot; <font color="blue">M</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        <tr>
            <td>9</td>
            <td>reduce</td>
            <td><tt>M &rarr; N &quot;&times;&quot; M</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">M</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        <tr>
            <td>10</td>
            <td>reduce</td>
            <td><tt>A &rarr; M</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">A</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        <tr>
            <td>11</td>
            <td>reduce</td>
            <td><tt>E &rarr; A</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">E</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        <tr>
            <td>12</td>
            <td>shift</td>
            <td><tt>&quot;)&quot;</tt></td>
            <td><tt>&quot;(&quot; E &quot;)&quot;</tt></td>
            <td></td>
        </tr>
        <tr>
            <td rowspan="2">13</td>
            <td>reduce</td>
            <td><tt>E &rarr; &quot;(&quot; E &quot;)&quot;</tt></td>
            <td><tt><font color="blue">E</font></tt></td>
            <td></td>
        </tr>
        <tr>
            <td colspan="2">accept</td>
            <td colspan="2"></td>
        </tr>
        </tbody>
    </table>
</div>
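<p>
    The shift and reduce actions in the table above can be sketched as a small recognizer that keeps the partial parse on a stack. This is only an illustration under stated assumptions: ASCII <tt>*</tt> and <tt>/</tt> stand in for &times; and &divide;, <tt>N</tt> is a single digit, and a one-symbol lookahead (<tt>la</tt>) decides between shifting and reducing in place of &quot;guessing correctly&quot;. The function names and the hand-written order of the reduction rules are hypothetical, not derived from a parser generator.
</p>

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

#define MAX_SYMBOLS 256

static char stack[MAX_SYMBOLS];  /* the partial parse */
static int depth;

/* Symbol `k` positions below the top of the partial parse (0 if absent). */
static char at(int k) { return depth > k ? stack[depth - 1 - k] : 0; }

/* Replace the top `n` symbols (a suffix) with the non-terminal `lhs`. */
static void replace_suffix(int n, char lhs) { depth -= n; stack[depth++] = lhs; }

static bool is_mul(char c) { return c == '*' || c == '/'; }
static bool is_add(char c) { return c == '+' || c == '-'; }

/* Try to apply one reduction; `la` is the next unread symbol (0 at end). */
static bool reduce(char la) {
    char top = at(0);
    if (isdigit((unsigned char) top)) { replace_suffix(1, 'N'); return true; } /* N -> digit */
    if (top == 'N' && !is_mul(la)) { replace_suffix(1, 'M'); return true; }    /* M -> N */
    if (top == 'M' && is_mul(at(1)) && at(2) == 'N') {
        replace_suffix(3, 'M'); return true;                                   /* M -> N op M */
    }
    if (top == 'M' && !is_add(la)) { replace_suffix(1, 'A'); return true; }    /* A -> M */
    if (top == 'A' && is_add(at(1)) && at(2) == 'M') {
        replace_suffix(3, 'A'); return true;                                   /* A -> M op A */
    }
    if (top == 'A' && at(1) == '-') { replace_suffix(2, 'A'); return true; }   /* A -> "-" A */
    if (top == 'A') { replace_suffix(1, 'E'); return true; }                   /* E -> A */
    if (top == ')' && at(1) == 'E' && at(2) == '(') {
        replace_suffix(3, 'E'); return true;                                   /* E -> "(" E ")" */
    }
    return false;
}

/* Accept iff the whole input reduces to the single symbol E. */
bool accepts_bottom_up(const char *input) {
    depth = 0;
    for (;;) {
        while (*input == ' ') input++;
        while (reduce(*input)) { /* reduce as far as possible */ }
        if (*input == '\0') break;
        /* Only terminals of the grammar may be shifted. */
        if (!isdigit((unsigned char) *input) && strchr("()+-*/", *input) == NULL) return false;
        if (depth == MAX_SYMBOLS) return false;
        stack[depth++] = *input++;  /* shift */
    }
    return depth == 1 && stack[0] == 'E';
}
```

<p>
    On <tt>(2 * 3)</tt> this sketch performs the same sequence of shifts and reductions as the table: the second <tt>N</tt> becomes an <tt>M</tt> before <tt>N * M</tt> is reduced, and the closing parenthesis is shifted only once the stack holds <tt>( E</tt>.
</p>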
<p>
    If&mdash;as a side-effect of parsing a string&mdash;one wanted to build a parse tree, then the order of constructing nodes in the parse tree would be as follows:
</p>
<div align="center">
    <table border="1" cellpadding="0" cellspacing="0">
        <tr>
            <td colspan="3">
                <img src="http://www.ioreader.com/images/tdop/bottom-up-step0.png" id="tdop_bottom_up">
            </td>
        </tr>
        <tr>
            <td align="center"><button id="tdop_prev_bottom_up">prev</button></td>
            <td align="center"><button id="tdop_reset_bottom_up">reset</button></td>
            <td align="center"><button id="tdop_next_bottom_up">next</button></td>
        </tr>
    </table>
    <script>
    $("#tdop_bottom_up").imageSlider(11, "tdop_prev_bottom_up", "tdop_reset_bottom_up", "tdop_next_bottom_up");
    </script>
</div>
<h3>Left-Corner Parsing</h3>
<p>
    <a href="http://cs.union.edu/~striegnk/courses/nlp-with-prolog/html/node53.html" title="Left-corner parsing">Left-corner parsing</a> (LC) is a parsing technique that makes decisions based on top-down and bottom-up information.
</p>
<p>
    In the case of the bottom-up parser above, it appears that we were lucky that the sequence of shifts and reductions ended up reducing the entire string to an <tt>E</tt>. Strictly speaking, the goal of the above bottom-up parser was exactly that: reduce a string to <tt>E</tt>. If our expression were very long, then it wouldn&apos;t be clear until near the end of a bottom-up parse that our parser might have a chance of reaching its goal of <tt>E</tt>. 
</p>
<p>
    An LC parser attempts to satisfy multiple goals, including the end goal of reducing the string to <tt>E</tt>. An LC parser predicts sub-structures present in the remainder of the string, and attempts to parse those sub-structures bottom-up. But the prediction step sets up expectations about the structure of unseen parts of the string, which is a top-down approach.
</p>
<p>
    In fact, LC parsers alternate between bottom-up and top-down parsing. Alternation is possible because an LC parser maintains a list of goals (analogous to our top-down expectations), a list of predictions, and a partial parse of the input string (as in a bottom-up parser). An LC parser operates on its input string and these three lists in the following way:
</p>
<ol>
    <li>
        Repeat:
        <ol>
            <li>
                If the head of the goal list is a terminal, then <strong>consume</strong> the terminal and shift the first symbol of the remainder of the input string onto the end of the partial parse. If the goal terminal does not match the first symbol of the string then reject.
                <br><br>
                If the head of the goal list is a non-terminal, then attempt to <strong>reduce</strong> a suffix of the partial parse to the goal non-terminal. If such a reduction is possible, then remove the non-terminal from the head of the goal list and update the partial parse accordingly.
                <br><br>
                This step is repeated until the goal list remains unchanged.
            </li>
            <li>
                If <em>&beta;</em> is the last symbol of the partial parse, then find a production of the form &quot;<em>&alpha; &rarr; &beta; &gamma;</em>&quot; where <em>&gamma;</em> is a string of zero-or-more symbols. <em>&beta;</em> is said to be a <strong>left corner</strong> of <em>&alpha;</em>. Left corners can be both terminals and non-terminals. If we weren&apos;t restricting ourselves to &epsilon;-free CFGs, then left corners would not necessarily appear immediately following the &quot;&rarr;&quot;!
                <br><br>
                Place <em>&gamma;</em> and <em>&alpha;</em> on the head of the goal list, so that the first symbol (if any) of <em>&gamma;</em> is our next goal.
                <br><br>
                If the goals list is changed then return to step 1.1.
            </li>
            <li>
                If neither of the previous two steps changed the goals list, then <strong>shift</strong> a symbol from the remainder of the input string onto the end of the partial parse.
                <br><br>
                If no such symbol can be shifted, then reject the string. Otherwise, return to step 1.2.
            </li>
        </ol>
    </li>
    <li>
        Stop when the goal list is empty.
    </li>
</ol>
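<p>
    The left-corner lookup in step 1.2 can be sketched as a search over a table of productions. This is a minimal illustration, not a full LC parser: the grammar is stored with one character per symbol (ASCII <tt>*</tt> and <tt>/</tt> stand in for &times; and &divide;, and only a few digits are listed), and the names <tt>find_by_left_corner</tt> and <tt>rest_of_rhs</tt> are hypothetical.
</p>

```c
#include <stddef.h>

/* One production "lhs -> rhs", with one character per symbol. */
struct production {
    char lhs;
    const char *rhs;
};

/* The expression grammar, abbreviated to the digits 0 through 3. */
static const struct production grammar[] = {
    {'E', "(E)"}, {'E', "A"},
    {'A', "M+A"}, {'A', "M-A"}, {'A', "-A"}, {'A', "M"},
    {'M', "N*M"}, {'M', "N/M"}, {'M', "N"},
    {'N', "0"}, {'N', "1"}, {'N', "2"}, {'N', "3"},
};

/* Find the first production at index >= `from` whose right-hand side
 * starts with `beta`, i.e. a production of which `beta` is a left
 * corner. Returns its index, or -1 if there is none. */
int find_by_left_corner(char beta, int from) {
    int count = (int) (sizeof grammar / sizeof grammar[0]);
    if (from < 0) from = 0;
    for (int i = from; i < count; i++) {
        if (grammar[i].rhs[0] == beta) return i;
    }
    return -1;
}

/* The symbols following the left corner of production `i`. These are
 * the ones that would be placed on the goal list, followed by the
 * production's left-hand side. */
const char *rest_of_rhs(int i) {
    return grammar[i].rhs + 1;
}
```

<p>
    For the terminal <tt>(</tt>, the lookup finds <tt>E &rarr; &quot;(&quot; E &quot;)&quot;</tt>, and the remaining symbols <tt>E &quot;)&quot;</tt> followed by <tt>E</tt> become the new goals. A non-terminal such as <tt>M</tt> is the left corner of several productions, which is where a real parser has to choose (or, as assumed in this article, guess correctly).
</p>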
<p>
    For example, suppose we want to parse &quot;<tt>(2 &times; 3)</tt>&quot;. Parsing will proceed as follows:
</p>
<div align="center">
    <table border="1" cellpadding="0" cellspacing="0">
        <thead>
            <tr>
                <th>Step</th>
                <th colspan="2">Action</th>
                <th>Goals</th>
                <th>Partial parse</th>
                <th>Remainder of string</th>
            </tr>
        </thead>
        <tbody>
        <tr>
            <td>1</td>
            <td colspan="2">start</td>
            <td><tt></tt></td>
            <td></td>
            <td><tt>(2 &times; 3)</tt></td>
        </tr>
        
        <tr>
            <td rowspan="2">2</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list<br>
                    (1.2) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            
            <td colspan="2">shift</td>
            <td><tt></tt></td>
            <td><tt>&quot;(&quot;</tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        
        <tr>
            <td rowspan="2">3</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            
            <td>corner</td>
            <td><tt><font color="red">E</font> &rarr; <font color="blue">&quot;(&quot;</font> <font color="green">E &quot;)&quot;</font></tt></td>
            <td><tt><font color="green">E &quot;)&quot;</font> <font color="red">E</font> </tt></td>
            <td><tt><font color="blue">&quot;(&quot;</font></tt></td>
            <td><tt>2 &times; 3)</tt></td>
        </tr>
        
        <tr>
            <td rowspan="2">4</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list<br>
                    (1.2) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            
            <td>shift</td>
            <td><tt><font color="blue">&quot;2&quot;</font></tt></td>
            <td><tt>E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">&quot;2&quot;</font></tt></td>
            <td><tt>&times; 3)</tt></td>
        </tr>
        
        <tr>
            <td rowspan="2">5</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            
            <td>corner</td>
            <td><tt><font color="red">N</font> &rarr; <font color="blue">&quot;2&quot;</font></tt></td>
            <td><tt><font color="red">N</font> E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">&quot;2&quot;</font></tt></td>
            <td><tt>&times; 3)</tt></td>
        </tr>
        
        <tr>
            <td>6</td>
            <td>reduce</td>
            <td><tt><font color="blue">N</font> &rarr; &quot;2&quot;</tt></td>
            <td><tt>E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">N</font></tt></td>
            <td><tt>&times; 3)</tt></td>
        </tr>
        
        <tr>
            <td rowspan="2">7</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            <td>corner</td>
            <td><tt><font color="red">M</font> &rarr; <font color="blue">N</font> <font color="green">&quot;&times;&quot; M</font></tt></td>
            <td><nobr><tt><font color="green">&quot;&times;&quot; M</font> <font color="red">M</font> E &quot;)&quot; E</tt></nobr></td>
            <td><tt>&quot;(&quot; <font color="blue">N</font></tt></td>
            <td><tt>&times; 3)</tt></td>
        </tr>
        
        <tr>
            <td>8</td>
            <td>consume</td>
            <td><tt><font color="blue">&quot;&times;&quot;</font></tt></td>
            <td><tt>M M E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; N <font color="blue">&quot;&times;&quot;</font></tt></td>
            <td><tt>3)</tt></td>
        </tr>

        <tr>
            <td rowspan="2">9</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list<br>
                    (1.2) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            <td>shift</td>
            <td><tt><font color="blue">&quot;3&quot;</font></tt></td>
            <td><tt>M M E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot; <font color="blue">&quot;3&quot;</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        </tbody>
        <thead>
            <th>Step</th>
            <th colspan="2">Action</th>
            <th>Goals</th>
            <th>Partial parse</th>
            <th>Remainder of string</th>
        </thead>
        <tbody>
        <tr>
            <td rowspan="2">10</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            <td>corner</td>
            <td><tt><font color="red">N</font> &rarr; <font color="blue">&quot;3&quot;</font></tt></td>
            <td><tt><font color="red">N</font> M M E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot; <font color="blue">&quot;3&quot;</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        
        <tr>
            <td>11</td>
            <td>reduce</td>
            <td><tt><font color="blue">N</font> &rarr; &quot;3&quot;</tt></td>
            <td><tt>M M E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot; <font color="blue">N</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        
        <tr>
            <td>12</td>
            <td>reduce</td>
            <td><tt><font color="blue">M</font> &rarr; N</tt></td>
            <td><tt>M E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; N &quot;&times;&quot; <font color="blue">M</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        
        <tr>
            <td>13</td>
            <td>reduce</td>
            <td><tt><font color="blue">M</font> &rarr; N &quot;&times;&quot; M</tt></td>
            <td><tt> E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">M</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        
        <tr>
            <td rowspan="2">14</td>
            <td colspan="5">
                <small>
                    (1.1) no change to goals list
                </small>
            </td>
        </tr>
        <tr>
            <td>corner</td>
            <td><tt><font color="red">A</font> &rarr; <font color="blue">M</font></tt></td>
            <td><tt><font color="red">A</font> E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">M</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        
        <tr>
            <td>15</td>
            <td>reduce</td>
            <td><tt><font color="blue">A</font> &rarr; M</tt></td>
            <td><tt>E &quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">A</font></tt></td>
            <td><tt>)</tt></td>
        </tr>
        
        <tr>
            <td>16</td>
            <td>reduce</td>
            <td><tt><font color="blue">E</font> &rarr; A</tt></td>
            <td><tt>&quot;)&quot; E</tt></td>
            <td><tt>&quot;(&quot; <font color="blue">E</font></tt></td>
            <td><tt>)</tt></td>
        </tr>

        <tr>
            <td>17</td>
            <td>consume</td>
            <td><tt><font color="blue">&quot;)&quot;</font></tt></td>
            <td><tt>E</tt></td>
            <td><tt>&quot;(&quot; E <font color="blue">&quot;)&quot;</font></tt></td>
            <td></td>
        </tr>

        <tr>
            <td rowspan="2">18</td>
            <td>reduce</td>
            <td><nobr><tt><font color="blue">E</font> &rarr; &quot;(&quot; E &quot;)&quot;</tt></nobr></td>
            <td></td>
            <td><tt><font color="blue">E</font></tt></td>
            <td></td>
        </tr>
        <tr>
            <td colspan="5">accept</td>
        </tr>
        </tbody>
    </table>
</div>
<p>
    If&mdash;as a side-effect of parsing a string&mdash;one wanted to build a parse tree, then the order of constructing nodes in the parse tree would be as follows:
</p>
<div align="center">
    <table border="1" cellpadding="0" cellspacing="0">
        <tr>
            <td colspan="3">
                <img src="http://www.ioreader.com/images/tdop/left-corner-step0.png" id="tdop_lc">
            </td>
        </tr>
        <tr>
            <td align="center"><button id="tdop_prev_lc">prev</button></td>
            <td align="center"><button id="tdop_reset_lc">reset</button></td>
            <td align="center"><button id="tdop_next_lc">next</button></td>
        </tr>
    </table>
    <script>
    $("#tdop_lc").imageSlider(17, "tdop_prev_lc", "tdop_reset_lc", "tdop_next_lc");
    </script>
</div>
<p>
    Compared to the other two methods, this seems like a lot of work for nothing! Also, there is some amount of magic happening: recall that we are operating under the assumption that every action taken will be the correct one. In practice, one constructs a table and &quot;cheats&quot; when deciding which actions to take.
</p>
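<p>
    The &quot;magic&quot; can also be replaced by brute force: try every applicable production and backtrack on failure. Below is a minimal Python sketch of a backtracking left-corner recognizer. It is not the table-driven approach mentioned above, and it uses recursion in place of an explicit goals list. The grammar is an assumption, reconstructed from the productions that appear in the trace; the names <tt>GRAMMAR</tt>, <tt>parse</tt>, <tt>complete</tt>, <tt>parse_seq</tt>, and <tt>accepts</tt> are made up for illustration.
</p>

```python
# Assumed grammar, reconstructed from the productions used in the trace.
GRAMMAR = {
    "E": [["(", "E", ")"], ["A"]],
    "A": [["M"]],
    "M": [["N", "×", "M"], ["N"]],
    "N": [["2"], ["3"]],
}


def parse(goal, tokens, i):
    """Yield every j such that goal derives tokens[i:j]."""
    if i >= len(tokens):
        return
    if goal not in GRAMMAR:
        # Terminal goal: consume it if it matches the next token.
        if tokens[i] == goal:
            yield i + 1
        return
    # Non-terminal goal: shift one token and grow upward from that corner.
    yield from complete(tokens[i], goal, tokens, i + 1)


def complete(corner, goal, tokens, i):
    """corner was recognised ending at i; try to grow it into goal."""
    if corner == goal:
        yield i
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            if body[0] == corner:
                # corner is a left corner of head: the rest of the body
                # becomes the next goals, then head is the new corner.
                for j in parse_seq(body[1:], tokens, i):
                    yield from complete(head, goal, tokens, j)


def parse_seq(symbols, tokens, i):
    """Yield every j such that the symbols derive tokens[i:j] in order."""
    if not symbols:
        yield i
        return
    for j in parse(symbols[0], tokens, i):
        yield from parse_seq(symbols[1:], tokens, j)


def accepts(text):
    """Treat each character as one token and try to recognise an E."""
    tokens = list(text)
    return len(tokens) in parse("E", tokens, 0)
```

<p>
    Under these assumptions, <tt>accepts("(2×3)")</tt> explores the same productions as the trace, plus a few dead ends that the idealized parser never visits. Generators keep the backtracking implicit: each <tt>yield</tt> is one way the current goal could end.
</p>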
<h3>Summary</h3>
<p>
    Top-down and bottom-up parsing were covered to set the stage for left-corner parsing and the TDPL, which provide context for the behavior of TDOP parsers. My next post will go into TDOP and how it relates to left-corner parsing and the TDPL.
</p>]]></description>
    </item>
        <item>
                <title><![CDATA[Symbolic Interpretation]]></title>
        <link>http://www.ioreader.com/2012/04/07/symbolic-interpretation</link>
        <comments>http://www.ioreader.com/2012/04/07/symbolic-interpretation#comments</comments>
        <pubDate>Sat, 07 Apr 2012 23:09:05 GMT</pubDate>
        <dc:creator>Peter Goodman</dc:creator>
                        <category><![CDATA[Compilers]]></category>
                                <category><![CDATA[Interpreters]]></category>
                                <category><![CDATA[Optimization]]></category>
                        <guid isPermaLink="false">2q</guid>
        <description><![CDATA[<p>
Recently I worked on a project for my <a href="http://www.eecg.toronto.edu/~tsa/homepage/TeachingPage.htm">Optimizing Compilers course</a>. The purpose of this project was to implement <a href="http://en.wikipedia.org/wiki/Loop-invariant_code_motion">Loop-invariant Code Motion</a> and any other compiler optimizations of our choosing. The project is competitive because one's mark is based on how much one's compiler improves the mean execution time on a small set of static, pre-determined test cases. Given that the test cases do not change, it is natural to specialize one's optimizations to the code being tested. Realistically, this might not be the best approach as code tends to change and compiler optimizations are not always transparent.
</p>

<h3>Optimizations</h3>
<p>
So far I have implemented the following optimizations. This post will focus on the last optimization, symbolic interpretation (labeled EVAL).
</p>
<dl>
	<dt>CP</dt>
	<dd><a href="http://en.wikipedia.org/wiki/Copy_propagation" title="Copy propagation compiler optimization">Copy propagation</a></dd>

	<dt>CF</dt>
	<dd><a href="http://en.wikipedia.org/wiki/Constant_folding" title="Constant folding compiler optimization">Constant folding</a> (with local constant propagation)</dd>

	<dt>LICM</dt>
	<dd><a href="http://en.wikipedia.org/wiki/Loop-invariant_code_motion" title="Loop-invariant code motion compiler optimization">Loop-invariant code motion</a></dd>

	<dt>DCE</dt>
	<dd><a href="http://en.wikipedia.org/wiki/Dead_code_elimination" title="Dead code elimination compiler optimization">Dead code elimination</a> (with unreachable code elimination, block merging, and local constant de-duplication)</dd>

	<dt>CSE</dt>
	<dd><a href="http://en.wikipedia.org/wiki/Common_subexpression_elimination" title="Common subexpression elimination compiler optimization">Common subexpression elimination</a></dd>

	<dt>EVAL</dt>
	<dd>Symbolic interpretation (based on <a href="http://en.wikipedia.org/wiki/Abstract_interpretation" title="Abstract interpretation">abstract interpretation</a>)</dd>
</dl>
<p>
These optimizations were arranged into the following pipeline, where dashed edges are followed when a pass changes something and solid edges are followed when no changes are made:
</p>
<p align="center">
<img src="http://www.ioreader.com/images/optimizer-pipeline.png" alt="Pipeline of optimization passes" />
</p>
<h3>SimpleSUIF</h3>
<p>
This project uses Stanford's <a href="http://suif.stanford.edu/suif/suif1/" title="The SUIF 1.x Compiler System">SimpleSUIF</a> compiler infrastructure. SimpleSUIF's intermediate representation (<abbr title="Intermediate Representation">IR</abbr>) is a linked list of instructions, including such things as basic arithmetic, bitwise operators, memory/constant load/store, and calling/branching operations. The IR is register based, with three register classes: machine, pseudo, and temporary. For our purposes, machine registers are never used. Temporary registers represent single-definition and single-use registers, where both the definition and use (if any) must reside in the same <a href="http://en.wikipedia.org/wiki/Basic_block" title="Basic block">basic block</a>. Temporary registers often hold loaded constants. Pseudo registers behave like general purpose registers. Finally, all registers are typed.
</p>
<p>
One quirk of how we use SimpleSUIF is that there is no apparent way to access the IR for an arbitrary function within the same compilation unit. As such, <a href="http://en.wikipedia.org/wiki/Interprocedural_optimization" title="Interprocedural compiler optimization">interprocedural optimizations</a> such as function inlining and compile-time execution are not possible. This was unfortunate as there was one particular test case that would have benefitted from interprocedural optimization.
</p>
<h3>Test case</h3>
<p>
Below is one of the functions in the test case of interest. Two lines are struck out because the dead code elimination pass regards them as useless.
</p>
<pre class="code">
float f1(float b, float c){
   int i;
   float j, k;

   <strike>j = c;</strike>
   for(i = 0; i &lt; 2; i++) {
      k = b * i;
      <strike>j += k;</strike>
   }

   return k;
}
</pre>
<p>
Looking closely at this example, it is clear that only the initialization of <tt>i</tt> to <tt>0</tt>, the last iteration of the loop, and the value of <tt>b</tt> are important to the output of <tt>f1</tt>. However, this is difficult to tell from the perspective of the IR without running through the program. With more information (e.g. about loop induction variables or loop dependencies), we might be able to make smarter decisions, but only in some very restricted cases. Unfortunately, it's not clear <em>how</em> one should go about &quot;executing&quot; this program in the absence of a particular value for <tt>b</tt>. This is where symbolic interpretation comes in.
</p>
<h3>Symbolic interpretation</h3>
<p>
Symbolic interpretation is similar to <a href="http://en.wikipedia.org/wiki/Global_value_numbering" title="Local value numbering">local value numbering</a> in that we operate on concrete and symbolic values. For simplicity, I restricted this optimization pass to a subset of the provably <a href="http://en.wikipedia.org/wiki/Pure_function" title="Pure functions">pure functions</a>. Because information about other functions was absent, I considered a pure function to be any function that does not:
</p>
<ul>
<li>Load from or store to a memory location.</li>
<li>Call any functions. Note: this constraint can be relaxed in the case of a recursive function call. The test cases I focused on did not include recursive function calls; however, this method can easily be extended to apply to that case.</li>
<li>Copy from one memory location to another memory location.</li>
</ul>
<p>
Thus, a function is considered pure if it depends only on constants, local variables, and function arguments, and performs no operation that could generate a side-effect.
</p>
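<p>
As a rough illustration, the purity test amounts to a single scan over a function's instructions. The sketch below is in Python rather than against the SimpleSUIF API, and the opcode names <tt>load</tt>, <tt>store</tt>, <tt>memcpy</tt>, and <tt>call</tt> are placeholders that I am assuming for the example; the instruction tuples are likewise a made-up encoding.
</p>

```python
# Placeholder opcode names; the real SimpleSUIF opcodes may be spelled differently.
IMPURE_OPCODES = frozenset({"load", "store", "memcpy", "call"})


def is_pure(instructions):
    """Return True if no instruction touches memory or calls a function.

    Each instruction is a tuple whose first element is the opcode.
    """
    return all(opcode not in IMPURE_OPCODES for opcode, *_ in instructions)


# A fragment in the spirit of f1: only constants, copies, and arithmetic.
f1_fragment = [
    ("ldc", "t6", 0),
    ("cpy", "r3", "t6"),
    ("mul", "t8", "r1", "r3"),
    ("ret", "r5"),
]
```

<p>
With this encoding, <tt>is_pure(f1_fragment)</tt> holds, while any instruction list containing a call or a memory access is rejected. Relaxing the check for direct recursion would need the callee's identity, which this sketch does not model.
</p>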
<p>
The following control-flow graph (which omits some edges because I am lazy with SVG) is an interactive symbolic executor of the SimpleSUIF-like IR representing the above function. Below I describe how each step of the evaluator is performed.
</p>
<div align="center" style="display:none;" id="si_table">
  <table border="1">
    <tr>
      <td colspan="2" align="center">
        <button id="si_next">Next</button>
        <button id="si_reset">Reset</button>
      </td>
    </tr>
    <tr>
      <td id="si_state" width="110"></td>
      <td height="400" width="240"><div id="si_blocks"></div></td>
    </tr>
  </table>
<script type="text/javascript" src="http://www.ioreader.com/js/d3.v2.min.js"></script>
<script type="text/javascript" src="http://www.ioreader.com/js/si.js"></script>
</div>
<noscript>
<p>There is supposed to be a cool symbolic interpreter simulator here, but JavaScript is required to see it.</p>
</noscript>
<p>
The symbolic interpreter behaves much like a pass that combines constant folding and constant propagation, with the exception that when an operation is performed on an expression containing a symbol, a new symbol is generated.
</p>
<p>For example, if one performs an <tt>ldc</tt> operation to load the constant <tt>0</tt> into register <tt>t6</tt>, then we can assign to <tt>t6</tt> the value <tt>0</tt>. If a copy (<tt>cpy</tt>) operation is performed, then the value of the right-hand register is assigned to be the new value of the left-hand register. For example, <tt>cpy r3 = t6</tt> assigns to <tt>r3</tt> the value <tt>0</tt>.
</p>
<p>
Sometimes a register is used before it is defined. For example, <tt>r1</tt> in <tt>mul t8 = r1, r3</tt> is never defined in the above code. This is because <tt>r1</tt> represents one of the arguments to the function. In this case, <tt>r1</tt> is given a new symbolic value that is distinct from every other symbolic value. In the above simulator, the symbolic value assigned to <tt>r1</tt> is named <tt>r1</tt>. Being able to identify the &quot;origin&quot; of a symbolic value will be useful for code generation.
</p>
<p>
When a symbolic value participates in an expression, as in <tt>mul t8 = r1, r3</tt>, a new and unique symbolic value is generated that represents the expression. If any of the components of the expression are constants (known at compile time) then we want to store those constants as part of the symbolic expression. For example, in the first iteration of the loop, <tt>t8</tt> is assigned the symbolic expression <tt>r1 * 0</tt>. In the second iteration of the loop, <tt>t8</tt> is assigned the symbolic expression <tt>r1 * 1</tt>.
</p>
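<p>
To make these rules concrete, here is a minimal Python sketch of one evaluation step. The representation is my own assumption for the example: constants are Python numbers, a symbol is a string named after its register, and a symbolic expression is a tuple such as <tt>("mul", "r1", 0)</tt>. Only <tt>ldc</tt>, <tt>cpy</tt>, and <tt>mul</tt> are handled, and the names <tt>lookup</tt> and <tt>step</tt> are hypothetical.
</p>

```python
def lookup(env, reg):
    """Return the value of reg, inventing a symbol for an undefined register."""
    # A register read before any definition (e.g. a function argument)
    # gets a symbol named after the register itself.
    return env.setdefault(reg, reg)


def step(env, instr):
    """Symbolically execute one instruction, updating env in place."""
    op, dst, *args = instr
    if op == "ldc":
        # Load constant: the destination now holds a known value.
        env[dst] = args[0]
    elif op == "cpy":
        # Copy: the destination takes the value of the source register.
        env[dst] = lookup(env, args[0])
    elif op == "mul":
        lhs, rhs = lookup(env, args[0]), lookup(env, args[1])
        both_constant = isinstance(lhs, (int, float)) and isinstance(rhs, (int, float))
        # Fold when both operands are known; otherwise record the expression,
        # keeping any constant operand inside it.
        env[dst] = lhs * rhs if both_constant else ("mul", lhs, rhs)
    else:
        raise ValueError(f"unsupported opcode: {op}")


# First loop iteration of the example: r3 holds 0 when the multiply runs.
env = {}
for instr in [("ldc", "t6", 0), ("cpy", "r3", "t6"), ("mul", "t8", "r1", "r3")]:
    step(env, instr)
```

<p>
After these three steps <tt>env["t8"]</tt> is the expression <tt>r1 * 0</tt> in tuple form. The sketch deliberately does not simplify it to <tt>0</tt>, since <tt>r1</tt> is a float whose value is unknown.
</p>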
<p>
Something not touched on in this example is a branch that depends on a symbolic value. In this case, we cannot follow the branch as we don't know in which direction it will go at runtime. We are concerned only with cases in which we can <em>statically</em> determine the direction of the branch.
</p>
<h3>Code Generation</h3>
<p>
The focus of symbolic evaluation has been to end up with some symbolic or constant expression for each register. In fact, for this optimization, only the returned register (<tt>r5</tt>) ends up being useful. If the returned register contains a constant value then the function is necessarily constant, and so the function's code can be replaced with a <tt>ldc</tt> followed by a <tt>ret</tt>.
</p>
<p>
In the case that the returned register is a symbolic expression, we can walk the expression tree and output for each subexpression the instructions needed to compute that subexpression. The leaves of the expression tree will be symbolic register values (named according to their register) or constants.
</p>
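<p>
A post-order walk of this kind might look like the following Python sketch. As before, the encoding is an assumption made for illustration: a string is a symbolic register, a number is a constant, and a tuple <tt>(op, lhs, rhs)</tt> is a binary expression. The helper names <tt>emit</tt> and <tt>generate</tt>, and the textual instruction format, are hypothetical.
</p>

```python
from itertools import count


def emit(expr, out, fresh):
    """Append the instructions computing expr to out; return its register."""
    if isinstance(expr, str):
        # Symbolic register: already live, so no instruction is needed.
        return expr
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        # Operands first (post-order), then the operation itself.
        lhs_reg = emit(lhs, out, fresh)
        rhs_reg = emit(rhs, out, fresh)
        dst = f"t{next(fresh)}"
        out.append(f"{op} {dst} = {lhs_reg}, {rhs_reg}")
        return dst
    # Constant leaf: load it into a new temporary.
    dst = f"t{next(fresh)}"
    out.append(f"ldc {dst} = {expr}")
    return dst


def generate(expr):
    """Return the full replacement body for a function returning expr."""
    out = []
    result = emit(expr, out, count(1))
    out.append(f"ret {result}")
    return out
```

<p>
For a constant result such as <tt>generate(0)</tt>, the walk degenerates to a single <tt>ldc</tt> followed by a <tt>ret</tt>, matching the constant-function case described earlier.
</p>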
<p>
Using the above expression tree walking strategy, the symbolic expression of <tt>r5</tt> can be converted to the following sequence of instructions:
</p>
<pre class="code">
ldc t1 = 1
mul t2 = r1, t1
ret t2
</pre>
<p>
Here we have generated new registers to hold temporaries, but left symbolic registers alone. This new sequence of instructions takes the place of the old, larger sequence of instructions.
</p>]]></description>
    </item>
        <item>
                <title><![CDATA[Dr. Sheng Yu]]></title>
        <link>http://www.ioreader.com/2012/01/26/dr-sheng-yu</link>
        <comments>http://www.ioreader.com/2012/01/26/dr-sheng-yu#comments</comments>
        <pubDate>Fri, 27 Jan 2012 05:52:46 GMT</pubDate>
        <dc:creator>Peter Goodman</dc:creator>
                                <guid isPermaLink="false">2p</guid>
        <description><![CDATA[<p>
It is with great sadness that I report the passing of my friend, colleague, and mentor: <a href="http://www.csd.uwo.ca/People/sheng_yu.html">Dr. Sheng Yu</a>. I knew Sheng for the last three and a half years of his life. Sheng was twice my professor, twice my employer, and my undergraduate thesis supervisor.
</p>
<p>
Often I would pop in to Sheng's office on the third floor of Middlesex College at The University of Western Ontario. On his desks were towers of books and papers; it baffled me that they never fell. In his office, we would talk--sometimes for hours--about his past students and what they were up to, about parsing techniques, finite automata, regular languages and their operations, and object-oriented programming.
</p>
<p>
I prefaced each of our e-mail correspondences with the far too formal "Prof. Sheng Yu". Goodbye Prof. Sheng Yu; you will be missed.
</p>]]></description>
    </item>
    </channel>
</rss>
