I’m not a fan of how, currently, all the submodules of crate::parser
can access the internals of our parser. This access makes sense for marker.rs
, but not for, say, expr.rs
. These other modules should have to go through the parser’s API. We can enforce these sorts of visibility issues by breaking our project up into crates.
Extracting the lexer
The first part of Eldiro we’ll create a separate crate for is the lexer.
$ cargo new --lib crates/lexer
Created library `crates/lexer` package
$ git mv -f crates/{eldiro/src/lexer.rs,lexer/src/lib.rs}
We also need to transfer the Logos dependency from eldiro
’s Cargo.toml
to lexer
’s:
# crates/eldiro/Cargo.toml
[dependencies]
drop_bomb = "0.1.5"
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.10.0"
# crates/lexer/Cargo.toml
[dependencies]
logos = "0.11.4"
crates/eldiro/src/lib.rs
is still referring to the lexer
module; let’s remove it:
pub mod parser;
mod syntax;
Since naming should show that the lexer is separate from the syntax tree, rename lexer::SyntaxKind
to TokenKind
. We should remove the node types from TokenKind
for the same reason:
// crates/lexer/src/lib.rs
use logos::Logos;
// snip
#[derive(Debug, Copy, Clone, PartialEq, Logos)]
pub(crate) enum TokenKind {
#[regex("[ \n]+")]
Whitespace,
#[token("fn")]
FnKw,
#[token("let")]
LetKw,
#[regex("[A-Za-z][A-Za-z0-9]*")]
Ident,
#[regex("[0-9]+")]
Number,
#[token("+")]
Plus,
#[token("-")]
Minus,
#[token("*")]
Star,
#[token("/")]
Slash,
#[token("=")]
Equals,
#[token("(")]
LParen,
#[token(")")]
RParen,
#[token("{")]
LBrace,
#[token("}")]
RBrace,
#[regex("#.*")]
Comment,
#[error]
Error,
}
Delete TokenKind::is_trivia
, since the lexer should know nothing about parsing (which trivia certainly is a part of). To allow other code to use our lexer, replace all occurrences of pub(crate)
with pub
in crates/lexer/src/lib.rs
.
Let’s move TokenKind
-related code into a separate module, as lib.rs
is starting to get cluttered:
mod token_kind;
pub use token_kind::TokenKind;
// crates/lexer/src/token_kind.rs
use logos::Logos;
#[derive(Debug, Copy, Clone, PartialEq, Logos)]
pub enum TokenKind {
// snip
}
#[cfg(test)]
mod tests {
use super::*;
use crate::{Lexer, Token};
// snip
}
Now that that’s done we should make eldiro
depend on lexer
:
# crates/eldiro/Cargo.toml
[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.10.0"
Run a project-wide search and replace from crate::lexer
to lexer
; this will replace all references to the now-deleted lexer
module with ones to the new lexer
crate.
Let’s define SyntaxKind
:
// syntax.rs
use lexer::TokenKind;
use num_derive::{FromPrimitive, ToPrimitive};
use num_traits::{FromPrimitive, ToPrimitive};
#[derive(Debug, Copy, Clone, PartialEq, FromPrimitive, ToPrimitive)]
pub(crate) enum SyntaxKind {
Whitespace,
FnKw,
LetKw,
Ident,
Number,
Plus,
Minus,
Star,
Slash,
Equals,
LParen,
RParen,
LBrace,
RBrace,
Comment,
Error,
Root,
InfixExpr,
Literal,
ParenExpr,
PrefixExpr,
VariableRef,
}
impl SyntaxKind {
pub(crate) fn is_trivia(self) -> bool {
matches!(self, Self::Whitespace | Self::Comment)
}
}
impl From<TokenKind> for SyntaxKind {
fn from(token_kind: TokenKind) -> Self {
match token_kind {
TokenKind::Whitespace => Self::Whitespace,
TokenKind::FnKw => Self::FnKw,
TokenKind::LetKw => Self::LetKw,
TokenKind::Ident => Self::Ident,
TokenKind::Number => Self::Number,
TokenKind::Plus => Self::Plus,
TokenKind::Minus => Self::Minus,
TokenKind::Star => Self::Star,
TokenKind::Slash => Self::Slash,
TokenKind::Equals => Self::Equals,
TokenKind::LParen => Self::LParen,
TokenKind::RParen => Self::RParen,
TokenKind::LBrace => Self::LBrace,
TokenKind::RBrace => Self::RBrace,
TokenKind::Comment => Self::Comment,
TokenKind::Error => Self::Error,
}
}
}
Note how we define a conversion from lexer
’s TokenKind
to SyntaxKind
. The next step is to import SyntaxKind
from crate::syntax
instead of lexer
, making sure to add any conversions where necessary:
// parser.rs
use crate::syntax::{SyntaxKind, SyntaxNode};
use event::Event;
use expr::expr;
use lexer::{Lexer, Token};
use marker::Marker;
use rowan::GreenNode;
use sink::Sink;
use source::Source;
// event.rs
use crate::syntax::SyntaxKind;
#[derive(Debug, PartialEq)]
pub(super) enum Event {
// snip
}
// expr.rs
use super::marker::CompletedMarker;
use super::Parser;
use crate::syntax::SyntaxKind;
// marker.rs
use super::event::Event;
use super::Parser;
use crate::syntax::SyntaxKind;
use drop_bomb::DropBomb;
// sink.rs
use super::event::Event;
use crate::syntax::{EldiroLanguage, SyntaxKind};
use lexer::Token;
use rowan::{GreenNode, GreenNodeBuilder, Language};
use std::mem;
// snip
impl<'t, 'input> Sink<'t, 'input> {
// snip
fn eat_trivia(&mut self) {
while let Some(token) = self.tokens.get(self.cursor) {
if !SyntaxKind::from(token.kind).is_trivia() {
break;
}
self.token();
}
}
fn token(&mut self) {
let Token { kind, text } = self.tokens[self.cursor];
self.builder
.token(EldiroLanguage::kind_to_raw(kind.into()), text.into());
self.cursor += 1;
}
}
// source.rs
use crate::syntax::SyntaxKind;
use lexer::Token;
// snip
impl<'t, 'input> Source<'t, 'input> {
// snip
fn peek_kind_raw(&self) -> Option<SyntaxKind> {
self.tokens
.get(self.cursor)
.map(|Token { kind, .. }| (*kind).into())
}
}
Running our tests shows how they’re split across crates:
$ cargo t -q --lib
running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Extracting syntax
Next up: the syntax
module. Let’s again generate a new crate and copy over the module:
$ cargo new --lib crates/syntax
Created library `crates/syntax` package
$ git mv -f crates/{eldiro/src/syntax.rs,syntax/src/lib.rs}
Let’s add all its dependencies:
# crates/syntax/Cargo.toml
[dependencies]
lexer = {path = "../lexer"}
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.10.0"
We can now remove num-derive and num-traits from eldiro
’s Cargo.toml
, while also adding syntax
:
# crates/eldiro/Cargo.toml
[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
rowan = "0.10.0"
syntax = {path = "../syntax"}
Search and replace pub(crate)
with pub
in crates/syntax/src/lib.rs
to make all its items visible to the outside world.
We also need to remove the relevant mod
declaration from crates/eldiro/src/lib.rs
, which now looks like this:
pub mod parser;
Yet again, run a project-wide search and replace from crate::syntax
to syntax
.
$ cargo t -q --lib
running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Extracting the parser
This is the last crate we’ll extract, so let’s get on with it:
$ cargo new --lib crates/parser
Created library `crates/parser` package
$ git mv -f crates/{eldiro/src/parser.rs,parser/src/lib.rs}
$ git mv -f crates/{eldiro/src/parser/*,parser/src}
$ rmdir crates/eldiro/src/parser
# crates/parser/Cargo.toml
[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
rowan = "0.10.0"
syntax = {path = "../syntax"}
[dev-dependencies]
expect-test = "1.0.1"
Run a project-wide search and replace from pub(super)
to pub(crate)
, since all the occurrences of pub(super)
are on items in top-level modules of the parser
crate. As such, using pub(super)
on these items is equivalent to pub(crate)
, with pub(crate)
being the more idiomatic choice.
To prevent all code in the parser
crate from being able to see inside of Parser
, we’ll move it into its own module.
// lib.rs
mod event;
mod expr;
mod marker;
mod parser;
mod sink;
mod source;
use lexer::Lexer;
use parser::Parser;
use rowan::GreenNode;
use sink::Sink;
use syntax::SyntaxNode;
pub fn parse(input: &str) -> Parse {
// snip
}
pub struct Parse {
green_node: GreenNode,
}
impl Parse {
// snip
}
#[cfg(test)]
fn check(input: &str, expected_tree: expect_test::Expect) {
let parse = parse(input);
expected_tree.assert_eq(&parse.debug_tree());
}
// crates/parser/src/parser.rs
use crate::event::Event;
use crate::expr::expr;
use crate::marker::Marker;
use crate::source::Source;
use lexer::Token;
use syntax::SyntaxKind;
pub(crate) struct Parser<'t, 'input> {
source: Source<'t, 'input>,
events: Vec<Event>,
}
impl<'t, 'input> Parser<'t, 'input> {
pub(crate) fn new(tokens: &'t [Token<'input>]) -> Self {
// snip
}
pub(crate) fn parse(mut self) -> Vec<Event> {
// snip
}
pub(crate) fn start(&mut self) -> Marker {
// snip
}
pub(crate) fn bump(&mut self) {
// snip
}
pub(crate) fn at(&mut self, kind: SyntaxKind) -> bool {
// snip
}
pub(crate) fn peek(&mut self) -> Option<SyntaxKind> {
// snip
}
}
#[cfg(test)]
mod tests {
use crate::check;
use expect_test::expect;
// snip
}
Now Marker
and CompletedMarker
can’t see inside of Parser
, even though they need to be able to. Let’s move the marker
module into parser
:
$ mkdir crates/parser/src/parser
$ git mv crates/parser/src/{marker.rs,parser}
// lib.rs
mod event;
mod expr;
mod parser;
mod sink;
mod source;
// parser.rs
pub(crate) mod marker;
use crate::event::Event;
use crate::expr::expr;
use crate::source::Source;
use lexer::Token;
use marker::Marker;
use syntax::SyntaxKind;
We need to update the imports in marker
:
// marker.rs
use super::Parser;
use crate::event::Event;
use drop_bomb::DropBomb;
use syntax::SyntaxKind;
There are some errors in expr.rs
about an unresolved import that we can fix:
// expr.rs
use crate::parser::marker::CompletedMarker;
use crate::parser::Parser;
use syntax::SyntaxKind;
Let’s modify Parser::new
so that it doesn’t have to know how to create a Source
:
// parser.rs
pub(crate) mod marker;
use crate::event::Event;
use crate::expr::expr;
use crate::source::Source;
use marker::Marker;
use syntax::SyntaxKind;
// snip
impl<'t, 'input> Parser<'t, 'input> {
pub(crate) fn new(source: Source<'t, 'input>) -> Self {
Self {
source,
events: Vec::new(),
}
}
// snip
}
This obligates us to update the main parse
function:
// lib.rs
mod event;
mod expr;
mod parser;
mod sink;
mod source;
use lexer::Lexer;
use parser::Parser;
use rowan::GreenNode;
use sink::Sink;
use source::Source;
use syntax::SyntaxNode;
pub fn parse(input: &str) -> Parse {
let tokens: Vec<_> = Lexer::new(input).collect();
let source = Source::new(&tokens);
let parser = Parser::new(source);
let events = parser.parse();
let sink = Sink::new(&tokens, events);
Parse {
green_node: sink.finish(),
}
}
Next, we’ll create a new module, grammar
, and move expr
into it:
// lib.rs
mod event;
mod grammar;
mod parser;
mod sink;
mod source;
// crates/parser/src/grammar.rs
mod expr;
pub(crate) use expr::expr;
$ mkdir crates/parser/src/grammar
$ git mv crates/parser/src/{expr.rs,grammar}
Let’s fix the imports in expr.rs
:
#[cfg(test)]
mod tests {
use crate::check;
use expect_test::expect;
// snip
}
As we add more and more features to Eldiro, we’ll end up with more and more submodules in grammar
. It would be annoying to have the same three lines of imports in each one, so we can move them into grammar
:
// grammar.rs
mod expr;
pub(crate) use expr::expr;
use crate::parser::marker::CompletedMarker;
use crate::parser::Parser;
use syntax::SyntaxKind;
And import them from expr.rs
:
use super::*;
Let’s fix the import of expr
in parser.rs
:
pub(crate) mod marker;
use crate::event::Event;
use crate::grammar;
use crate::source::Source;
use marker::Marker;
use syntax::SyntaxKind;
// snip
impl<'t, 'input> Parser<'t, 'input> {
// snip
pub(crate) fn parse(mut self) -> Vec<Event> {
let m = self.start();
grammar::expr(&mut self);
m.complete(&mut self, SyntaxKind::Root);
self.events
}
// snip
}
It might be nice to extract that little dance we do with SyntaxKind::Root
in Parser::parse
into its own function:
// grammar.rs
mod expr;
use crate::parser::marker::CompletedMarker;
use crate::parser::Parser;
use syntax::SyntaxKind;
pub(crate) fn root(p: &mut Parser) -> CompletedMarker {
let m = p.start();
expr::expr(p);
m.complete(p, SyntaxKind::Root)
}
Now grammar::expr::expr
doesn’t need to be pub(crate)
:
// expr.rs
pub(super) fn expr(p: &mut Parser) {
expr_binding_power(p, 0);
}
Let’s use that root
function from grammar
in parser
:
// parser.rs
impl<'t, 'input> Parser<'t, 'input> {
// snip
pub(crate) fn parse(mut self) -> Vec<Event> {
grammar::root(&mut self);
self.events
}
// snip
}
Finally, let’s remove the parser
module declaration from crates/eldiro/src/lib.rs
:
// It’s empty!
To make everything compile we can re-export parser::parse
from eldiro
:
pub mod parser {
pub use parser::parse;
}
# crates/eldiro/Cargo.toml
[dependencies]
parser = {path = "../parser"}
$ cargo t -q --lib
running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
In reality, though, there isn’t a need anymore for the eldiro
crate. Let’s delete it:
$ rm -r crates/eldiro
And update eldiro-cli
:
# crates/eldiro-cli/Cargo.toml
[dependencies]
parser = {path = "../parser"}
// crates/eldiro-cli/src/main.rs
use parser::parse;
use std::io::{self, Write};
Now that the eldiro
crate doesn’t exist anymore, we can rename eldiro-cli
to eldiro
:
$ git mv crates/{eldiro-cli,eldiro}
# crates/eldiro/Cargo.toml
[package]
name = "eldiro"
$ cargo t -q --lib
running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
A random note
Rowan parsers tend to return a CompletedMarker
from subparsers, as that gives the parsers that call them the opportunity to call precede
. Let’s follow this convention and change the only subparsers that don’t return a CompletedMarker
, expr
and expr_binding_power
:
// expr.rs
pub(super) fn expr(p: &mut Parser) -> Option<CompletedMarker> {
expr_binding_power(p, 0)
}
fn expr_binding_power(p: &mut Parser, minimum_binding_power: u8) -> Option<CompletedMarker> {
let mut lhs = lhs(p)?; // we’ll handle errors later.
loop {
let op = match p.peek() {
// snip
_ => return None, // we’ll handle errors later.
};
// snip
if left_binding_power < minimum_binding_power {
break;
}
// snip
}
Some(lhs)
}
Conclusion
As a reward for our efforts I’ve prepared a diagram of the dependency graph of the eldiro
crate:
$ cargo install cargo-depgraph
$ # install Graphviz however you like
$ cargo depgraph --all-deps | dot -Tsvg > graph.svg
Green indicates a build dependency, meaning a dependency that is needed to build, but doesn’t contribute any code directly. As you can see, the two direct build dependencies that our crates have both provide procedural macros: logos-derive and num-derive.
Blue indicates a development dependency, of which we only have one: expect-test.
In the next part we’ll cover an interesting topic that Rowan is particularly suited to: error recovery, resilience and reporting.