I was disciplined with my TDD throughout the early parts of this series – I think this was helpful. Unfortunately, testing has fallen by the wayside in some recent parts; especially the last one. I would like to get back into the habit of TDD, and so I should add tests to areas of the codebase that are missing them.
A weird compilation failure
First off, a small erratum: it turns out that Eldiro fails to compile when the entire workspace’s tests are run:
$ cargo t
# snip
error[E0659]: `parser` is ambiguous (name vs any other name during import resolution)
--> /home/me/src/eldiro/crates/parser/src/lib.rs:8:5
|
8 | use parser::{ParseError, Parser};
| ^^^^^^ ambiguous name
|
= note: `parser` could refer to a crate passed with `--extern`
= help: use `::parser` to refer to this crate unambiguously
note: `parser` could also refer to the module defined here
--> /home/me/src/eldiro/crates/parser/src/lib.rs:3:1
|
3 | mod parser;
| ^^^^^^^^^^^
= help: use `crate::parser` to refer to this module unambiguously
error: aborting due to previous error
This is because we have both a module and a crate with the same name, so the import is ambiguous. I’m unsure why this error doesn’t appear with cargo build
, but we should fix it anyway:
// crates/parser/src/lib.rs
mod event;
mod grammar;
mod parser;
mod sink;
mod source;
use crate::parser::{ParseError, Parser};
use lexer::Lexer;
use rowan::GreenNode;
use sink::Sink;
use source::Source;
use syntax::SyntaxNode;
Updates to Rowan
Due to a flurry of activity focused on performance, Rowan has had two breaking changes recently. Let’s update both of our crates that depend on Rowan to point to the new version:
# crates/syntax/Cargo.toml
[dependencies]
lexer = {path = "../lexer"}
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.12.1"
# crates/parser/Cargo.toml
[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
rowan = "0.12.1"
syntax = {path = "../syntax"}
text-size = "1.0.0"
Thanks to Rust’s static type system and helpful error messages, updating dependencies is a breeze. The only error this change has resulted in is in VariableRef::name
. Rowan switched from returning &SmolStr
from SyntaxToken::text
to returning a simple &str
. Let’s update VariableRef::name
to convert the &str
to a SmolStr
:
// crates/ast/src/lib.rs
impl VariableRef {
pub fn name(&self) -> SmolStr {
self.0.first_token().unwrap().text().into()
}
}
All the other AST methods return SyntaxNode
s, SyntaxToken
s or data types that contain them; let’s return the VariableRef
’s first child token directly for consistency:
impl VariableRef {
pub fn name(&self) -> Option<SyntaxToken> {
self.0.first_token()
}
}
This allows us to remove ast
’s smol_str
dependency:
use syntax::{SyntaxElement, SyntaxKind, SyntaxNode, SyntaxToken};
# Cargo.toml
[dependencies]
syntax = {path = "../syntax"}
Next, we have a usage of VariableRef::name
that we have to update:
// crates/hir/src/database.rs
impl Database {
// snip
pub(crate) fn lower_expr(&mut self, ast: Option<ast::Expr>) -> Expr {
if let Some(ast) = ast {
match ast {
// snip
ast::Expr::VariableRef(ast) => Expr::VariableRef {
var: ast.name().unwrap().text().into(),
},
}
} else {
// snip
}
}
// snip
}
That match arm is longer than one line; let’s extract it to a method for consistency:
impl Database {
// snip
pub(crate) fn lower_expr(&mut self, ast: Option<ast::Expr>) -> Expr {
if let Some(ast) = ast {
match ast {
// snip
ast::Expr::VariableRef(ast) => self.lower_variable_ref(ast),
}
} else {
// snip
}
}
// snip
fn lower_variable_ref(&mut self, ast: ast::VariableRef) -> Expr {
Expr::VariableRef {
var: ast.name().unwrap().text().into(),
}
}
}
Updating Rowan has given us one more error related to SmolStr
vs &str
. It’s a trivial fix – all we have to do is change a .clone()
to an .into()
:
impl Database {
pub(crate) fn lower_stmt(&mut self, ast: ast::Stmt) -> Option<Stmt> {
let result = match ast {
ast::Stmt::VariableDef(ast) => Stmt::VariableDef {
name: ast.name()?.text().into(),
value: self.lower_expr(ast.value()),
},
// snip
};
Some(result)
}
// snip
}
Running cargo clippy
reveals that there is an occurrence where we were converting a &str
to a SmolStr
using .into()
– now that the type has changed to &str
, this .into()
call does not have an effect. Let’s remove it:
// crates/parser/src/sink.rs
impl<'t, 'input> Sink<'t, 'input> {
// snip
fn token(&mut self) {
let Token { kind, text, .. } = self.tokens[self.cursor];
self.builder
.token(EldiroLanguage::kind_to_raw(kind.into()), text);
self.cursor += 1;
}
}
Tests for lowering
Let’s write a test for lowering a variable definition:
// crates/hir/src/database.rs
#[cfg(test)]
mod tests {
use super::*;
fn parse(input: &str) -> ast::Root {
ast::Root::cast(parser::parse(input).syntax()).unwrap()
}
#[test]
fn lower_variable_def() {
let root = parse("let foo = bar");
let ast = root.stmts().next().unwrap();
let hir = Database::default().lower_stmt(ast).unwrap();
assert_eq!(
hir,
Stmt::VariableDef {
name: "foo".into(),
value: Expr::VariableRef { var: "bar".into() },
},
);
}
}
Note how we need a SyntaxNode
to create an ast::Root
, from which we then extract an ast::Stmt
for lowering. We can only create this SyntaxNode
by parsing source code, so we have to depend on parser
:
# Cargo.toml
[dependencies]
ast = {path = "../ast"}
la-arena = "0.2.0"
smol_str = "0.1.17"
syntax = {path = "../syntax"}
[dev-dependencies]
parser = {path = "../parser"}
We have an error about Stmt
not implementing PartialEq
, which we can derive for it and other HIR types:
// lib.rs
#[derive(Debug, PartialEq)]
pub enum Stmt {
// snip
}
#[derive(Debug, PartialEq)]
pub enum Expr {
// snip
}
#[derive(Debug, PartialEq)]
pub enum BinaryOp {
// snip
}
#[derive(Debug, PartialEq)]
pub enum UnaryOp {
// snip
}
$ cargo t -q --lib
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 26 tests
..........................
test result: ok. 26 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Writing tests for each of the other HIR elements will be easier if we write some helper functions:
// database.rs
#[derive(Debug, PartialEq, Default)]
pub struct Database {
// snip
}
#[cfg(test)]
mod tests {
use super::*;
fn parse(input: &str) -> ast::Root {
ast::Root::cast(parser::parse(input).syntax()).unwrap()
}
fn check_stmt(input: &str, expected_hir: Stmt) {
let root = parse(input);
let ast = root.stmts().next().unwrap();
let hir = Database::default().lower_stmt(ast).unwrap();
assert_eq!(hir, expected_hir);
}
fn check_expr(input: &str, expected_hir: Expr, expected_database: Database) {
let root = parse(input);
let first_stmt = root.stmts().next().unwrap();
let ast = match first_stmt {
ast::Stmt::Expr(ast) => ast,
_ => unreachable!(),
};
let mut database = Database::default();
let hir = database.lower_expr(Some(ast));
assert_eq!(hir, expected_hir);
assert_eq!(database, expected_database);
}
#[test]
fn lower_variable_def() {
check_stmt(
"let foo = bar",
Stmt::VariableDef {
name: "foo".into(),
value: Expr::VariableRef { var: "bar".into() },
},
);
}
}
$ cargo t -q -p hir --lib
running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Let’s write tests for each of the remaining HIR elements:
#[cfg(test)]
mod tests {
// snip
#[test]
fn lower_expr_stmt() {
check_stmt("123", Stmt::Expr(Expr::Literal { n: 123 }));
}
#[test]
fn lower_binary_expr() {
let mut exprs = Arena::new();
let lhs = exprs.alloc(Expr::Literal { n: 1 });
let rhs = exprs.alloc(Expr::Literal { n: 2 });
check_expr(
"1 + 2",
Expr::Binary {
lhs,
rhs,
op: BinaryOp::Add,
},
Database { exprs },
);
}
#[test]
fn lower_literal() {
check_expr("999", Expr::Literal { n: 999 }, Database::default());
}
#[test]
fn lower_paren_expr() {
check_expr(
"((((((abc))))))",
Expr::VariableRef { var: "abc".into() },
Database::default(),
);
}
#[test]
fn lower_unary_expr() {
let mut exprs = Arena::new();
let ten = exprs.alloc(Expr::Literal { n: 10 });
check_expr(
"-10",
Expr::Unary {
expr: ten,
op: UnaryOp::Neg,
},
Database { exprs },
);
}
#[test]
fn lower_variable_ref() {
check_expr(
"foo",
Expr::VariableRef { var: "foo".into() },
Database::default(),
);
}
}
$ cargo t -q -p hir --lib
running 7 tests
.......
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
So far our tests have only included the ‘happy path’, or cases where everything has gone as it should. Let’s add some test cases where the input is malformed, to make sure lowering doesn’t break when the user is typing:
#[cfg(test)]
mod tests {
// snip
#[test]
fn lower_variable_def_without_name() {
let root = parse("let = 10");
let ast = root.stmts().next().unwrap();
assert!(Database::default().lower_stmt(ast).is_none());
}
#[test]
fn lower_variable_def_without_value() {
check_stmt(
"let a =",
Stmt::VariableDef {
name: "a".into(),
value: Expr::Missing,
},
);
}
// snip
#[test]
fn lower_binary_expr_without_rhs() {
let mut exprs = Arena::new();
let lhs = exprs.alloc(Expr::Literal { n: 10 });
let rhs = exprs.alloc(Expr::Missing);
check_expr(
"10 -",
Expr::Binary {
lhs,
rhs,
op: BinaryOp::Sub,
},
Database { exprs },
);
}
// snip
#[test]
fn lower_unary_expr_without_expr() {
let mut exprs = Arena::new();
let expr = exprs.alloc(Expr::Missing);
check_expr(
"-",
Expr::Unary {
expr,
op: UnaryOp::Neg,
},
Database { exprs },
);
}
// snip
}
$ cargo t -q -p hir --lib
running 11 tests
...........
test result: ok. 11 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Testing the parser
The parser’s current test suite is quite extensive; however, we might have missed some problematic inputs. The same goes for lowering, too.
A fuzzer is a program that generates random inputs to another program with the purpose of findings bugs in it. This is perfect for testing our parsing and lowering code. Let’s install cargo-fuzz, a tool that makes it easy to run fuzzers on Rust code. Unfortunately, the fuzzing engine we’ll be using, libFuzzer, only supports x86-64 Linux and x86-64 macOS.
$ cargo install cargo-fuzz
Now that cargo-fuzz is installed, we can create a crate to house our fuzzer. We don’t want the fuzzer to be part of the workspace, so we’ll create our fuzzing crate outside of crates/
:
$ cargo new --bin fuzz
warning: compiling this new crate may not work due to invalid workspace configuration
current package believes it's in a workspace when it's not:
current: /home/me/src/eldiro/fuzz/Cargo.toml
workspace: /home/me/src/eldiro/Cargo.toml
this may be fixable by adding `fuzz` to the `workspace.members` array of the manifest located at: /home/me/src/eldiro/Cargo.toml
Alternatively, to keep it out of the workspace, add the package to the `workspace.exclude` array, or add an empty `[workspace]` table to the package's manifest.
Created binary (application) `fuzz` package
Let’s address that warning:
# fuzz/Cargo.toml
[workspace]
cargo-fuzz has the concept of a fuzz target, which is a program that receives random data from the fuzzer and tries to run some other code. Let’s create a fuzz target for parsing and lowering:
$ rm -r fuzz/src
$ mkdir fuzz/fuzz_targets
# fuzz/Cargo.toml
[[bin]]
name = "parse_lower"
path = "fuzz_targets/parse_lower.rs"
To write a fuzz target, we need to depend on libFuzzer:
[dependencies]
libfuzzer-sys = "0.3"
And define it:
// fuzz/fuzz_targets/parse_lower.rs
#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
if let Ok(s) = std::str::from_utf8(data) {
let parse = parser::parse(s);
let root = ast::Root::cast(parse.syntax()).unwrap();
let (_database, _stmts) = hir::lower(root);
}
});
Note how libFuzzer gives us random bytes, and we can only run the parser and lower the syntax tree if those bytes are valid UTF-8. Let’s add those missing dependencies:
# Cargo.toml
[dependencies]
ast = {path = "../crates/ast"}
hir = {path = "../crates/hir"}
libfuzzer-sys = "0.3"
parser = {path = "../crates/parser"}
Finally, to use the fuzz target in our fuzz
crate, we need to declare that the crate is meant to be used with cargo-fuzz:
[package.metadata]
cargo-fuzz = true
cargo-fuzz uses nightly compiler features; we’ll set an override for the fuzz
crate only:
$ rustup override set --path fuzz nightly
info: using existing install for 'nightly-x86_64-apple-darwin'
info: override toolchain for 'fuzz' set to 'nightly-x86_64-apple-darwin'
nightly-x86_64-apple-darwin unchanged - rustc 1.51.0-nightly (1d0d76f8d 2021-01-24)
Cool! And, finally, we can run the fuzzer:
$ cd fuzz
$ cargo fuzz run parse_lower
A ton of output will fly across the screen as the fuzzer tries random inputs. The lines starting with NEW
are when the fuzzer is generating new inputs, and the ones starting with REDUCE
are those in which the fuzzer tries to make the input smaller and simpler while covering the same codepaths. libFuzzer is an LLVM project, so it makes sense that it would have deep insights into the program’s execution.
The fuzzer found a bug in around one second of fuzzing on my machine; here’s the report:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: PosOverflow }', /home/me/src/eldiro/crates/ast/src/lib.rs:107:54
# snip backtrace
────────────────────────────────────────────────────────────────────────────────
Failing input:
artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5
Output of `std::fmt::Debug`:
[120, 42, 53, 120, 42, 53, 53, 53, 48, 48, 56, 54, 55, 53, 54, 51, 53, 53, 48, 48, 56, 54, 55, 53, 54, 51, 52, 54, 48, 55, 55, 45, 93, 45]
Reproduce with:
cargo fuzz run parse_lower artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5
Minimize test case with:
cargo fuzz tmin parse_lower artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5
────────────────────────────────────────────────────────────────────────────────
Let’s take a look at the failing input:
$ cat artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5
x*5x*55500867563550086756346077-]-
Let’s take a look at what we know:
- the input has a long sequence of numbers in it
- the panic message mentioned a
ParseIntError
This makes me think that, when we try to parse an integer out of the source code, the standard library’s integer parser realises that the parsed integer would be larger than the integer type’s maximum value. The panic message points to line 107 of crates/ast/src/lib.rs
, which, sure enough, is where we parse integer literals:
impl Literal {
pub fn parse(&self) -> u64 {
self.0.first_token().unwrap().text().parse().unwrap() // line 107
}
}
The solution to this is twofold:
- we should add semantic validation to the AST, so that non-syntactic errors like this can be caught
- analysis needs to continue in spite of these errors, so
ast::Literal::parse
andhir::Expr::Literal
should beOption<u64>
AST validation
It’s great that the fuzzer found this bug for us so quickly! Let’s create a validation
module in the ast
crate:
// crates/ast/src/lib.rs
pub mod validation;
We’ll traverse down into expressions across the entire AST, validating them as we go:
// crates/ast/src/validation.rs
use crate::{Root, Stmt};
pub fn validate(root: Root) {
for stmt in root.stmts() {
match stmt {
Stmt::VariableDef(variable_def) => {
if let Some(e) = variable_def.value() {
validate_expr(e)
}
}
Stmt::Expr(e) => validate_expr(e),
}
}
}
Let’s define validate_expr
:
use crate::{Expr, Root, Stmt};
// snip
fn validate_expr(expr: Expr) {
match expr {
Expr::BinaryExpr(binary_expr) => {
if let Some(e) = binary_expr.lhs() {
validate_expr(e);
}
if let Some(e) = binary_expr.rhs() {
validate_expr(e);
}
}
Expr::Literal(literal) => {
???
}
Expr::ParenExpr(paren_expr) => {
if let Some(e) = paren_expr.expr() {
validate_expr(e);
}
}
Expr::UnaryExpr(unary_expr) => {
if let Some(e) = unary_expr.expr() {
validate_expr(e);
}
}
Expr::VariableRef(_) => {}
}
}
How can we actually check if a number literal is greater than u64::MAX
? We can’t parse the number and then check, because we would then run into the bug we’re trying to fix! Instead, we’ll report an error when parsing the number fails, since we know the only reason parsing can fail is if the number literal is too large:
fn validate_expr(expr: Expr) {
match expr {
// snip
Expr::Literal(literal) => {
if literal.parse().is_none() {
// report error
}
}
// snip
}
}
This forces us to make that other change from earlier:
// lib.rs
impl Literal {
pub fn parse(&self) -> Option<u64> {
self.0.first_token().unwrap().text().parse().ok()
}
}
Let’s update the HIR to store an Option<u64>
so that analysis can continue in spite of the invalid number literal:
// crates/hir/src/lib.rs
#[derive(Debug, PartialEq)]
pub enum Expr {
// snip
Literal {
/// is `None` if the number is too big to fit in a u64
n: Option<u64>,
},
// snip
}
Unfortunately, we have a number of errors from the lowering tests depending on n
being a u64
. The errors are straightforward and just involve wrapping numbers in Some()
, so I won’t show how to solve them here. Take a look at the diff on GitHub if you get stuck.
We need a new type to represent validation errors:
// crates/ast/src/validation.rs
use crate::{Expr, Root, Stmt};
use text_size::TextRange;
#[derive(Debug, PartialEq)]
pub struct ValidationError {
message: String,
range: TextRange,
}
# Cargo.toml
[dependencies]
syntax = {path = "../syntax"}
text-size = "1.1.0"
Let’s store a Vec
of these, pushing to it as we go:
// validation.rs
pub fn validate(root: Root) -> Vec<ValidationError> {
let mut errors = Vec::new();
for stmt in root.stmts() {
match stmt {
Stmt::VariableDef(variable_def) => {
if let Some(e) = variable_def.value() {
validate_expr(e, &mut errors);
}
}
Stmt::Expr(e) => validate_expr(e, &mut errors),
}
}
errors
}
fn validate_expr(expr: Expr, errors: &mut Vec<ValidationError>) {
match expr {
Expr::BinaryExpr(binary_expr) => {
if let Some(e) = binary_expr.lhs() {
validate_expr(e, errors);
}
if let Some(e) = binary_expr.rhs() {
validate_expr(e, errors);
}
}
Expr::Literal(literal) => {
if literal.parse().is_none() {
errors.push(ValidationError {
message: format!(
"number literal is larger than an integer’s maximum value, {}",
u64::MAX,
),
range: literal.0.first_token().unwrap().text_range(),
});
}
}
Expr::ParenExpr(paren_expr) => {
if let Some(e) = paren_expr.expr() {
validate_expr(e, errors);
}
}
Expr::UnaryExpr(unary_expr) => {
if let Some(e) = unary_expr.expr() {
validate_expr(e, errors);
}
}
// snip
}
}
It took a lot of code to walk the entire AST for number literals to validate. It would be easier if we could just search the AST recursively from the top. SyntaxNode
has a descendants
method that does exactly what we want to do – all we need is a way to determine whether a given node is a Literal
and, if it is, validate it.
// lib.rs
impl Literal {
pub fn cast(node: SyntaxNode) -> Option<Self> {
if node.kind() == SyntaxKind::Literal {
Some(Self(node))
} else {
None
}
}
// snip
}
// validation.rs
use crate::Literal;
use syntax::SyntaxNode;
use text_size::TextRange;
// snip
pub fn validate(node: &SyntaxNode) -> Vec<ValidationError> {
let mut errors = Vec::new();
for node in node.descendants() {
if let Some(literal) = Literal::cast(node) {
validate_literal(literal, &mut errors)
}
}
errors
}
fn validate_literal(literal: Literal, errors: &mut Vec<ValidationError>) {
if literal.parse().is_none() {
errors.push(ValidationError {
message: format!(
"number literal is larger than an integer’s maximum value, {}",
u64::MAX,
),
range: literal.0.first_token().unwrap().text_range(),
});
}
}
So much simpler! Let’s write some tests to make sure this is working as expected:1
# Cargo.toml
[dependencies]
syntax = {path = "../syntax"}
text-size = "1.1.0"
[dev-dependencies]
parser = {path = "../parser"}
#[cfg(test)]
mod tests {
use super::*;
use std::ops::Range as StdRange;
fn check(input: &str, expected_errors: &[(&str, StdRange<u32>)]) {
let parse = parser::parse(input);
let expected_errors: Vec<_> = expected_errors
.iter()
.map(|(message, range)| ValidationError {
message: message.to_string(),
range: {
let start = range.start.into();
let end = range.end.into();
TextRange::new(start, end)
},
})
.collect();
assert_eq!(validate(&parse.syntax()), expected_errors);
}
#[test]
fn validate_ok_literal() {
check("123", &[]);
}
#[test]
fn validate_too_large_literal() {
check(
"99999999999999999999",
&[(
"number literal is larger than an integer’s maximum value, 18446744073709551615",
(0..20),
)],
);
}
}
$ cargo t -q -p ast --lib
running 2 tests
..
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
The tests have revealed a problem with our design, though – if we ever decide to change the wording of an error message, we have to update all the tests, too. I think that we shouldn’t test the specific text of the error message, but just that an error that signifies that a number literal is too large has been reported. We can represent this in code using an enum with a Display
implementation:
use crate::Literal;
use std::fmt;
use syntax::SyntaxNode;
use text_size::TextRange;
#[derive(Debug, PartialEq)]
pub struct ValidationError {
kind: ValidationErrorKind,
range: TextRange,
}
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValidationErrorKind {
NumberLiteralTooLarge,
}
impl fmt::Display for ValidationErrorKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::NumberLiteralTooLarge => write!(
f,
"number literal is larger than an integer’s maximum value, {}",
u64::MAX,
),
}
}
}
// snip
fn validate_literal(literal: Literal, errors: &mut Vec<ValidationError>) {
if literal.parse().is_none() {
errors.push(ValidationError {
kind: ValidationErrorKind::NumberLiteralTooLarge,
range: literal.0.first_token().unwrap().text_range(),
});
}
}
Let’s update the tests:
#[cfg(test)]
mod tests {
// snip
fn check(input: &str, expected_errors: &[(ValidationErrorKind, StdRange<u32>)]) {
// snip
let expected_errors: Vec<_> = expected_errors
.iter()
.map(|(kind, range)| ValidationError {
kind: *kind,
range: {
// snip
},
})
.collect();
// snip
}
// snip
#[test]
fn validate_too_large_literal() {
check(
"99999999999999999999",
&[(ValidationErrorKind::NumberLiteralTooLarge, (0..20))],
);
}
}
For the error type we’ve created to be useful, we need to be able to display it to the user. Let’s implement Display
for ValidationError
:
impl fmt::Display for ValidationError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"error at {}..{}: {}",
u32::from(self.range.start()),
u32::from(self.range.end()),
self.kind,
)
}
}
We can try this out in the REPL:
// crates/eldiro/src/main.rs
fn main() -> io::Result<()> {
// snip
loop {
// snip
let parse = parse(&input);
println!("{}", parse.debug_tree());
let syntax = parse.syntax();
for error in ast::validation::validate(&syntax) {
println!("{}", error);
}
let root = ast::Root::cast(syntax).unwrap();
// snip
}
}
$ cargo r -q
→ 123
Root@0..4
Literal@0..4
Number@0..3 "123"
Whitespace@3..4 "\n"
# snip
→ 999999999999999999999999999
Root@0..28
Literal@0..28
Number@0..27 "999999999999999999999 ..."
Whitespace@27..28 "\n"
error at 0..27: number literal is larger than an integer’s maximum value, 18446744073709551615
# snip
Continuing with fuzzing
I’d argue that most bugs in programming language implementations are in edge-cases that differ slightly from real programs. Currently, the fuzzer is just throwing random input at the parser, hoping to find a bug. We can be more strategic by giving the fuzzer an example to work with – the fuzzer will modify and mutate this example randomly, rather than starting from total randomness.
Let’s clear out the existing corpus – inputs the fuzzer has tried and will work from in future:
$ rm fuzz/corpus/parse_lower/*
We’ll add a single example file to the corpus:
# fuzz/corpus/parse_lower/example
let foo = 10
let bar = 20
let a = foo + (bar * bar - (100 / 33))
a * (a - 10) # the result
This includes every bit of syntax we currently support. Let’s run the fuzzer to see what it finds:
$ cd fuzz
$ cargo fuzz run parse_lower
I let this run for a few minutes and didn’t find anything (I know you’re meant to let fuzzers run for several hours so they can exhaust every possible codepath, but I can’t be bothered), so I decided it was time to add AST validation to the fuzz target.
// fuzz/fuzz_targets/parse_lower.rs
fuzz_target!(|data: &[u8]| {
if let Ok(s) = std::str::from_utf8(data) {
let parse = parser::parse(s);
let syntax = parse.syntax();
let _validation_errors = ast::validation::validate(&syntax);
let root = ast::Root::cast(syntax).unwrap();
let (_database, _stmts) = hir::lower(root);
}
});
parse_lower
isn’t a good name for this fuzz target anymore, as it covers more than just parsing and lowering – let’s rename it to main
, because that seems like a good name for the only fuzz target.
$ mv fuzz/fuzz_targets/{parse_lower.rs,main.rs}
$ mkdir fuzz/corpus/main
$ mv fuzz/corpus/{parse_lower,main}/example
$ rm -r fuzz/corpus/parse_lower
# Cargo.toml
[[bin]]
name = "main"
path = "fuzz_targets/main.rs"
We can now run our fuzz target with cd fuzz && cargo fuzz run main
. Again, this didn’t find anything within the first few minutes for me.
Conclusion
Thank you for reading! In the next part, we’ll implement string literals.
I really should have done this at the start. This is a nice pattern – you use a fuzzer to find bugs in your code, you write some unit tests to catch the bugs the fuzzer found, and then write code to make the bugs go away. ↩︎