Syntax Highlighting for SQL in Diagnostic Errors

This is the third article in my quest to improve the developer experience around using SQL, following:

  1. Making SQL Keyword Suggestions Work
  2. Embedding Lua in sqleibniz with Rust

TLDR

I added syntax highlighting to the sqleibniz diagnostic output:

sqleibniz diagnostic with syntax highlight

This time I was annoyed by tools like rustc not applying syntax highlighting to the code shown in their diagnostics:

rustc compilation error without syntax highlight

sqleibniz was missing this feature as well:

sqleibniz diagnostic without syntax highlight

The Idea

  1. Pass a line of SQL text to a highlight function
  2. Have it generate color escape codes for lexemes
  3. Write color escape codes and lexemes to a string builder / buffer
  4. Dump builder content to stdout

A String Builder

I want to be able to write the resulting diagnostic string to targets other than stdout. This means I need a temporary buffer I can write String, &str, char, byte and Vec<u8> values to. My string builder is just a struct holding a vector of bytes. The source can be found here.

RUST
pub struct Builder {
    buffer: Vec<u8>,
}

impl Builder {
    // [...]

    pub fn string(self) -> String {
        match String::from_utf8(self.buffer) {
            Ok(string) => string,
            Err(_) => String::from("<failed to stringify Builder::buffer>"),
        }
    }
}

Creating the string representation of the byte vector consumes the whole builder. The remaining methods append bytes to the buffer:

RUST
impl Builder {
    pub fn new() -> Self {
        Builder { buffer: Vec::new() }
    }

    pub fn with_capacity(cap: usize) -> Self {
        Builder {
            buffer: Vec::with_capacity(cap),
        }
    }

    pub fn write_char(&mut self, c: char) {
        // encode as UTF-8; `c as u8` would truncate non-ASCII characters
        self.buffer
            .extend_from_slice(c.encode_utf8(&mut [0u8; 4]).as_bytes());
    }

    pub fn write_byte(&mut self, byte: u8) {
        self.buffer.push(byte);
    }

    pub fn write_str(&mut self, s: &str) {
        self.buffer.extend_from_slice(s.as_bytes());
    }

    pub fn write_string(&mut self, string: String) {
        self.buffer.append(&mut string.into_bytes());
    }

    pub fn write_buf(&mut self, mut buf: Vec<u8>) {
        self.buffer.append(&mut buf);
    }

    // [...]
}
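
To illustrate why a byte buffer helps with more than one output target, here is a minimal sketch. It uses a cut-down stand-in for the Builder above, and the `flush_to` helper and the `render_twice` function are hypothetical names that do not exist in sqleibniz:

```rust
use std::io::{self, Write};

/// Minimal stand-in for the article's `Builder`; only what this sketch needs.
pub struct Builder {
    buffer: Vec<u8>,
}

impl Builder {
    pub fn new() -> Self {
        Builder { buffer: Vec::new() }
    }

    pub fn write_str(&mut self, s: &str) {
        self.buffer.extend_from_slice(s.as_bytes());
    }

    /// Hypothetical helper (an assumption, not part of sqleibniz): copies the
    /// buffered bytes into any `Write` sink without consuming the builder.
    pub fn flush_to<W: Write>(&self, sink: &mut W) -> io::Result<()> {
        sink.write_all(&self.buffer)
    }
}

/// Renders one message and hands the same bytes to two different sinks.
pub fn render_twice() -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut builder = Builder::new();
    builder.write_str("error: ");
    builder.write_str("unexpected literal\n");

    // `Vec<u8>` implements `Write`; these could just as well be
    // `io::stdout()` and an open file.
    let mut first = Vec::new();
    let mut second = Vec::new();
    builder.flush_to(&mut first)?;
    builder.flush_to(&mut second)?;
    Ok((first, second))
}
```

Because the sink is generic over `Write`, the same rendered diagnostic can go to a terminal, a file or an in-memory buffer used in tests.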

ANSI Color Escape Codes, or: I don’t care about Windows

Wikipedia has a nice list of ANSI color escape codes: ANSI escape code#Colors.

I do know these can be wonky in the Windows cmd, but I simply don't care: I do not own a device with Windows on it and I do not develop for Windows or with Windows in mind.

So I created an enum and just added a &str representation:

RUST
#[derive(Debug)]
pub enum Color {
    Reset,

    // used for error display
    Red,
    Blue,
    Cyan,
    Green,
    Yellow,

    // used for syntax highlighting
    Grey,
    Magenta,
    Orange,
    White,
}

impl Color {
    pub fn as_str(&self) -> &'static str {
        match self {
            Self::Reset => "\x1b[0m",
            Self::Red => "\x1b[31m",
            Self::Blue => "\x1b[94m",
            Self::Green => "\x1b[92m",
            Self::Yellow => "\x1b[93m",
            Self::Cyan => "\x1b[96m",
            Self::Grey => "\x1b[90m",
            Self::Magenta => "\x1b[35m",
            Self::Orange => "\x1b[33m",
            Self::White => "\x1b[97m",
        }
    }
}

Mapping Tokens to Colors

Since I only need this mapping in the highlight module, I created a private Highlight trait:

RUST
trait Highlight {
    fn lookup(ttype: &Type) -> Color;
    fn as_bytes(&self) -> Vec<u8>;
}

impl Highlight for Color {
    fn lookup(ttype: &Type) -> Color {
        match ttype {
            Type::Keyword(_) => Self::Magenta,
            // atoms
            Type::String(_)
            | Type::Number(_)
            | Type::Blob(_)
            | Type::Boolean(_) => Self::Orange,
            // special symbols
            Type::Dollar
            | Type::Colon
            | Type::Asterisk
            | Type::Question
            | Type::Param(_)
            | Type::Percent
            | Type::ParamName(_) => Self::Red,
            // symbols
            Type::Dot
            | Type::Ident(_)
            | Type::Semicolon
            | Type::Comma
            | Type::Equal
            | Type::At
            | Type::BraceLeft
            | Type::BraceRight
            | Type::BracketLeft
            | Type::BracketRight => Self::White,
            _ => Self::Grey,
        }
    }

    fn as_bytes(&self) -> Vec<u8> {
        self.as_str().as_bytes().to_vec()
    }
}
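
To make the lookup easy to try in isolation, here is a minimal sketch with a heavily reduced token type. The `Type` enum below is a hypothetical subset (the payload types and the `Plus` variant are assumptions, not the lexer's real definitions), and `lookup` is written as a free function instead of a trait method:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Color {
    Magenta,
    Orange,
    Red,
    White,
    Grey,
}

/// Hypothetical, reduced stand-in for the lexer's token type.
pub enum Type {
    Keyword(String),
    String(String),
    Number(f64),
    Question,
    Semicolon,
    /// Not listed explicitly in `lookup`, so it hits the catch-all arm.
    Plus,
}

/// Same grouping idea as in the trait above: keywords, atoms,
/// special symbols, symbols and a grey fallback for everything else.
pub fn lookup(ttype: &Type) -> Color {
    match ttype {
        Type::Keyword(_) => Color::Magenta,
        Type::String(_) | Type::Number(_) => Color::Orange,
        Type::Question => Color::Red,
        Type::Semicolon => Color::White,
        _ => Color::Grey,
    }
}
```

The catch-all arm means a token type added to the lexer later still gets a color (grey) without touching the highlighter.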

Highlighting module

As introduced before, the highlight module contains the Highlight trait, the string builder and the logic for highlighting a single line of SQL input, in the form of the highlight function:

RUST
pub fn highlight(builder: &mut builder::Builder, token_on_line: &[&Token], line: &str) {
    // no tokens on a line means: either comment or empty line
    if token_on_line.is_empty() {
        builder.write_str(Color::Grey.as_str());
        builder.write_str(line);
        builder.write_str(Color::Reset.as_str());
        return;
    }

    let reset = Color::Reset.as_bytes();

    let mut buf = line
        .split("")
        .map(|s| s.as_bytes().to_vec())
        .skip(1)
        .take(line.len())
        .collect::<Vec<Vec<u8>>>();

    let original_length = buf.len();
    for tok in token_on_line {
        let offset = buf.len() - original_length;
        let color = Color::lookup(&tok.ttype);
        buf.insert(tok.start + offset, color.as_bytes());
        if tok.start == tok.end {
            buf.insert(tok.end + offset, reset.clone());
        } else {
            buf.insert(tok.end + offset + 1, reset.clone());
        }
    }

    // INFO: used to inspect the text
    // dbg!(&buf
    //     .iter()
    //     .map(|s| String::from_utf8(s.to_vec()).unwrap())
    //     .collect::<Vec<String>>());

    for element in buf {
        builder.write_buf(element);
    }
}

The basic idea behind the highlighting is to split the input string into a list of characters as strings (specifically strings, because I want to insert color escape codes before the start of a lexeme and at its end). Consider the following lines:

SQL
-- causes a diagnostic,
-- because VACUUM does not allow a literal at this point
VACUUM ' ';

The first line will be filtered out by the first condition in the function - because the lexer does not output any tokens for that line. Thus, we focus on VACUUM ' ';. First we fill the buf variable:

RUST
let mut buf = line
    // "VACUUM ' ';"
    .split("")
    // vec!["", "V", "A", "C", "U", "U", "M", " ", "'", " ", "'", ";", ""]
    .skip(1)
    // we skip the first empty string
    .take(line.len())
    // we drop the last empty string
    .map(|s| s.as_bytes().to_vec())
    // same as before, but as vector of bytes
    .collect::<Vec<Vec<u8>>>();
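
The splitting step can be checked on its own. The sketch below wraps it in a function (`split_chars` is a hypothetical name) and adds a `chars()` based variant as one possible alternative; both produce the same result for ASCII input such as the SQL line above:

```rust
/// Splits a line into one byte vector per character, mirroring the
/// `split("")` approach: the empty pattern matches at every character
/// boundary, so the result starts and ends with an empty string.
pub fn split_chars(line: &str) -> Vec<Vec<u8>> {
    line.split("")
        .skip(1) // drop the leading empty string
        .take(line.len()) // for ASCII input this drops the trailing one
        .map(|s| s.as_bytes().to_vec())
        .collect()
}

/// Alternative sketch: iterate over the characters directly, which avoids
/// the leading and trailing empty strings altogether.
pub fn split_chars_alt(line: &str) -> Vec<Vec<u8>> {
    line.chars().map(|c| c.to_string().into_bytes()).collect()
}
```

Note that `line.len()` counts bytes, not characters, so the two variants are only guaranteed to agree when every character is a single byte.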

Now that the input is split into characters, we can insert the matching color escape code before each lexeme and the reset escape code after it, as shown below:

  1. Type::Keyword(VACUUM) is a keyword, thus we use the Highlight::lookup method to get Color::Magenta
  2. Type::String is an atom, and resolves to Color::Orange
  3. Type::Semicolon is a symbol, thus we use Color::White
  4. after each lexeme, we must insert the Color::Reset enum variant to correctly highlight all the following text

SQL
   VACUUM ' ';
-- ^    ^ ^ ^^
-- |    | | ||
-- |    | | |+-- before this we insert Color::White and after Color::Reset
-- |    | | |
-- |    | | +-- after this, we insert Color::Reset
-- |    | |
-- |    | +-- before this, we insert Color::Orange
-- |    |
-- |    +-- after this point, we insert Color::Reset
-- |
-- +-- before this point, we need to insert Color::Magenta

Since we are inserting into the buf variable, we need to keep track of its original_length to compute the offset for all future insertions. We then iterate over the tokens the caller passed into the function (token_on_line, which should only contain tokens that were on the line we want to highlight):

RUST
let original_length = buf.len();
for tok in token_on_line {
    let offset = buf.len() - original_length;
    let color = Color::lookup(&tok.ttype);
    buf.insert(tok.start + offset, color.as_bytes());
    if tok.start == tok.end {
        buf.insert(tok.end + offset, reset.clone());
    } else {
        buf.insert(tok.end + offset + 1, reset.clone());
    }
}

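Once we have computed the offset, we use Highlight::lookup to get the color of the current token and insert the escape code we get via Highlight::as_bytes into the buffer at the start of the token plus the offset. The offset is the number of escape codes inserted so far, since each of them occupies one slot in buf. Color::Reset goes in at the end of the token plus the offset; however, if the token is not one character long, we have to move Color::Reset one position to the right.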

After this, all elements of buf are written into the string builder passed into the function.
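
The offset bookkeeping is easier to follow with concrete numbers. Below is a simplified sketch for the `VACUUM ' ';` line. It makes assumptions that may differ from the real lexer: `Tok` is a hypothetical struct, `start` and `end` are character indices with `end` being exclusive, the tokens are sorted by `start`, and the color is stored directly on the token:

```rust
/// Hypothetical token: `start..end` is a half-open range of character
/// indices, `color` is the escape code to put in front of the lexeme.
pub struct Tok {
    pub start: usize,
    pub end: usize,
    pub color: &'static str,
}

const RESET: &str = "\x1b[0m";

/// Inserts a color code before and a reset code after every token.
/// Tokens are assumed to be sorted by `start` and not to overlap.
pub fn insert_colors(line: &str, toks: &[Tok]) -> String {
    // one element per character, so escape codes can be inserted between them
    let mut buf: Vec<String> = line.chars().map(String::from).collect();
    let original_length = buf.len();

    for tok in toks {
        // number of escape codes inserted by previous iterations
        let offset = buf.len() - original_length;
        buf.insert(tok.start + offset, tok.color.to_string());
        // + 1 because the color code inserted above shifted the lexeme right
        buf.insert(tok.end + offset + 1, RESET.to_string());
    }

    buf.concat()
}
```

For the tokens `0..6` (keyword), `7..10` (string) and `10..11` (semicolon), the offset is 0, 2 and 4 in the three iterations, because every token adds exactly two elements to `buf`.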

Don't mind the performance: it works and it is fast enough. I know that inserting into the middle of a vector is slow, because all following elements have to be shifted, and that there are faster solutions.
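
For reference, one such faster approach is a single pass that only ever appends to the output. This is a sketch of an alternative technique, not what sqleibniz does; the `Tok` struct is hypothetical and its `start..end` range is assumed to be a half-open byte range on character boundaries, with tokens sorted and non-overlapping:

```rust
/// Hypothetical token with a half-open byte range and its color code.
pub struct Tok {
    pub start: usize,
    pub end: usize,
    pub color: &'static str,
}

const RESET: &str = "\x1b[0m";

/// Copies the line into `out` from left to right, wrapping every token in
/// its color and a reset code. Nothing is inserted in the middle, so no
/// elements have to be shifted.
pub fn highlight_single_pass(line: &str, toks: &[Tok]) -> String {
    // rough capacity guess: the line plus two short escape codes per token
    let mut out = String::with_capacity(line.len() + toks.len() * 10);
    let mut cursor = 0;

    for tok in toks {
        // uncolored text between the previous token and this one
        out.push_str(&line[cursor..tok.start]);
        out.push_str(tok.color);
        out.push_str(&line[tok.start..tok.end]);
        out.push_str(RESET);
        cursor = tok.end;
    }

    // whatever follows the last token
    out.push_str(&line[cursor..]);
    out
}
```

For the short lines shown in a diagnostic the difference is unlikely to be measurable, which matches the "fast enough" verdict above.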

Attaching to diagnostic display

final syntax highlighting

As shown above and below, the highlight function is called with the shared string builder, all tokens found on the current line and the offending line itself:

RUST
#[derive(Debug, Clone, PartialEq)]
pub struct Error {
    pub file: String,
    pub line: usize,
    pub rule: Rule,
    pub note: String,
    pub msg: String,
    pub start: usize,
    pub end: usize,
    pub doc_url: Option<&'static str>,
}

pub fn print_str_colored(b: &mut builder::Builder, s: &str, c: Color) {
    b.write_str(c.as_str());
    b.write_str(s);
    b.write_str(Color::Reset.as_str());
}

impl Error {
    pub fn print(&mut self, b: &mut builder::Builder, content: &Vec<u8>, tokens: &[Token]) {
        // [...]
        let offending_line = String::from(lines.get(self.line).unwrap());
        print_str_colored(b, &format!(" {:02} | ", self.line + 1), Color::Blue);
        highlight(
            b,
            &tokens
                .iter()
                .filter(|t| t.line == self.line)
                .collect::<Vec<&Token>>(),
            &offending_line,
        );
        print_str_colored(b, "\n    |", Color::Blue);
        // [...]
    }
}
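
The per-line filtering is the piece that produces the `&[&Token]` argument of highlight. A minimal sketch of just that step, with a hypothetical, reduced `Token` struct (the real one carries more fields, such as the token type) and a hypothetical helper name:

```rust
/// Hypothetical, reduced token: only the position information.
pub struct Token {
    pub line: usize,
    pub start: usize,
    pub end: usize,
}

/// Collects references to all tokens that sit on the given zero-based line.
/// Borrowing instead of cloning keeps the token list itself untouched.
pub fn tokens_on_line(tokens: &[Token], line: usize) -> Vec<&Token> {
    tokens.iter().filter(|t| t.line == line).collect()
}
```

An empty result is a valid outcome here: it is exactly the "comment or empty line" case that highlight handles first.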

The Error::print function itself is called from the main function of the sqleibniz crate:

RUST
    // [...]

    if !processed_errors.is_empty() && !args.silent {
        error::print_str_colored(
            &mut error_string_builder,
            &format!("{:=^72}\n", format!(" {} ", file.name)),
            error::Color::Blue,
        );
        let error_count = processed_errors.len();
        for (i, e) in processed_errors.iter().enumerate() {
            (**e)
                .clone()
                .print(&mut error_string_builder, &content, &toks);

            if i + 1 != error_count {
                error_string_builder.write_char('\n');
            }
        }
    }

    // [...]

Please just ignore the ugly double dereference in `(**e)`. It is needed because I want to keep using the processed_errors array afterwards and not clone it as a whole, simply because there is a lot of data in each error structure.