Leveraging Rust's Trait Objects for reduced boilerplate

Making a multi-call binary is a neat way of sharing code between a number of small programs. Perhaps the most famous example is BusyBox, but there are others such as Dropbear (an SSH client and server) or BeastieBox. I once even wrote a multi-call shell script, which sourced a different file full of functions depending on the name it was called as. Today's project is inspired heavily by BusyBox but is written in Rust and has a somewhat more limited scope.

Rather than attempting to fit a couple hundred small utilities into one binary as BusyBox does, I'm going to split this up into several binaries, each of which has anywhere from a half dozen up to 50 applets. That should keep the overall size of each binary down, and it avoids carrying dependencies in each binary that are only useful to a few commands. For example, all of the various checksum utilities will be in a single binary called hashbox. Even so, it's best to give some thought early on to how the project will be organized. Adding a new command applet should hopefully only involve writing that applet and then inserting a couple lines into one other file. An unhappy outcome would involve having to edit 3-4 files in multiple places in order to add another command. Additionally, there should be a standard way of calling every command. We can also take some steps to increase consistency in the way the command line arguments work, although there is only so much one can do in that regard without breaking POSIX compatibility.

What is a Trait Object?

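A trait object is a value whose concrete type is hidden behind a pointer such as &dyn Trait or Box<dyn Trait>. Instead of specifying a concrete type, we specify only that we have an object which implements a specific trait, and method calls on it are dispatched at runtime.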
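
As a minimal sketch of the idea (the Shape trait and both structs are made up for illustration and have nothing to do with the project), two unrelated types can sit in the same Vec as long as they are boxed behind the same trait:

```rust
// Hypothetical trait, used only to illustrate trait objects.
trait Shape {
    fn sides(&self) -> u32;
}

struct Triangle;
struct Square;

impl Shape for Triangle {
    fn sides(&self) -> u32 {
        3
    }
}

impl Shape for Square {
    fn sides(&self) -> u32 {
        4
    }
}

// The element type names only the trait, never a concrete struct.
fn shapes() -> Vec<Box<dyn Shape>> {
    vec![Box::new(Triangle), Box::new(Square)]
}

// Each call to `sides` is resolved at runtime through the vtable
// attached to the `dyn Shape` pointer.
fn total_sides(shapes: &[Box<dyn Shape>]) -> u32 {
    shapes.iter().map(|s| s.sides()).sum()
}
```

Calling total_sides(&shapes()) walks both boxed values without the caller ever knowing which concrete types are inside.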

Defining the Cmd trait

We're going to design our Cmd trait so that we will be guaranteed an object which has a method that returns a clap::Command struct, as well as a method which takes input in the form of a &clap::ArgMatches struct and runs the applet. Since it would be nice to have a way to automatically install all of the various links back to the binary (as well as some nice extras which we'll get to later), we're also going to need a way to tell our special bootstrap applet where the link for this applet should go. Putting it all together we get the following.

// crate::lib.rs
mod cmd;
pub use cmd::Cmd;

/// Defines the location relative to the binary where a command will be installed
#[derive(Debug, Clone, Copy)]
pub enum Path {
    /// /bin
    Bin,
    /// /sbin
    Sbin,
    /// /usr/bin
    UsrBin,
    /// /usr/sbin
    UsrSbin,
}

impl Path {
    #[must_use]
    pub fn to_str(&self, usr: bool) -> &'static str {
        match self {
            Self::UsrBin if usr => "usr/bin",
            Self::UsrSbin if usr => "usr/sbin",
            Self::Bin | Self::UsrBin => "bin",
            Self::Sbin | Self::UsrSbin => "sbin",
        }
    }
}

// crate::cmd::mod.rs
use clap::ArgMatches;
use std::{error::Error, fmt};

/// Defines a command or applet, its cli interface, and its installation directory
/// relative to the binary
pub trait Cmd: fmt::Debug + Sync {
    /// Defines the cli of the applet
    fn cli(&self) -> clap::Command;
    /// Runs the applet
    /// # Errors
    /// Bubbles up any errors to the caller
    fn run(&self, matches: &ArgMatches) -> Result<(), Box<dyn Error>>;
    /// Returns the path relative to the binary where the link to this applet
    /// will be installed
    fn path(&self) -> Option<crate::Path>;
}

That's our basic scaffolding. With this in place we can do some neat things, such as create a Vec<Box<dyn Cmd>> or have a function which returns a Box<dyn Cmd>. The latter is what we're going to use to run an applet based on the name the program was called by.

Downsides? Well, there is one. You pay a small performance penalty because the function which will actually be run has to be looked up through a vtable at runtime. But each applet only makes a couple of these dynamic calls per invocation (cli() and run()), so the cost is negligible, and we're going to be running some pretty well optimized code everywhere else. The applets should run plenty fast.
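
One detail of the scaffolding that is easy to miss is the usr flag on Path::to_str. Reading the match arms, passing false folds the two usr variants back into bin and sbin. The sketch below repeats the enum so that it stands alone; the link_location helper and its prefix parameter are assumptions made for illustration, not the project's bootstrap code.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Path {
    Bin,
    Sbin,
    UsrBin,
    UsrSbin,
}

impl Path {
    pub fn to_str(&self, usr: bool) -> &'static str {
        match self {
            // The guarded arms only fire when `usr` is true...
            Self::UsrBin if usr => "usr/bin",
            Self::UsrSbin if usr => "usr/sbin",
            // ...otherwise the usr variants fall through to these.
            Self::Bin | Self::UsrBin => "bin",
            Self::Sbin | Self::UsrSbin => "sbin",
        }
    }
}

// Hypothetical helper: where a link for `applet` would land under `prefix`.
fn link_location(prefix: &str, path: Path, usr: bool, applet: &str) -> String {
    format!("{prefix}/{}/{applet}", path.to_str(usr))
}
```

With usr set to false, link_location("/tmp/root", Path::UsrSbin, false, "foo") yields /tmp/root/sbin/foo, while the same call with true yields /tmp/root/usr/sbin/foo.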

Anyway, here's what our lookup code looks like.

// crate::hashbox::commands::mod.rs
pub fn get(name: &str) -> Option<Box<dyn Cmd>> {
    match name {
        "b2sum" => Some(Box::new(b2sum::B2sum::default())),
        "hashbox" => Some(Box::new(hashbox::Hashbox::default())),
        "md5sum" => Some(Box::new(md5sum::Md5sum::default())),
        "sha1sum" => Some(Box::new(sha1sum::Sha1sum::default())),
        "sha224sum" => Some(Box::new(sha224sum::Sha224sum::default())),
        "sha256sum" => Some(Box::new(sha256sum::Sha256sum::default())),
        "sha384sum" => Some(Box::new(sha384sum::Sha384sum::default())),
        "sha512sum" => Some(Box::new(sha512sum::Sha512sum::default())),
        _ => None,
    }
}

If it weren't for wanting our bootstrap applet, that would be the only place the code needs to be modified to add an applet, apart from creating a new module containing the struct which implements Cmd. But we want to be able to iterate over our commands, so in the same file we create a static array of applet names.

// crate::hashbox::commands::mod.rs
pub static COMMANDS: [&str; 7] = [
    "b2sum",
    "md5sum",
    "sha1sum",
    "sha224sum",
    "sha256sum",
    "sha384sum",
    "sha512sum",
];
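
Since the match in get and the COMMANDS array have to be kept in step by hand, a small check can catch a name that was added to one and not the other. This is only a sketch under assumptions: the Cmd trait here is a cut-down stand-in without clap, and the unresolved function is a hypothetical helper, not something from the project.

```rust
use std::fmt::Debug;

// Cut-down stand-in for the real Cmd trait, for illustration only.
trait Cmd: Debug {
    fn name(&self) -> &'static str;
}

#[derive(Debug, Default)]
struct Md5sum;

#[derive(Debug, Default)]
struct Sha1sum;

impl Cmd for Md5sum {
    fn name(&self) -> &'static str {
        "md5sum"
    }
}

impl Cmd for Sha1sum {
    fn name(&self) -> &'static str {
        "sha1sum"
    }
}

static COMMANDS: [&str; 2] = ["md5sum", "sha1sum"];

fn get(name: &str) -> Option<Box<dyn Cmd>> {
    match name {
        "md5sum" => Some(Box::new(Md5sum)),
        "sha1sum" => Some(Box::new(Sha1sum)),
        _ => None,
    }
}

// Returns every name listed in COMMANDS that get() does not recognise.
// An empty Vec means the two lists agree.
fn unresolved() -> Vec<&'static str> {
    COMMANDS
        .iter()
        .copied()
        .filter(|name| get(name).is_none())
        .collect()
}
```

Asserting that unresolved() is empty in a unit test would turn a forgotten match arm into a test failure instead of a missing link at install time.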

Adding an applet

I'm going to pick the sha224sum applet for the example, as it's not completely trivial like, say, echo, but not overly complex either. It's also a good example of an applet that manages to reuse a lot of code with other applets.

Impl Cmd - the cli() method

// crate::hashbox::commands::sha224sum.rs

#[derive(Debug, Default)]
pub struct Sha224sum;

impl Cmd for Sha224sum {
    fn cli(&self) -> clap::Command {
        Command::new("sha224sum")
            .about("compute and check SHA224 message digest")
            .author("Nathan Fisher")
            .version(env!("CARGO_PKG_VERSION"))
            .args([args::check(), args::file()])
    }

The actual object which is impl Cmd here is a unit struct. Notice that we take &self as a parameter, even though we're not passing any data from self in. We actually have to do this in order to use Sha224sum as a trait object, since a method without a self receiver can't be called through dyn Cmd. I'll also point out that the list of clap Args is populated via functions which live in the args module. This makes it easy to maintain consistency between applets because things like help messages will always display the same strings where appropriate, while also making the code less verbose.
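
To make the point about &self concrete, here is a small sketch using a made-up Named trait (not the project's Cmd). The method with a receiver is reachable through a trait object; the associated function without one has to be fenced off with a Self: Sized bound, and is then only callable on the concrete type.

```rust
// Hypothetical trait, used only to show the receiver rule.
trait Named {
    // Takes &self, so it can be dispatched through `dyn Named`.
    fn name(&self) -> &'static str;

    // No receiver: without the `Self: Sized` bound this function would
    // prevent `dyn Named` from being used at all.
    fn kind() -> &'static str
    where
        Self: Sized,
    {
        "applet"
    }
}

struct Sha224sum;

impl Named for Sha224sum {
    fn name(&self) -> &'static str {
        "sha224sum"
    }
}

fn describe(cmd: &dyn Named) -> String {
    // Only `name` is reachable here; `kind` is not available on `dyn Named`.
    format!("{} via dyn", cmd.name())
}
```

Sha224sum::kind() still works on the concrete type, but nothing holding only a &dyn Named can reach it, which is why every method in a trait meant for dynamic dispatch takes self in some form.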

Impl Cmd - the run() method

// crate::hashbox::commands::sha224sum.rs
    fn run(&self, matches: &clap::ArgMatches) -> Result<(), Box<dyn std::error::Error>> {
        if let Some(files) = matches.get_many::<String>("file") {
            let mut erred = 0;
            for f in files {
                if matches.get_flag("check") {
                    if f == "-" {
                        return Err(
                            io::Error::new(io::ErrorKind::Other, "no file specified").into()
                        );
                    }
                    hash::check_sums(f, HashType::Sha224, &mut erred)?;
                } else {
                    let hasher = Sha224::new();
                    let s = hash::compute_hash(f, hasher)?;
                    println!("{s}  {f}");
                }
            }
            if erred > 0 {
                let msg = format!("WARNING: {erred} computed checksums did NOT match");
                return Err(
                    io::Error::new(io::ErrorKind::Other, msg).into()
                );
            }
        }
        Ok(())
    }

I didn't want to get bogged down in creating tons of custom error types for this project, so run always returns Result<(), Box<dyn Error>>, making it easy to short circuit on error. We can then liberally use io::ErrorKind::Other to quickly make custom errors using strings where we need to. I'm using that strategy twice in this function actually.
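
The same pattern in isolation looks roughly like this. The parse_width function is a hypothetical example, not project code; it shows a string-based io::Error and an unrelated ParseIntError both leaving through the one Box<dyn Error> return type.

```rust
use std::{error::Error, io};

// Hypothetical function: two different error types, one return type.
fn parse_width(arg: &str) -> Result<usize, Box<dyn Error>> {
    if arg.is_empty() {
        // A quick custom error built from a string.
        return Err(io::Error::new(io::ErrorKind::Other, "no width specified").into());
    }
    // `?` converts the ParseIntError into Box<dyn Error> automatically.
    let width: usize = arg.parse()?;
    Ok(width)
}
```

The caller only needs Display to report the failure, which is all that main does with the errors bubbling up from an applet.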

Now, since most of the hash functions come from crates provided by the RustCrypto GitHub organization, they all have a remarkably similar interface. That makes it possible to abstract over it very lightly in the hash module, where I've placed a couple of generic functions that do the heavy lifting and keep each applet's run function nice and short.

hash::compute_hash()

// crate::hashbox::hash::mod.rs
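pub fn compute_hash<T>(file: &str, mut hasher: T) -> Result<String, Box<dyn Error>>
where
    // `update` and `finalize` are methods of the `Digest` trait that the
    // RustCrypto hash crates re-export, so the hasher is bound on it directly.
    T: Digest,
{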
    let mut buf = vec![];
    if file == "-" {
        let _s = io::stdin().read_to_end(&mut buf)?;
    } else {
        let mut fd = File::open(file)?;
        let _s = fd.read_to_end(&mut buf)?;
    }
    let mut s = String::new();
    hasher.update(&buf);
    let res = hasher.finalize();
    for c in res {
        write!(s, "{c:02x}")?;
    }
    Ok(s)
}

We need to constrain the hasher argument to a type which implements the hashing traits, hence the where clause. This is defined by the interface that the third party crates provide. Looking at the first few lines of the function, we're filling a buffer of u8 either from stdin or from a file. We then pass that buffer into our hasher and finalize it, finishing by writing each byte out as a two digit hexadecimal string, padded on the left with a zero if needed.
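
The hex formatting step can be tried without any hashing crate. In this sketch to_hex is a hypothetical helper name; the format string is the same {:02x} used in compute_hash.

```rust
use std::fmt::Write;

// Hypothetical helper: render bytes as lowercase, zero-padded hex.
fn to_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // `02` pads to two characters with a leading zero, `x` is lowercase hex.
        write!(s, "{b:02x}").expect("writing to a String does not fail");
    }
    s
}
```

The padding matters: without it a byte such as 0x0f would print as a single f and the resulting checksum string would be too short.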

hash::check_sums()

// crate::hashbox::hash::mod.rs
pub enum HashType {
    Md5,
    Sha1,
    Sha224,
    Sha256,
    Sha384,
    Sha512,
}

pub fn check_sums(file: &str, hashtype: HashType, erred: &mut usize) -> Result<(), Box<dyn Error>> {
    let fd = File::open(file)?;
    let reader = BufReader::new(fd);
    for line in reader.lines() {
        let line = line?;
        let mut split = line.split_whitespace();
        let sum = split.next().ok_or::<io::Error>(
            io::Error::new(io::ErrorKind::Other, "invalid checksum file").into(),
        )?;
        let file = split.next().ok_or::<io::Error>(
            io::Error::new(io::ErrorKind::Other, "invalid checksum file").into(),
        )?;
        let s = match hashtype {
            HashType::Md5 => compute_hash(file, Md5::new())?,
            HashType::Sha1 => compute_hash(file, Sha1::new())?,
            HashType::Sha224 => compute_hash(file, Sha224::new())?,
            HashType::Sha256 => compute_hash(file, Sha256::new())?,
            HashType::Sha384 => compute_hash(file, Sha384::new())?,
            HashType::Sha512 => compute_hash(file, Sha512::new())?,
        };
        if s.as_str() == sum {
            println!("{file}: OK");
        } else {
            println!("{file}: FAILED");
            *erred += 1;
        }
    }
    Ok(())
}

This function checks a previously made listing of checksums (sha224 sums, in our case) from a file against some files that we might have just downloaded. We want to keep track of any mismatches but continue on, and we might be using more than one listing of checksums, so we pass the running count in as erred: &mut usize and tell the function which hasher to use with our enum parameter. We can still short circuit here if the checksum file is formatted incorrectly, or if we encounter a problem while computing a sum. But if we compute a sum and find that it doesn't match, we just increment erred and continue to the next line.
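
The line parsing inside check_sums can be looked at on its own. The parse_line function below is a hypothetical extraction of those two split.next() calls, returning None where check_sums would report an invalid checksum file.

```rust
// Hypothetical helper mirroring the parsing step of check_sums.
fn parse_line(line: &str) -> Option<(&str, &str)> {
    let mut split = line.split_whitespace();
    // First field is the expected checksum, second is the file name.
    let sum = split.next()?;
    let file = split.next()?;
    Some((sum, file))
}
```

One consequence of splitting on whitespace is that a file name containing spaces is cut at the first space, so a line for "my file.txt" would be checked against a file called "my".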

Calling the correct applet from main()

// crate::hashbox::main.rs
fn main() {
    if let Some(progname) = shitbox::progname() {
        if let Some(command) = commands::get(&progname) {
            let cli = command.cli();
            if let Err(e) = command.run(&cli.get_matches()) {
                eprintln!("{progname}: Error: {e}");
                process::exit(1);
            }
        } else {
            eprintln!("hashbox: Error: unknown command {progname}");
            process::exit(1);
        }
    }
}

Everything should be pretty understandable here. Any errors just get bubbled up, and when we print the error it also prints the name of the applet. The progname function is equivalent to the pseudo-code basename(argv[0]).
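
The progname function itself isn't shown, so the following is only a guess at its shape using the standard library. Both function bodies are assumptions, and the Path here is std::path::Path, not the enum defined earlier.

```rust
use std::{env, path::Path};

// Hypothetical: the final component of a path, if it has one.
fn basename(arg0: &str) -> Option<String> {
    Path::new(arg0)
        .file_name()
        .and_then(|name| name.to_str())
        .map(String::from)
}

// Hypothetical stand-in for the project's progname():
// basename of the first command line argument.
fn progname() -> Option<String> {
    env::args().next().and_then(|arg0| basename(&arg0))
}
```

Returning an Option matches how main uses it: if there is no usable argv[0], the if let simply doesn't fire.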

NOTE:

We can call an applet either by calling the link to hashbox, which is named sha224sum, or by calling hashbox sha224sum, which I'll explain next.

impl Cmd for Hashbox

// crate::hashbox::commands::hashbox.rs
// Debug is required by the Cmd trait; Default is used by commands::get
#[derive(Debug, Default)]
pub struct Hashbox;

impl Cmd for Hashbox {
    fn cli(&self) -> clap::Command {
        let subcommands: Vec<Command> = {
            let mut s = vec![];
            for c in COMMANDS {
                if c == "hashbox" {
                    continue;
                }
                if let Some(cmd) = crate::commands::get(c) {
                    s.push(cmd.cli());
                }
            }
            s
        };
        Command::new("hashbox")
            .about("The box store multitool of embedded Linux")
            .version(env!("CARGO_PKG_VERSION"))
            .propagate_version(true)
            .arg_required_else_help(true)
            .subcommand_value_name("APPLET")
            .subcommand_help_heading("APPLETS")
            .subcommands(&subcommands)
    }

    fn run(&self, matches: &clap::ArgMatches) -> Result<(), Box<dyn Error>> {
        if let Some((name, matches)) = matches.subcommand() {
            if let Some(command) = crate::commands::get(name) {
                if let Err(e) = command.run(matches) {
                    eprintln!("Error: {name}: {e}");
                    process::exit(1);
                }
            }
        }
        Ok(())
    }

When we call the program as hashbox, then it just runs the special applet hashbox, which treats each of the other applets as a subcommand. In the cli method shown, we create the clap::Command structs for each applet in a loop, adding them to a Vec which can then be inserted in the appropriate spot.

The run method for the hashbox applet is quite similar to main. Instead of using args[0], this time we're using a clap subcommand and passing that string to the get function shown way up near the top of the post when we were setting this all up. Note that we can only get away with if let Some(command) = crate::commands::get(name) due to the fact that it is a trait object being returned. We also once again print the applet name along with any error messages being bubbled up, so we can afford to be pretty lazy about how we do our error handling in each individual applet and focus more on correct function. The exception would be in the cases where we want our return values to convey how the program failed, in which case those things must be coded into each applet. Because POSIX, you know?

Not shown is the bootstrap applet, which is just about the hairiest code in the project so far. It's actually currently broken as of this writing but will be fairly straightforward to fix, as the breakage occurred by moving applets around and out of the original single binary and it just needs to be made to link the applets to the correct binaries. But I digress. Clap is really quite awesome in the information it exposes, and bootstrap leverages a few third party crates to provide not only the applet links but also shell completions and Unix man pages. So when that part of the codebase is fixed you'll be able to drop a binary into the filesystem and install everything else with a single command.

The code for this project is on Codeberg under the working title of shitbox, because naming things is hard.

2023-02-07

Jean G3nie