Leveraging Rust's Trait Objects for reduced boilerplate
Making a multi-call binary is a neat way of sharing code between a number of small programs. Perhaps the most famous example is BusyBox, but there are others such as Dropbear (an SSH client and server) and BeastieBox. I once even wrote a multi-call shell script, which sourced a different file full of functions depending on the name it was called as. Today's project is inspired heavily by BusyBox but is written in Rust and has a somewhat more limited scope.
Rather than attempting to fit a couple hundred small utilities into one binary as
BusyBox does, I'm going to split this up into several binaries, each of which has
anywhere from half a dozen to 50 applets. That should keep the overall size of each
binary down, and it also avoids carrying dependencies in a binary that are only
useful to a few commands. For example, all of the various checksum utilities will
be in a single binary called hashbox. Even so, it's best to give some thought early
on to how the project will be organized. Adding a new command applet should
hopefully only involve writing that applet and then inserting a couple of lines
into one other file. An unhappy outcome would involve having to edit 3-4 files in
multiple places in order to add another command. Additionally, there should be a
standard way of calling every command. We can also take some steps to increase
consistency in the way the command line arguments work, although there is only so
much one can do in that regard without breaking POSIX compatibility.
What is a Trait Object?
A trait object lets us work with a value through an interface rather than through a concrete type: instead of naming the type, we only specify that we have an object which implements a specific trait.
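As a quick illustration, here is a minimal sketch of a trait object in isolation. The Greet trait and the English and Spanish types are made up for this example and have nothing to do with the project itself.

```rust
// Hypothetical trait and types, used only to illustrate trait objects.
trait Greet {
    fn greet(&self) -> String;
}

struct English;
struct Spanish;

impl Greet for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

impl Greet for Spanish {
    fn greet(&self) -> String {
        "hola".to_string()
    }
}

/// Collects the greeting from every object, whatever its concrete type.
fn greet_all(greeters: &[Box<dyn Greet>]) -> Vec<String> {
    greeters.iter().map(|g| g.greet()).collect()
}

fn main() {
    // Two different concrete types stored behind the same interface.
    let greeters: Vec<Box<dyn Greet>> = vec![Box::new(English), Box::new(Spanish)];
    for line in greet_all(&greeters) {
        println!("{line}");
    }
}
```

The caller of greet_all never names English or Spanish; it only knows that each element implements Greet.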
Defining the Cmd trait
We're going to design our Cmd trait so that we are guaranteed an object which has
a method that returns a clap::Command struct, as well as a method which takes input
in the form of a &clap::ArgMatches struct and runs the applet. Since it would be
nice to have a way to automatically install all of the various links back to the
binary (as well as some nice extras which we'll get to later), we also need a way
to tell our special bootstrap applet where the link for this applet should go.
Putting it all together we get the following.
// crate::lib.rs
mod cmd;
pub use cmd::Cmd;

/// Defines the location relative to the binary where a command will be installed
#[derive(Debug, Clone, Copy)]
pub enum Path {
    /// /bin
    Bin,
    /// /sbin
    Sbin,
    /// /usr/bin
    UsrBin,
    /// /usr/sbin
    UsrSbin,
}

impl Path {
    #[must_use]
    pub fn to_str(&self, usr: bool) -> &'static str {
        match self {
            Self::UsrBin if usr => "usr/bin",
            Self::UsrSbin if usr => "usr/sbin",
            Self::Bin | Self::UsrBin => "bin",
            Self::Sbin | Self::UsrSbin => "sbin",
        }
    }
}
// crate::cmd::mod.rs
use clap::ArgMatches;
use std::{error::Error, fmt};

/// Defines a command or applet, its cli interface, and its installation directory
/// relative to the binary
pub trait Cmd: fmt::Debug + Sync {
    /// Defines the cli of the applet
    fn cli(&self) -> clap::Command;

    /// Runs the applet
    /// # Errors
    /// Bubbles up any errors to the caller
    fn run(&self, matches: &ArgMatches) -> Result<(), Box<dyn Error>>;

    /// Returns the path relative to the binary where the link to this applet
    /// will be installed
    fn path(&self) -> Option<crate::Path>;
}
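To make the effect of the usr flag concrete, here is a small standalone sketch that copies the Path enum from the listing above and prints what to_str returns for each combination. The main function is only a demonstration harness, not part of the project.

```rust
/// Copy of the `Path` enum from the listing, trimmed to what this demo needs.
#[derive(Debug, Clone, Copy)]
pub enum Path {
    Bin,
    Sbin,
    UsrBin,
    UsrSbin,
}

impl Path {
    pub fn to_str(&self, usr: bool) -> &'static str {
        match self {
            // The "usr/" prefix is only kept when `usr` is true.
            Self::UsrBin if usr => "usr/bin",
            Self::UsrSbin if usr => "usr/sbin",
            // Otherwise the usr variants collapse into bin and sbin.
            Self::Bin | Self::UsrBin => "bin",
            Self::Sbin | Self::UsrSbin => "sbin",
        }
    }
}

fn main() {
    for usr in [true, false] {
        for p in [Path::Bin, Path::Sbin, Path::UsrBin, Path::UsrSbin] {
            println!("{p:?} (usr = {usr}) -> {}", p.to_str(usr));
        }
    }
}
```

With usr set to false, UsrBin and UsrSbin fall through to the plain bin and sbin arms.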
That's our basic scaffolding. With this in place we can do some neat things, such
as create a Vec<Box<dyn Cmd>>, or return a boxed Cmd from a lookup function without
the caller ever naming a concrete type.
Downsides? Well, there is one. You pay a small performance penalty because the actual function which will be run has to be looked up from a vtable at runtime. But we're talking about a small, one-time penalty, and we're going to be running some pretty well optimized code everywhere else. The applets should run plenty fast.
Anyway, here's what our lookup code looks like.
// crate::hashbox::commands::mod.rs
pub fn get(name: &str) -> Option<Box<dyn Cmd>> {
    match name {
        "b2sum" => Some(Box::new(b2sum::B2sum::default())),
        "hashbox" => Some(Box::new(hashbox::Hashbox::default())),
        "md5sum" => Some(Box::new(md5sum::Md5sum::default())),
        "sha1sum" => Some(Box::new(sha1sum::Sha1sum::default())),
        "sha224sum" => Some(Box::new(sha224sum::Sha224sum::default())),
        "sha256sum" => Some(Box::new(sha256sum::Sha256sum::default())),
        "sha384sum" => Some(Box::new(sha384sum::Sha384sum::default())),
        "sha512sum" => Some(Box::new(sha512sum::Sha512sum::default())),
        _ => None,
    }
}
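If you want to play with the pattern without pulling in clap, the following is a stripped-down sketch that compiles with only the standard library. The simplified Cmd trait and the Upper and Count applets are stand-ins invented for the example; only the shape of get matches the listing above.

```rust
use std::error::Error;

/// Simplified stand-in for the `Cmd` trait (no clap involved).
trait Cmd {
    fn name(&self) -> &'static str;
    fn run(&self, args: &[String]) -> Result<String, Box<dyn Error>>;
}

// Two hypothetical applets, both unit structs.
struct Upper;
struct Count;

impl Cmd for Upper {
    fn name(&self) -> &'static str {
        "upper"
    }
    fn run(&self, args: &[String]) -> Result<String, Box<dyn Error>> {
        Ok(args.join(" ").to_uppercase())
    }
}

impl Cmd for Count {
    fn name(&self) -> &'static str {
        "count"
    }
    fn run(&self, args: &[String]) -> Result<String, Box<dyn Error>> {
        Ok(args.len().to_string())
    }
}

/// Maps an applet name to a boxed trait object, like `get` in the listing.
fn get(name: &str) -> Option<Box<dyn Cmd>> {
    match name {
        "upper" => Some(Box::new(Upper)),
        "count" => Some(Box::new(Count)),
        _ => None,
    }
}

fn main() -> Result<(), Box<dyn Error>> {
    let args = vec!["a".to_string(), "b".to_string()];
    for name in ["upper", "count", "missing"] {
        match get(name) {
            // The concrete type is unknown here; the call goes through the vtable.
            Some(cmd) => println!("{}: {}", cmd.name(), cmd.run(&args)?),
            None => println!("{name}: unknown applet"),
        }
    }
    Ok(())
}
```

Every arm of the match returns a different concrete type, which only works because they are all coerced to Box<dyn Cmd>.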
If it weren't for our bootstrap applet, that would be the only place where the code
needs to be modified to add an applet, apart from creating a new module containing
the struct which implements Cmd. But we want to be able to iterate over our
commands, so in the same file we create a static array of applet names.
// crate::hashbox::commands::mod.rs
pub static COMMANDS: [&str; 7] = [
    "b2sum",
    "md5sum",
    "sha1sum",
    "sha224sum",
    "sha256sum",
    "sha384sum",
    "sha512sum",
];
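Since the names now live in two places (the match in get and the COMMANDS array), one possible way to keep them from drifting apart is a small check that every listed name actually resolves. The sketch below is standalone: known is a stand-in for calling get(name).is_some(), the shortened list is made up, and unresolved is a hypothetical helper that does not exist in the project.

```rust
// Shortened stand-in for the real list of applet names.
static COMMANDS: [&str; 3] = ["b2sum", "md5sum", "sha1sum"];

/// Stand-in for `get(name).is_some()`: true when the name is wired up.
fn known(name: &str) -> bool {
    matches!(name, "b2sum" | "md5sum" | "sha1sum" | "hashbox")
}

/// Returns every listed name that the lookup does not recognize.
fn unresolved(names: &[&'static str], lookup: impl Fn(&str) -> bool) -> Vec<&'static str> {
    names.iter().copied().filter(|n| !lookup(*n)).collect()
}

fn main() {
    let missing = unresolved(&COMMANDS, known);
    if missing.is_empty() {
        println!("every listed applet resolves");
    } else {
        println!("not wired up: {missing:?}");
    }
}
```

In a real crate this would more naturally live in a unit test than in main.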
Adding an applet
I'm going to pick the sha224sum applet for the example, as it's not completely
trivial like, say, echo, but not overly complex either. It's also a good example
of an applet that manages to share a lot of code with other applets.
Impl Cmd - the cli() method
// crate::hashbox::commands::sha224sum.rs
#[derive(Debug, Default)]
pub struct Sha224sum;

impl Cmd for Sha224sum {
    fn cli(&self) -> clap::Command {
        Command::new("sha224sum")
            .about("compute and check SHA224 message digest")
            .author("Nathan Fisher")
            .version(env!("CARGO_PKG_VERSION"))
            .args([args::check(), args::file()])
    }
The actual object which implements Cmd here is a unit struct. Notice that we take
&self as a parameter, even though we're not using any data from self. We actually
have to do this in order to use Sha224sum as a trait object, because a method
without a self receiver can't be called through a dyn Cmd. I'll also point out that
the list of clap Args is populated via functions which live in the args module.
This makes it easy to maintain consistency between applets, because things like
help messages will always display the same strings where appropriate, while also
making the code less verbose.
Impl Cmd - the run() method
// crate::hashbox::commands::sha224sum.rs
    fn run(&self, matches: &clap::ArgMatches) -> Result<(), Box<dyn std::error::Error>> {
        if let Some(files) = matches.get_many::<String>("file") {
            let mut erred = 0;
            for f in files {
                if matches.get_flag("check") {
                    if f == "-" {
                        return Err(
                            io::Error::new(io::ErrorKind::Other, "no file specified").into()
                        );
                    }
                    hash::check_sums(f, HashType::Sha224, &mut erred)?;
                } else {
                    let hasher = Sha224::new();
                    let s = hash::compute_hash(f, hasher)?;
                    println!("{s} {f}");
                }
            }
            if erred > 0 {
                let msg = format!("WARNING: {erred} computed checksums did NOT match");
                return Err(
                    io::Error::new(io::ErrorKind::Other, msg).into()
                );
            }
        }
        Ok(())
    }
I didn't want to get bogged down in creating tons of custom error types for this
project, so run always returns Result<(), Box<dyn Error>>, making it easy to short
circuit on error. We can then liberally use io::ErrorKind::Other to quickly make
custom errors from strings where we need to. I'm actually using that strategy twice
in this function.
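Here is that error strategy in isolation, as a minimal standard-library sketch. The first_word and parse_first functions are invented for the example; the point is only that a string-backed io::Error and an unrelated parse error can both travel through the same Box<dyn Error> return type.

```rust
use std::{error::Error, io};

/// Fails with an ad hoc, string-backed error when `input` has no words.
fn first_word(input: &str) -> Result<&str, Box<dyn Error>> {
    input
        .split_whitespace()
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "no words in input").into())
}

/// Parses the first word as a number; `?` boxes the parse error for us.
fn parse_first(input: &str) -> Result<u32, Box<dyn Error>> {
    let word = first_word(input)?;
    Ok(word.parse::<u32>()?)
}

fn main() {
    for input in ["42 apples", "", "many apples"] {
        match parse_first(input) {
            Ok(n) => println!("parsed {n}"),
            Err(e) => eprintln!("error: {e}"),
        }
    }
}
```

The trade-off is that callers can print the error but can't easily match on what went wrong, which is fine for a small command line tool.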
Now, since most of the hash functions come from crates provided by the RustCrypto
GitHub organization, they all have a remarkably similar interface, making it possible
to abstract over it very lightly in the hash module, where I've placed a couple of
generic functions that do the heavy lifting and keep each applet's run function
nice and short.
hash::compute_hash()
// crate::hashbox::hash::mod.rs
pub fn compute_hash<T>(file: &str, mut hasher: T) -> Result<String, Box<dyn Error>>
where
    // `Update` is included because `update` and `finalize` are only available
    // when the hasher also implements it
    T: Default + FixedOutput + HashMarker + Update,
{
    let mut buf = vec![];
    if file == "-" {
        let _s = io::stdin().read_to_end(&mut buf)?;
    } else {
        let mut fd = File::open(file)?;
        let _s = fd.read_to_end(&mut buf)?;
    }
    let mut s = String::new();
    hasher.update(&buf);
    let res = hasher.finalize();
    for c in res {
        write!(s, "{c:02x}")?;
    }
    Ok(s)
}
We need to constrain the hasher argument to a type which implements the required
traits, hence the where clause. This is defined by the interface that the third
party crates provide. Looking at the first few lines of the function body, we're
filling a buffer of u8 either from stdin or from a file. We then pass that buffer
into our hasher and finalize it, finishing by writing each byte out as a two
digit hexadecimal string, padded on the left with a zero if needed.
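The hex formatting step is easy to try on its own. This is a minimal sketch using only the standard library; to_hex is a name made up for the example and the input bytes are arbitrary.

```rust
use std::fmt::Write;

/// Formats each byte as two lowercase hex digits, zero-padded on the left.
fn to_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // `02` sets a minimum width of two with zero padding, `x` selects
        // lowercase hex. Writing into a String does not fail in practice.
        write!(s, "{b:02x}").expect("writing to a String should not fail");
    }
    s
}

fn main() {
    // prints "000fa0ff"
    println!("{}", to_hex(&[0x00, 0x0f, 0xa0, 0xff]));
}
```

Without the zero padding, 0x0f would come out as a single "f" and the digest would no longer have a fixed length.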
hash::check_sums()
// crate::hashbox::hash::mod.rs
pub enum HashType {
    Md5,
    Sha1,
    Sha224,
    Sha256,
    Sha384,
    Sha512,
}

pub fn check_sums(file: &str, hashtype: HashType, erred: &mut usize) -> Result<(), Box<dyn Error>> {
    let fd = File::open(file)?;
    let reader = BufReader::new(fd);
    for line in reader.lines() {
        let line = line?;
        // Each line is expected to hold a checksum followed by a file name
        let mut split = line.split_whitespace();
        let sum = split
            .next()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "invalid checksum file"))?;
        let file = split
            .next()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "invalid checksum file"))?;
        let s = match hashtype {
            HashType::Md5 => compute_hash(file, Md5::new())?,
            HashType::Sha1 => compute_hash(file, Sha1::new())?,
            HashType::Sha224 => compute_hash(file, Sha224::new())?,
            HashType::Sha256 => compute_hash(file, Sha256::new())?,
            HashType::Sha384 => compute_hash(file, Sha384::new())?,
            HashType::Sha512 => compute_hash(file, Sha512::new())?,
        };
        if s.as_str() == sum {
            println!("{file}: OK");
        } else {
            println!("{file}: FAILED");
            *erred += 1;
        }
    }
    Ok(())
}
This function checks a previously made listing of checksums (sha224 sums, in our
case) from a file against some files that we might have just downloaded. We want to
keep track of any mismatches but continue on, and we might be using more than one
listing of checksums, so we pass the running count in as erred: &mut usize, and we
tell the function which hasher to use with our enum parameter. We can still short
circuit here if the checksum file is formatted incorrectly, or if we encounter a
problem while computing a sum. But if we compute a sum and find that it doesn't
match, we just increment erred and continue to the next line.
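The "short circuit on a bad listing, but only count a mismatch" behaviour can be sketched without any hashing crates. In the example below the listing is an in-memory string rather than a file, and fake_sum is a deliberately silly stand-in for a real hash (it returns the length of the file name), so parse_line, check_listing and fake_sum are all hypothetical.

```rust
use std::{error::Error, io};

/// Splits one line of a checksum listing into (sum, file name).
fn parse_line(line: &str) -> Result<(&str, &str), io::Error> {
    let mut split = line.split_whitespace();
    match (split.next(), split.next()) {
        (Some(sum), Some(file)) => Ok((sum, file)),
        _ => Err(io::Error::new(io::ErrorKind::Other, "invalid checksum file")),
    }
}

/// Compares each listed sum with the one produced by `compute`, counting
/// mismatches in `erred` instead of stopping at the first one.
fn check_listing(
    listing: &str,
    compute: impl Fn(&str) -> String,
    erred: &mut usize,
) -> Result<(), Box<dyn Error>> {
    for line in listing.lines() {
        // A malformed line aborts the whole check via `?`.
        let (sum, file) = parse_line(line)?;
        if compute(file) == sum {
            println!("{file}: OK");
        } else {
            println!("{file}: FAILED");
            *erred += 1;
        }
    }
    Ok(())
}

/// Stand-in for a real hash function: the "sum" is the name's length.
fn fake_sum(file: &str) -> String {
    file.len().to_string()
}

fn main() -> Result<(), Box<dyn Error>> {
    let mut erred = 0;
    check_listing("5  a.txt\n9  b.txt", fake_sum, &mut erred)?;
    println!("{erred} mismatches");
    Ok(())
}
```

Because erred is borrowed mutably rather than returned, the same counter can be threaded through several listings in a row.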
Calling the correct applet from main()
// crate::hashbox::main.rs
fn main() {
    if let Some(progname) = shitbox::progname() {
        if let Some(command) = commands::get(&progname) {
            let cli = command.cli();
            if let Err(e) = command.run(&cli.get_matches()) {
                eprintln!("{progname}: Error: {e}");
                process::exit(1);
            }
        } else {
            eprintln!("hashbox: Error: unknown command {progname}");
            process::exit(1);
        }
    }
}
Everything should be pretty understandable here. Any errors just get bubbled up,
and when we print the error we also print the name of the applet. The progname
function is equivalent to the pseudo-code basename(argv[0]).
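The real progname isn't shown in this post, but a rough sketch of what basename(argv[0]) could look like with the standard library follows. The basename helper is invented for the example, and this is a guess at the behaviour rather than the code from the library crate.

```rust
use std::{env, path::Path};

/// Returns the final path component of `arg0`, if there is one.
fn basename(arg0: &str) -> Option<String> {
    Path::new(arg0)
        .file_name()
        .and_then(|name| name.to_str())
        .map(str::to_owned)
}

/// Looks up the name the program was invoked as.
fn progname() -> Option<String> {
    env::args().next().and_then(|arg0| basename(&arg0))
}

fn main() {
    match progname() {
        Some(name) => println!("invoked as {name}"),
        None => println!("could not determine the program name"),
    }
}
```

Because a symlink named sha224sum shows up as argv[0], the same binary sees a different name depending on which link was used to start it.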
NOTE:
We can call an applet either through the link to hashbox which is named sha224sum,
or by calling hashbox sha224sum, which I'll explain next.
impl Cmd for Hashbox
// crate::hashbox::commands::hashbox.rs
// Debug is required by the Cmd trait, and Default is used by commands::get
#[derive(Debug, Default)]
pub struct Hashbox;

impl Cmd for Hashbox {
    fn cli(&self) -> clap::Command {
        let subcommands: Vec<Command> = {
            let mut s = vec![];
            for c in COMMANDS {
                if c == "hashbox" {
                    continue;
                }
                if let Some(cmd) = crate::commands::get(c) {
                    s.push(cmd.cli());
                }
            }
            s
        };
        Command::new("hashbox")
            .about("The box store multitool of embedded Linux")
            .version(env!("CARGO_PKG_VERSION"))
            .propagate_version(true)
            .arg_required_else_help(true)
            .subcommand_value_name("APPLET")
            .subcommand_help_heading("APPLETS")
            .subcommands(&subcommands)
    }

    fn run(&self, matches: &clap::ArgMatches) -> Result<(), Box<dyn Error>> {
        if let Some((name, matches)) = matches.subcommand() {
            if let Some(command) = crate::commands::get(name) {
                if let Err(e) = command.run(matches) {
                    eprintln!("Error: {name}: {e}");
                    process::exit(1);
                }
            }
        }
        Ok(())
    }
When we call the program as hashbox, it just runs the special applet hashbox,
which treats each of the other applets as a subcommand. In the cli method shown,
we create the clap::Command structs for each applet in a loop, adding them to a
Vec<Command>.
The run method for the hashbox applet is quite similar to main. Instead of using
argv[0], this time we're using a clap subcommand and passing that string to the
get function shown way up near the top of the post when we were setting this all
up. Note that we can only get away with if let Some(command) = crate::commands::get(name)
due to the fact that it is a trait object being returned. We also once again use
the applet name along with any error messages being bubbled up, so we can afford
to be pretty lazy about how we do our error handling in each individual applet and
focus more on correct function. The exception would be the cases where we want
our return values to convey how the program failed, in which case those things must
be coded into each applet. Because POSIX, you know?
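For those cases, one possible approach (not something the project does today, as far as this post shows) is a small error type that carries the desired exit status, which the caller can recover from the Box<dyn Error> by downcasting. ExitError, exit_code and the toy run function below are all hypothetical.

```rust
use std::{error::Error, fmt};

/// Hypothetical error type carrying the exit status an applet wants.
#[derive(Debug)]
struct ExitError {
    code: i32,
    msg: String,
}

impl fmt::Display for ExitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for ExitError {}

/// Picks the exit status: the applet's own code if it supplied one, else 1.
fn exit_code(err: &(dyn Error + 'static)) -> i32 {
    err.downcast_ref::<ExitError>().map_or(1, |e| e.code)
}

/// Stand-in applet: reports a usage error with status 2.
fn run(args: &[&str]) -> Result<(), Box<dyn Error>> {
    if args.is_empty() {
        return Err(Box::new(ExitError {
            code: 2,
            msg: "missing operand".into(),
        }));
    }
    Ok(())
}

fn main() {
    if let Err(e) = run(&[]) {
        eprintln!("Error: {e}");
        // A real main would pass this value to std::process::exit.
        println!("would exit with status {}", exit_code(e.as_ref()));
    }
}
```

Errors that are not an ExitError, such as the string-backed io::Error values used elsewhere, fall back to status 1, so existing applets would keep working unchanged.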
Not shown is the bootstrap applet, which is just about the hairiest code in the
project so far. It's actually broken as of this writing but will be fairly
straightforward to fix, as the breakage occurred when applets were moved around
and out of the original single binary, and it just needs to be made to link the
applets to the correct binaries. But I digress. Clap is really quite awesome in
the information it exposes, and bootstrap leverages a few third party crates to
provide not only the applet links but also shell completions and Unix man pages.
So when that part of the codebase is fixed you'll be able to drop a binary into
the filesystem and install everything else with a single command.
The code for this project is on Codeberg under the working title of shitbox, because naming things is hard.
2023-02-07