Leveraging Trait Objects
Making a multi-call binary is a neat way of sharing code between a number of small programs. Perhaps the most famous example is BusyBox, but there are others such as Dropbear (an SSH client and server) and BeastieBox. I once even wrote a multi-call shell script, which sourced a different file full of functions depending on the name it was called as. Today's project is inspired heavily by BusyBox but is written in Rust and has a somewhat more limited scope.
Rather than attempting to fit a couple hundred small utilities into one
binary as BusyBox does, I'm going to split this up into several
binaries, each of which has anywhere from half a dozen to 50
applets. That should keep the overall size of each binary down as well
as making it possible to avoid carrying too many dependencies in each
binary that aren't going to be useful outside of a few commands. For
example, all of the various checksum utilities will be in a single
binary called hashbox. Even so, it's best to give some
thought early on to how the project will be organized. Adding a new
command applet should hopefully only involve writing that applet and
then inserting a couple lines into one other file. An unhappy outcome
would involve having to edit 3-4 files in multiple places in order to
add another command. Additionally, there should be a standard way of
calling every command. We can also take some steps to increase
consistency in the way the command line arguments work, although there
is only so much one can do in that regard without breaking POSIX
compatibility.
What is a Trait Object?
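A trait object is a value whose concrete type is hidden behind a trait. Instead of naming a concrete type, we say that we have some object which implements a specific trait, written dyn Trait and used behind a pointer such as &dyn Trait or Box<dyn Trait>. Method calls on it are resolved at runtime.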
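To make that concrete, here's a minimal sketch using only the standard library. The Applet trait and the Echo and Yes types are made up for illustration and are not part of the project.

```rust
use std::fmt::Debug;

/// A toy trait standing in for an applet interface (hypothetical).
trait Applet: Debug {
    fn name(&self) -> &'static str;
}

#[derive(Debug)]
struct Echo;

#[derive(Debug)]
struct Yes;

impl Applet for Echo {
    fn name(&self) -> &'static str {
        "echo"
    }
}

impl Applet for Yes {
    fn name(&self) -> &'static str {
        "yes"
    }
}

/// Accepts any type implementing `Applet`; the call to `name` is
/// dispatched at runtime through the trait object.
fn describe(applet: &dyn Applet) -> String {
    format!("applet: {}", applet.name())
}

fn main() {
    // Two different concrete types stored side by side behind one interface.
    let applets: Vec<Box<dyn Applet>> = vec![Box::new(Echo), Box::new(Yes)];
    for applet in &applets {
        println!("{}", describe(applet.as_ref()));
    }
}
```

The Vec holds boxes rather than the trait objects themselves because a dyn Applet has no size known at compile time.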
Defining the Cmd trait
We're going to design our Cmd trait so that we are
guaranteed an object which has a method that returns a
clap::Command struct, as well as a method which takes
input in the form of a &clap::ArgMatches struct and
runs the applet. Since it would be nice if there was a way to
automatically install all of the various links back to the binary (as
well as some nice extras which we'll get to later) we're going to also
need a way to tell our special bootstrap applet where the
link for this applet should go. Putting it all together we get the
following.
    // crate::lib.rs
    mod cmd;
    pub use cmd::Cmd;

    /// Defines the location relative to the binary where a command will be installed
    #[derive(Debug, Clone, Copy)]
    pub enum Path {
        /// /bin
        Bin,
        /// /sbin
        Sbin,
        /// /usr/bin
        UsrBin,
        /// /usr/sbin
        UsrSbin,
    }

    impl Path {
        #[must_use]
        pub fn to_str(&self, usr: bool) -> &'static str {
            match self {
                Self::UsrBin if usr => "usr/bin",
                Self::UsrSbin if usr => "usr/sbin",
                Self::Bin | Self::UsrBin => "bin",
                Self::Sbin | Self::UsrSbin => "sbin",
            }
        }
    }

    // crate::cmd::mod.rs
    use clap::ArgMatches;
    use std::{error::Error, fmt};

    /// Defines a command or applet, its cli interface, and its installation directory
    /// relative to the binary
    pub trait Cmd: fmt::Debug + Sync {
        /// Defines the cli of the applet
        fn cli(&self) -> clap::Command;

        /// Runs the applet
        /// # Errors
        /// Bubbles up any errors to the caller
        fn run(&self, matches: &ArgMatches) -> Result<(), Box<dyn Error>>;

        /// Returns the path relative to the binary where the link to this applet
        /// will be installed
        fn path(&self) -> Option<crate::Path>;
    }
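The usr flag on Path::to_str is easy to misread, so here's a small standalone demonstration. The enum is copied from the listing above; my reading that usr = false is meant for layouts which don't use a separate /usr is an assumption, not something the listing states.

```rust
/// Copied from the listing above so this example stands alone.
#[derive(Debug, Clone, Copy)]
pub enum Path {
    Bin,
    Sbin,
    UsrBin,
    UsrSbin,
}

impl Path {
    #[must_use]
    pub fn to_str(&self, usr: bool) -> &'static str {
        match self {
            // The guarded arms only apply when `usr` is true...
            Self::UsrBin if usr => "usr/bin",
            Self::UsrSbin if usr => "usr/sbin",
            // ...otherwise the Usr* variants fall through to the plain directories.
            Self::Bin | Self::UsrBin => "bin",
            Self::Sbin | Self::UsrSbin => "sbin",
        }
    }
}

fn main() {
    for path in [Path::Bin, Path::Sbin, Path::UsrBin, Path::UsrSbin] {
        println!(
            "{path:?}: usr=true -> {}, usr=false -> {}",
            path.to_str(true),
            path.to_str(false)
        );
    }
}
```

The match arms are tried in order, so the guarded arms must come first for the fallthrough to work.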
That's our basic scaffolding. With this in place we can do some neat
things, such as create a Vec<Box<dyn Cmd>> or have a
function which returns a Box<dyn Cmd>. The latter is
what we're going to use to run an applet based on the name the program
was called by.
Downsides? Well, there is one. You pay a small performance penalty because the actual function to run has to be looked up in a vtable at runtime. But we only make a handful of calls through the trait object per invocation, so the cost is negligible, and we're going to be running some pretty well optimized code everywhere else. The applets should run plenty fast.
Anyway, here's what our lookup code looks like.
    // crate::hashbox::commands::mod.rs
    pub fn get(name: &str) -> Option<Box<dyn Cmd>> {
        match name {
            "b2sum" => Some(Box::new(b2sum::B2sum::default())),
            "hashbox" => Some(Box::new(hashbox::Hashbox::default())),
            "md5sum" => Some(Box::new(md5sum::Md5sum::default())),
            "sha1sum" => Some(Box::new(sha1sum::Sha1sum::default())),
            "sha224sum" => Some(Box::new(sha224sum::Sha224sum::default())),
            "sha256sum" => Some(Box::new(sha256sum::Sha256sum::default())),
            "sha384sum" => Some(Box::new(sha384sum::Sha384sum::default())),
            "sha512sum" => Some(Box::new(sha512sum::Sha512sum::default())),
            _ => None,
        }
    }
If it weren't for wanting to have our bootstrap applet,
that would be the only place where the code needs to be modified to add an
applet, apart from creating a new module containing the struct which
implements Cmd. But we want to be able to iterate over our
commands, so in the same file we create a static array of applet names.
    // crate::hashbox::commands::mod.rs
    pub static COMMANDS: [&str; 7] = [
        "b2sum",
        "md5sum",
        "sha1sum",
        "sha224sum",
        "sha256sum",
        "sha384sum",
        "sha512sum",
    ];
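Since the name list and the match in get have to be kept in sync by hand, a cheap safeguard is to check that every listed name resolves. Here's a sketch of that idea with a stripped-down Cmd trait (no clap) and only two applets; the unresolved helper is hypothetical and not part of the project.

```rust
use std::fmt::Debug;

/// Simplified stand-in for the post's `Cmd` trait, without the clap parts.
trait Cmd: Debug {
    fn name(&self) -> &'static str;
}

#[derive(Debug, Default)]
struct Md5sum;

#[derive(Debug, Default)]
struct Sha224sum;

impl Cmd for Md5sum {
    fn name(&self) -> &'static str {
        "md5sum"
    }
}

impl Cmd for Sha224sum {
    fn name(&self) -> &'static str {
        "sha224sum"
    }
}

static COMMANDS: [&str; 2] = ["md5sum", "sha224sum"];

fn get(name: &str) -> Option<Box<dyn Cmd>> {
    match name {
        "md5sum" => Some(Box::new(Md5sum)),
        "sha224sum" => Some(Box::new(Sha224sum)),
        _ => None,
    }
}

/// Hypothetical helper: every name in COMMANDS that `get` cannot resolve.
fn unresolved() -> Vec<&'static str> {
    COMMANDS
        .iter()
        .copied()
        .filter(|name| get(name).is_none())
        .collect()
}

fn main() {
    let missing = unresolved();
    if missing.is_empty() {
        println!("all {} applets resolve", COMMANDS.len());
    } else {
        eprintln!("listed but not wired up: {missing:?}");
    }
}
```

In a real crate this would more naturally live in a unit test, so that forgetting one of the two places fails the test suite instead of surfacing at runtime.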
Adding an applet
I'm going to pick the sha224sum applet for the example, as
it's not completely trivial like, say, echo, but not
overly complex either. It's also a good example of an applet that
manages to share a lot of code with other applets.
Impl Cmd - the cli() method
    // crate::hashbox::commands::sha224sum.rs
    #[derive(Debug, Default)]
    pub struct Sha224sum;

    impl Cmd for Sha224sum {
        fn cli(&self) -> clap::Command {
            Command::new("sha224sum")
                .about("compute and check SHA224 message digest")
                .author("Nathan Fisher")
                .version(env!("CARGO_PKG_VERSION"))
                .args([args::check(), args::file()])
        }
The actual type which implements Cmd here is a unit struct.
Notice that we take &self as a parameter, even though
we're not using any data from self. We actually have
to do this in order to call the method on Sha224sum as a trait object.
I'll also point out that the list of clap Args is populated via
functions which appear in the args module. This makes it
easy to maintain consistency between applets because things like help
messages will always display the same strings where appropriate, while
also making the code less verbose.
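Going back to the point about &self for a moment, here's a small sketch of why the receiver matters. The trait below is a cut-down stand-in for Cmd, and default_name is a made-up function that exists only to show the contrast.

```rust
/// Cut-down stand-in for the post's `Cmd` trait.
trait Cmd {
    /// Has a `&self` receiver, so it can be called through `dyn Cmd`.
    fn name(&self) -> &'static str;

    /// No receiver. The trait can still be used as a trait object only
    /// because of the `where Self: Sized` bound, and this function is
    /// not callable on a `dyn Cmd`. (Hypothetical, for contrast only.)
    fn default_name() -> &'static str
    where
        Self: Sized,
    {
        "unnamed"
    }
}

struct Sha224sum;

impl Cmd for Sha224sum {
    fn name(&self) -> &'static str {
        "sha224sum"
    }
}

fn main() {
    let cmd: Box<dyn Cmd> = Box::new(Sha224sum);
    // Dispatched through the trait object.
    println!("{}", cmd.name());
    // Receiver-less functions need the concrete type spelled out.
    println!("{}", Sha224sum::default_name());
}
```

Without the where Self: Sized bound on default_name, the compiler would refuse to build a Box<dyn Cmd> at all.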
Impl Cmd - the run() method
    // crate::hashbox::commands::sha224sum.rs
        fn run(&self, matches: &clap::ArgMatches) -> Result<(), Box<dyn std::error::Error>> {
            if let Some(files) = matches.get_many::<String>("file") {
                let mut erred = 0;
                for f in files {
                    if matches.get_flag("check") {
                        if f == "-" {
                            return Err(
                                io::Error::new(io::ErrorKind::Other, "no file specified").into()
                            );
                        }
                        hash::check_sums(f, HashType::Sha224, &mut erred)?;
                    } else {
                        let hasher = Sha224::new();
                        let s = hash::compute_hash(f, hasher)?;
                        println!("{s} {f}");
                    }
                }
                if erred > 0 {
                    let msg = format!("WARNING: {erred} computed checksums did NOT match");
                    return Err(
                        io::Error::new(io::ErrorKind::Other, msg).into()
                    );
                }
            }
            Ok(())
        }
I didn't want to get bogged down in creating tons of custom error types
for this project, so run always returns
Result<(), Box<dyn Error>>, making it easy to
short circuit on error. We can then liberally use
io::ErrorKind::Other to quickly make custom errors using
strings where we need to. I'm using that strategy twice in this
function actually.
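Here's that error strategy in isolation, as a minimal sketch using only the standard library. The open_checked function is hypothetical and exists just to show both paths: an ad hoc error built from a string, and an io::Error bubbled up with the ? operator.

```rust
use std::{error::Error, fs::File, io};

/// Hypothetical helper: fails with a string-based error for an empty name,
/// otherwise bubbles up whatever `File::open` reports.
fn open_checked(path: &str) -> Result<File, Box<dyn Error>> {
    if path.is_empty() {
        // A quick ad hoc error built from a string.
        return Err(io::Error::new(io::ErrorKind::Other, "no file specified").into());
    }
    // `?` converts the io::Error into Box<dyn Error> automatically.
    let file = File::open(path)?;
    Ok(file)
}

fn main() {
    match open_checked("") {
        Ok(_) => println!("opened"),
        Err(e) => eprintln!("Error: {e}"),
    }
}
```

The caller only ever sees something it can print, which is all main needs. The tradeoff is that callers can't easily match on what went wrong.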
Now, since most of the hash functions are coming from crates provided by
the RustCrypto GitHub organization, they all have a
remarkably similar interface, making it possible to abstract over it
very lightly in the hash module, where I've placed a
couple of nice generic functions that do the heavy lifting and keep each
applet's run function nice and short.
hash::compute_hash()
    // crate::hashbox::hash::mod.rs
    pub fn compute_hash<T>(file: &str, mut hasher: T) -> Result<String, Box<dyn Error>>
    where
        T: Default + FixedOutput + HashMarker,
    {
        let mut buf = vec![];
        if file == "-" {
            let _s = io::stdin().read_to_end(&mut buf)?;
        } else {
            let mut fd = File::open(file)?;
            let _s = fd.read_to_end(&mut buf)?;
        }
        let mut s = String::new();
        hasher.update(&buf);
        let res = hasher.finalize();
        for c in res {
            write!(s, "{c:02x}")?;
        }
        Ok(s)
    }
We need to constrain the hasher argument to a type which
implements the required traits, hence the where clause. This is defined
by the interface that the third party crates provide. Looking at the
first few lines of the function, we're reading the whole input into a buffer of
u8, either from stdin or from a file. We then pass that
buffer into our hasher and finalize it, finally finishing by writing
each number out as a two digit hexadecimal string, padded on the left
with a zero if needed.
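The hex formatting step is worth seeing on its own. This is a standalone sketch with a made-up to_hex helper; it only mirrors the loop at the end of compute_hash.

```rust
use std::fmt::Write;

/// Renders each byte as two lowercase hex digits, zero-padded on the left.
fn to_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // `write!` returns a Result even though writing to a String
        // is not expected to fail.
        write!(s, "{b:02x}").expect("writing to a String should not fail");
    }
    s
}

fn main() {
    println!("{}", to_hex(&[0x00, 0x0f, 0xab])); // prints "000fab"
}
```

Without the 02 width, 0x0f would come out as a single f and the digest would end up with the wrong length.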
hash::check_sums()
    // crate::hashbox::hash::mod.rs
    pub enum HashType {
        Md5,
        Sha1,
        Sha224,
        Sha256,
        Sha384,
        Sha512,
    }

    pub fn check_sums(file: &str, hashtype: HashType, erred: &mut usize) -> Result<(), Box<dyn Error>> {
        let fd = File::open(file)?;
        let reader = BufReader::new(fd);
        for line in reader.lines() {
            let line = line?;
            let mut split = line.split_whitespace();
            // Each line must hold a checksum followed by a file name
            let sum = split
                .next()
                .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "invalid checksum file"))?;
            let file = split
                .next()
                .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "invalid checksum file"))?;
            let s = match hashtype {
                HashType::Md5 => compute_hash(file, Md5::new())?,
                HashType::Sha1 => compute_hash(file, Sha1::new())?,
                HashType::Sha224 => compute_hash(file, Sha224::new())?,
                HashType::Sha256 => compute_hash(file, Sha256::new())?,
                HashType::Sha384 => compute_hash(file, Sha384::new())?,
                HashType::Sha512 => compute_hash(file, Sha512::new())?,
            };
            if s.as_str() == sum {
                println!("{file}: OK");
            } else {
                println!("{file}: FAILED");
                *erred += 1;
            }
        }
        Ok(())
    }
This function checks a previously made listing of checksums (sha224 sums in our case) from
a file against some files that we might have just downloaded. We want
to keep track of any errors but continue on, and we might be using more
than one listing of checksums, so we're going to pass that number in as
erred: &mut usize, and we'll tell the function which
hasher to use with our enum parameter. We can still short circuit here
if the checksum file is formatted incorrectly, or if we encounter a
problem while computing a sum. But if we compute a sum and find that it
doesn't match, we just increment erred and continue to the
next line.
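The control flow described above (bail out on a malformed listing, but keep counting on a mismatch) can be sketched without any hashing crates. In this sketch parse_line and count_mismatches are hypothetical names, and the compute closure is a stand-in for the real hashing step.

```rust
/// Splits one line of a checksum listing into (sum, file name).
/// Returns None when either field is missing.
fn parse_line(line: &str) -> Option<(&str, &str)> {
    let mut fields = line.split_whitespace();
    let sum = fields.next()?;
    let file = fields.next()?;
    Some((sum, file))
}

/// Counts mismatches without stopping at the first one, but fails
/// immediately on a malformed line. `compute` stands in for the hasher.
fn count_mismatches(listing: &str, compute: impl Fn(&str) -> String) -> Result<usize, String> {
    let mut erred = 0;
    for line in listing.lines() {
        let (sum, file) = parse_line(line).ok_or("invalid checksum file")?;
        if compute(file) != sum {
            erred += 1;
        }
    }
    Ok(erred)
}

fn main() {
    let listing = "aaa one.txt\nbbb two.txt\n";
    // Fake hasher: every file "hashes" to "aaa", so only two.txt mismatches.
    let result = count_mismatches(listing, |_| "aaa".to_string());
    println!("{result:?}"); // prints "Ok(1)"
}
```

One caveat with splitting on whitespace: a file name containing spaces is cut off at the first space, so such names won't round-trip through the listing.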
Calling the correct applet from main()
    // crate::hashbox::main.rs
    fn main() {
        if let Some(progname) = shitbox::progname() {
            if let Some(command) = commands::get(&progname) {
                let cli = command.cli();
                if let Err(e) = command.run(&cli.get_matches()) {
                    eprintln!("{progname}: Error: {e}");
                    process::exit(1);
                }
            } else {
                eprintln!("hashbox: Error: unknown command {progname}");
                process::exit(1);
            }
        }
    }
Everything should be pretty understandable here. Any errors just get
bubbled up, and when we print the error it also prints the name of the
applet. The progname function is equivalent to the
pseudo-code basename(argv[0]).
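The project's progname isn't shown in this post, so the following is only my guess at an equivalent using the standard library. The basename helper is a name I made up for the sketch.

```rust
use std::{env, path::Path};

/// Hypothetical equivalent of `basename`: the final component of a path,
/// or None if there isn't one or it isn't valid UTF-8.
fn basename(arg0: &str) -> Option<String> {
    Path::new(arg0)
        .file_name()
        .and_then(|name| name.to_str())
        .map(String::from)
}

/// Sketch of `progname`: the basename of the first command line argument.
/// The real function in the project may differ.
fn progname() -> Option<String> {
    let arg0 = env::args().next()?;
    basename(&arg0)
}

fn main() {
    match progname() {
        Some(name) => println!("called as {name}"),
        None => eprintln!("could not determine program name"),
    }
}
```

Returning an Option matches how main uses it: if the name can't be determined, there's simply nothing to dispatch on.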
NOTE:
We can call an applet either by calling the link to
hashbox, which is named sha224sum, or by calling hashbox sha224sum, which I'll explain next.
impl Cmd for Hashbox
    // crate::hashbox::commands::hashbox.rs
    // Debug is required by the Cmd trait, and Default by commands::get
    #[derive(Debug, Default)]
    pub struct Hashbox;

    impl Cmd for Hashbox {
        fn cli(&self) -> clap::Command {
            let subcommands: Vec<Command> = {
                let mut s = vec![];
                for c in COMMANDS {
                    if c == "hashbox" {
                        continue;
                    }
                    if let Some(cmd) = crate::commands::get(c) {
                        s.push(cmd.cli());
                    }
                }
                s
            };
            Command::new("hashbox")
                .about("The box store multitool of embedded Linux")
                .version(env!("CARGO_PKG_VERSION"))
                .propagate_version(true)
                .arg_required_else_help(true)
                .subcommand_value_name("APPLET")
                .subcommand_help_heading("APPLETS")
                .subcommands(&subcommands)
        }

        fn run(&self, matches: &clap::ArgMatches) -> Result<(), Box<dyn Error>> {
            if let Some((name, matches)) = matches.subcommand() {
                if let Some(command) = crate::commands::get(name) {
                    if let Err(e) = command.run(matches) {
                        eprintln!("Error: {name}: {e}");
                        process::exit(1);
                    }
                }
            }
            Ok(())
        }
When we call the program as hashbox, then it just runs the
special applet hashbox, which treats each of the other
applets as a subcommand. In the cli method shown, we
create the clap::Command structs for each applet in a
loop, adding them to a Vec, which we then hand to clap as the list of subcommands.
The run method for the hashbox applet is
quite similar to main. Instead of using
argv[0] this time we're using a clap subcommand and
passing that string to the get function shown way up near
the top of the post when we were setting this all up. Note that we can
only get away with if let Some(command) =
crate::commands::get(name) due to the fact that it is a trait
object being returned. We also once again use the applet name along
with any error messages being bubbled up, so we can afford to be pretty
lazy about how we do our error handling in each individual applet and
focus more on correct function. The exception would be in the cases
where we want our return values to convey how the program failed, in
which case those things must be coded into each applet. Because POSIX,
you know?
Not shown is the bootstrap applet, which is just about the
hairiest code in the project so far. It's actually currently broken as
of this writing but will be fairly straightforward to fix, as the
breakage occurred by moving applets around and out of the original
single binary and it just needs to be made to link the applets to the
correct binaries. But I digress. Clap is really quite awesome in the
information it exposes, and bootstrap leverages a few
third party crates to provide not only the applet links but also shell
completions and Unix man pages. So when that part of the codebase is
fixed you'll be able to drop a binary into the filesystem and install
everything else with a single command.
The code for this project is on Codeberg under the working title of shitbox, because naming things is hard.
Tags for this post: Programming Rust Shitbox Traits Generics