Rust criticism from a Rustacean
I've been using Rust for a little over three years now and I absolutely love it. Rust makes good on a lot of its promises. Rust makes highly reliable software, makes concurrent and parallel code much easier to reason about, and will make you a better programmer even on those occasions that you write some code in another language. I could continue heaping praise. But that's not the point of this post. No, this post is about taking an honest look at what I think are a few of Rust's shortcomings. Or at the very least, things that I wish had been done a little differently in the early days and which are now pretty firmly entrenched and thus difficult to fix.
1. std calls into libc
Sometimes you want, or need, to call into some really low level functionality
provided by the OS. Rust is a systems programming language after all. You might
be surprised just how often the official or semi-blessed solution is calling into
libc under the hood. I was when I looked at it. This has a number of implications.
It's difficult to build a system completely with Rust because std requires libc.
The Redox project had to write their own libc, in Rust, in order to solve that
chicken and egg problem. They still aren't self-hosting, and a lot of the reason
revolves around porting rustc itself, which depends on std (and thus libc).
If you poke around in the source code of any given libc implementation (I recommend
either the NetBSD code base or musl) then what you tend to see after a while is
that a huge amount of the provided functionality is a thin wrapper around the
syscall interface. When you start doing low level but not quite embedded sorts
of programming in Rust, you quickly discover that not all of those system call
wrappers are provided by std. In fact, there isn't even a way to make syscalls
using std. More on that later.
There have been a couple of abortive attempts to change this by creating a version
of std that is freestanding from libc. They have mostly come to nothing because
there hasn't been enough community interest to sustain a project of that magnitude.
I find this kind of sad. I also want to contrast this with Zig, whose standard
library not only doesn't rely on libc but also provides a syscall interface to
the programmer.
2. Making syscalls in Rust
There is no official way to make syscalls in Rust, but there are crates on crates.io
which can provide this functionality. So let's look at the first one which pops
up when one does a search on crates.io, the aptly named syscall crate. The syscall
crate was last updated 7 years ago. The repository no longer exists. It supports
the following platforms.
- x86 Linux
- x86_64 Linux
- x86_64 FreeBSD
- armeabi Linux
- x86_64 MacOS
Wow, so no updates in 7 years, and that is a really short list of supported platforms!
Having explored this space previously, I happen to know that the sc crate was
forked from the syscall crate a pretty long time ago when the latter was
abandoned by its author. It also has a much longer list of supported platforms.
- x86 Linux
- x86_64 Linux
- x86_64 FreeBSD
- x86_64 MacOS
- armeabi Linux
- aarch64 Linux
- aarch64 MacOS
- mips and mips64 Linux
- powerpc and powerpc64 Linux
- riscv64 Linux
- sparc64 Linux
Ok, so that list is much better. It's more than twice as long. If you're paying
attention you'll no doubt have noticed that Linux is the only OS getting much
in the way of love here, though. If you're using FreeBSD on anything other than
x86_64 you're still out in the cold, and if you care about NetBSD, OpenBSD or
Solaris you're just plain SOL. Your only recourse at that point is to either drop
down to inline assembly or to use the libc crate. I don't consider that a good
state of affairs for a systems programming language. Your systems programming
language should be capable of interop with other languages but probably shouldn't
require calling into C just to provide full functionality, such as interacting
with the kernel.
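To make the inline assembly route concrete, here is a rough sketch of what a single libc-free syscall looks like in today's Rust. It only handles x86_64 Linux, and it relies on my assumption that getpid is syscall number 39 there; the raw_getpid name is made up for this post, and every other target just falls back to std.

```rust
/// Asks the kernel for the current process ID directly, bypassing libc.
/// `raw_getpid` is a hypothetical helper, not part of any crate.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn raw_getpid() -> u32 {
    // Assumption: getpid is syscall number 39 on x86_64 Linux.
    const SYS_GETPID: usize = 39;
    let ret: usize;
    // SAFETY: getpid takes no arguments, cannot fail and touches no memory.
    // The `syscall` instruction overwrites rcx and r11, so both are declared
    // as clobbered outputs.
    unsafe {
        std::arch::asm!(
            "syscall",
            inlateout("rax") SYS_GETPID => ret,
            out("rcx") _,
            out("r11") _,
            options(nostack),
        );
    }
    ret as u32
}

/// Fallback for every other target: go through std, which usually means libc.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn raw_getpid() -> u32 {
    std::process::id()
}
```

Calling raw_getpid() should give the same answer as std::process::id(). Now imagine repeating that, with the right registers and numbers, for every architecture and OS you care about, and you can see why people reach for a crate or for libc instead.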
3. The crates ecosystem is based on GitHub and has a flat namespace
Ok, so part of this calls back into the previous section. Remember the (abandoned)
syscall crate? The fact that anyone can publish on crates.io and then subsequently
abandon their crate, while the obvious name stays taken, means that over time these
situations are coming up more and more. This is something that I am far from the
only person criticizing, so please can we have namespaced crates already? Pretty
please, with sugar on top? This was a laughably bad design choice right from the
start.
Then there's the issue that the crates.io registry is on GitHub, and only GitHub, and in order to publish on crates.io you have to have a GitHub account. For some people this is obviously not an issue, but I'm sorry to be the dick that points out that GitHub is owned by a mega-corporation with a questionable history regarding open source and free software and there are a lot of people who experience varying levels of discomfort around this. I am sure that there are those for whom this state of affairs is a deal breaker. Personally, I'm fine with the Rust organization hosting their code on GitHub, but I think it's a bad situation that in order to get a crates.io account you must have a GitHub account. I know that the mega-corp in question hearts Linux now, big time, but we are humans with long memories after all.
4. Errors in std are inconsistent
What am I talking about here? Consider the std::fmt::Display trait.
pub trait Display {
    // Required method
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}
The fmt method takes a formatter as an argument and writes into it. This is a
fallible operation, hence the Result<(), Error> return type. Now let's look at
some common String handling methods.
let r = "Rust";
// `x` has to be `mut`, since push_str and push modify it in place
let mut x: String = format!("{r} is inconsistent");
x.push_str(" in handling memory allocation errors");
x.push('.');
Let's unpack that. The format!() macro calls the std::fmt::format function
under the hood. Let's look at that code.
#[cfg(not(no_global_oom_handling))]
#[must_use]
#[stable(feature = "rust1", since = "1.0.0")]
#[inline]
pub fn format(args: Arguments<'_>) -> string::String {
    fn format_inner(args: Arguments<'_>) -> string::String {
        let capacity = args.estimated_capacity();
        let mut output = string::String::with_capacity(capacity);
        output.write_fmt(args).expect("a formatting trait implementation returned an error");
        output
    }
    args.as_str().map_or_else(|| format_inner(args), crate::borrow::ToOwned::to_owned)
}
The observant will immediately notice the line that starts with output.write_fmt
and ends with expect. This means that whenever you use the format!() macro,
there is a potential panic.
The push_str and push methods rely on functionality provided by the underlying
Vec, since that is what a String is. As such, they also inherit the behavior
of completely ignoring allocation errors and instead having a silent panic (or
an outright abort) hidden in the code, which we opt into literally every time
we use std in a project.
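For what it's worth, std does have a fallible escape hatch these days in the form of try_reserve, but you have to opt in by hand on every call. A minimal sketch of what that looks like follows; the try_push_str helper is my own invention for illustration, not something std provides.

```rust
use std::collections::TryReserveError;

/// Appends `extra` to `buf`, reporting an allocation failure as an error
/// instead of taking the process down. Hypothetical helper, not in std.
fn try_push_str(buf: &mut String, extra: &str) -> Result<(), TryReserveError> {
    // Reserve the space up front; this is the only step that can allocate.
    buf.try_reserve(extra.len())?;
    // The capacity is now sufficient, so this push does not reallocate.
    buf.push_str(extra);
    Ok(())
}
```

It works, but notice that the fallible version is the one that takes extra effort, while the convenient methods are the ones that quietly assume memory never runs out.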
Now, when I say that Rust is inconsistent here, look at this.
use std::fmt::Write; // brings write! support for String into scope

let mut r = "Rust".to_string();
write!(r, " is inconsistent")?;
write!(r, " in handling memory allocation errors")?;
This is another way of appending to a String and is treated as a fallible operation
(hint - appending to a String is almost always a fallible op). The extra observant
will also know that the to_string method comes from the ToString trait, which is
implemented automatically for every type that implements Display, for which we
implement the fmt method, which is considered fallible. So why is the to_string
method considered infallible? Once again, we're just opting in to ignoring the
possibility of an underlying panic!().
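If you want to see the two behaviors side by side, here is a small sketch. Broken is a made-up type whose Display implementation always fails, and the two helper functions are likewise just for illustration. The panic is caught only so that it can be observed, and that part assumes the default panic = "unwind" setting.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical type whose `Display` implementation always reports an error.
struct Broken;

impl Display for Broken {
    fn fmt(&self, _f: &mut fmt::Formatter<'_>) -> fmt::Result {
        Err(fmt::Error)
    }
}

/// Fallible path: the error coming out of `fmt` is handed back to the caller.
fn render_with_write<T: Display>(value: &T) -> Result<String, fmt::Error> {
    let mut out = String::new();
    write!(out, "{}", value)?;
    Ok(out)
}

/// "Infallible" path: `to_string` has no way to report the same error, so it
/// panics. Returns true if the call panicked.
fn to_string_panics() -> bool {
    std::panic::catch_unwind(|| Broken.to_string()).is_err()
}
```

Same trait, same fmt method, same error. One caller gets a Result and the other gets a panic.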
Unlike the other gripes that I've brought up earlier, this one is so entrenched by now that it's never going to be fixed for common usage. All you can do is live with it and realize that if you are that low on memory, the OS is likely going to be killing processes anyway. That said, I don't like the way Rust is inconsistent here. Rust normally enforces correctness, yet turns around and treats memory as an inexhaustible resource, ignoring or handicapping a lot of potential use cases for the language.
5. Macros were a mistake
I'm writing this hot on the heels of the 2023 RustConf fiasco wherein a very well known and respected person in the field was set to give a talk on improving reflection in Rust, but was notified very late in the game that his talk was to be downgraded from a keynote to a regular talk. This caused much weeping, wailing and gnashing of teeth and did a lot of harm to the community and how we are all viewed by the wider FLOSS community. It also sucks, because that would have likely been a very interesting talk on a subject that I care a lot about in relation to Rust.
Apart from Rust, the other language that has really caught my attention is Zig.
One of Zig's outstanding core features is comptime, which is basically code
which is evaluated and executed at compile time rather than at runtime. The way
that Andrew Kelley has implemented this has not only provided a full replacement
for any kind of macro system but has provided other benefits as well. According
to Kelley, when he fully embraced comptime then generics "just fell into my lap".
He's not kidding, either. The concept of generics is only possible with some sort
of compile time reflection in any language, and when you look at Zig code what
is amazing is that there is no special syntax required for doing very interesting
things at compile time, including generics and pretty much all of the things
that we reach for macros to do in other languages, Rust included.
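For comparison, the nearest thing stable Rust has is const fn. The sketch below (the names are mine, picked for illustration) shows it computing values at compile time. What it can't do is inspect or produce types, which is exactly the part of comptime that makes macros unnecessary in Zig.

```rust
/// Evaluated by the compiler whenever it is called in a const context.
const fn factorial(n: u64) -> u64 {
    let mut acc = 1;
    let mut i = 2;
    // `for` loops are not allowed in const fn, so this uses `while`.
    while i <= n {
        acc *= i;
        i += 1;
    }
    acc
}

// Computed during compilation; nothing runs at startup for this.
const TEN_FACTORIAL: u64 = factorial(10);

// A const result can size an array, which a runtime value could not do.
const LOOKUP: [u8; factorial(3) as usize] = [0; factorial(3) as usize];
```

So Rust can run ordinary code at compile time, but only on values. The moment you want to ask "what fields does this type have?" you are back in macro land.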
I guarantee that a lot of people disagree with this. I don't care. Macros are a mistake in light of the fact that there is a way to do the same thing without having to have a second DSL to learn on top of the language itself. Zig has proven it without any doubt to myself and to anyone else paying attention. You can, at least in this particular case, have your cake and eat it, too.
Consider this code from the json module in Zig's std. This function takes a
value of any type, including your own custom complex types, and uses type
reflection to output a JSON string to a stream of any type that meets the function's
requirements.
pub fn stringify(
    value: anytype,
    options: StringifyOptions,
    out_stream: anytype,
) !void {
    const T = @TypeOf(value);
    switch (@typeInfo(T)) {
        .Float, .ComptimeFloat => {
            return std.fmt.formatFloatScientific(value, std.fmt.FormatOptions{}, out_stream);
        },
        .Int, .ComptimeInt => {
            return std.fmt.formatIntValue(value, "", std.fmt.FormatOptions{}, out_stream);
        },
        .Bool => {
            return out_stream.writeAll(if (value) "true" else "false");
        },
        .Null => {
            return out_stream.writeAll("null");
        },
        .Optional => {
            if (value) |payload| {
                return try stringify(payload, options, out_stream);
            } else {
                return try stringify(null, options, out_stream);
            }
        },
        .Enum => {
            if (comptime std.meta.trait.hasFn("jsonStringify")(T)) {
                return value.jsonStringify(options, out_stream);
            }
            @compileError("Unable to stringify enum '" ++ @typeName(T) ++ "'");
        },
        .Union => {
            if (comptime std.meta.trait.hasFn("jsonStringify")(T)) {
                return value.jsonStringify(options, out_stream);
            }
            const info = @typeInfo(T).Union;
            if (info.tag_type) |UnionTagType| {
                try out_stream.writeByte('{');
                var child_options = options;
                child_options.whitespace.indent_level += 1;
                inline for (info.fields) |u_field| {
                    if (value == @field(UnionTagType, u_field.name)) {
                        try child_options.whitespace.outputIndent(out_stream);
                        try encodeJsonString(u_field.name, options, out_stream);
                        try out_stream.writeByte(':');
                        if (child_options.whitespace.separator) {
                            try out_stream.writeByte(' ');
                        }
                        if (u_field.type == void) {
                            try out_stream.writeAll("{}");
                        } else {
                            try stringify(@field(value, u_field.name), child_options, out_stream);
                        }
                        break;
                    }
                } else {
                    unreachable; // No active tag?
                }
                try options.whitespace.outputIndent(out_stream);
                try out_stream.writeByte('}');
                return;
            } else {
                @compileError("Unable to stringify untagged union '" ++ @typeName(T) ++ "'");
            }
        },
        .Struct => |S| {
            if (comptime std.meta.trait.hasFn("jsonStringify")(T)) {
                return value.jsonStringify(options, out_stream);
            }
            try out_stream.writeByte(if (S.is_tuple) '[' else '{');
            var field_output = false;
            var child_options = options;
            child_options.whitespace.indent_level += 1;
            inline for (S.fields) |Field| {
                // don't include void fields
                if (Field.type == void) continue;
                var emit_field = true;
                // don't include optional fields that are null when emit_null_optional_fields is set to false
                if (@typeInfo(Field.type) == .Optional) {
                    if (options.emit_null_optional_fields == false) {
                        if (@field(value, Field.name) == null) {
                            emit_field = false;
                        }
                    }
                }
                if (emit_field) {
                    if (!field_output) {
                        field_output = true;
                    } else {
                        try out_stream.writeByte(',');
                    }
                    try child_options.whitespace.outputIndent(out_stream);
                    if (!S.is_tuple) {
                        try encodeJsonString(Field.name, options, out_stream);
                        try out_stream.writeByte(':');
                        if (child_options.whitespace.separator) {
                            try out_stream.writeByte(' ');
                        }
                    }
                    try stringify(@field(value, Field.name), child_options, out_stream);
                }
            }
            if (field_output) {
                try options.whitespace.outputIndent(out_stream);
            }
            try out_stream.writeByte(if (S.is_tuple) ']' else '}');
            return;
        },
        .ErrorSet => return stringify(@as([]const u8, @errorName(value)), options, out_stream),
        .Pointer => |ptr_info| switch (ptr_info.size) {
            .One => switch (@typeInfo(ptr_info.child)) {
                .Array => {
                    const Slice = []const std.meta.Elem(ptr_info.child);
                    return stringify(@as(Slice, value), options, out_stream);
                },
                else => {
                    // TODO: avoid loops?
                    return stringify(value.*, options, out_stream);
                },
            },
            .Many, .Slice => {
                if (ptr_info.size == .Many and ptr_info.sentinel == null)
                    @compileError("unable to stringify type '" ++ @typeName(T) ++ "' without sentinel");
                const slice = if (ptr_info.size == .Many) mem.span(value) else value;
                if (ptr_info.child == u8 and options.string == .String and std.unicode.utf8ValidateSlice(slice)) {
                    try encodeJsonString(slice, options, out_stream);
                    return;
                }
                try out_stream.writeByte('[');
                var child_options = options;
                child_options.whitespace.indent_level += 1;
                for (slice, 0..) |x, i| {
                    if (i != 0) {
                        try out_stream.writeByte(',');
                    }
                    try child_options.whitespace.outputIndent(out_stream);
                    try stringify(x, child_options, out_stream);
                }
                if (slice.len != 0) {
                    try options.whitespace.outputIndent(out_stream);
                }
                try out_stream.writeByte(']');
                return;
            },
            else => @compileError("Unable to stringify type '" ++ @typeName(T) ++ "'"),
        },
        .Array => return stringify(&value, options, out_stream),
        .Vector => |info| {
            const array: [info.len]info.child = value;
            return stringify(&array, options, out_stream);
        },
        else => @compileError("Unable to stringify type '" ++ @typeName(T) ++ "'"),
    }
    unreachable;
}
So what exactly is going on here anyway? Well @TypeOf(value) is a compiler
builtin function which gets the type of value (in Zig the '@' sigil signifies
a compiler builtin). Then we get a description of T, in the form of a tagged
union, with another builtin, @typeInfo, which we can switch on (match is the
closest Rust equivalent). What I love about this is that all of this is written
in plain Zig, not a DSL or macro language. There is no special syntax required
and the code is just as readable as any other Zig code which is not using any
type of reflection.
To contrast this, the serde crate's derive macros have to call into syn to build
an AST in order to get type information for serialization. Should you decide to
take a peek into the source code of syn you will find a great big ball of
macros which tends to make your eyes glaze over and gives me a feeling of wanting
to curl up into a ball and hug my knees while rocking back and forth to Pink
Floyd's "Comfortably Numb". You shouldn't have to generate an AST in order to
accomplish this task, because the compiler has to generate an AST as part of
compilation. That's duplicate effort. Further, even those who regularly write
proc macros can have a hard time reasoning about what is going on. The fact that
macros are so powerful combined with the fact that they are so difficult to
understand leads to, IMO, an unsafe situation. This should not be a black art or
an arcane science. A language feature as powerful as proc macros should be simple
to read and understand without any special knowledge beyond understanding the
language itself. The Zig way is indeed the better way.
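To show what the derive machinery is papering over, here is roughly what you end up writing by hand in Rust when there is no reflection to lean on. Everything in this sketch (the ToJson trait, the Point struct, the write_json method) is hypothetical and has nothing to do with serde's real API. It is only meant to show the shape of the problem.

```rust
use std::fmt::Write;

/// Hypothetical trait standing in for the code a derive macro would generate.
trait ToJson {
    fn write_json(&self, out: &mut String) -> std::fmt::Result;
}

impl ToJson for i64 {
    fn write_json(&self, out: &mut String) -> std::fmt::Result {
        write!(out, "{}", self)
    }
}

impl ToJson for bool {
    fn write_json(&self, out: &mut String) -> std::fmt::Result {
        out.write_str(if *self { "true" } else { "false" })
    }
}

impl<T: ToJson> ToJson for Option<T> {
    fn write_json(&self, out: &mut String) -> std::fmt::Result {
        match self {
            Some(inner) => inner.write_json(out),
            None => out.write_str("null"),
        }
    }
}

/// Hypothetical user type.
struct Point {
    x: i64,
    y: i64,
    visible: Option<bool>,
}

// Without reflection, every field name has to be spelled out by hand, and this
// impl silently goes stale whenever a field is added to or removed from Point.
impl ToJson for Point {
    fn write_json(&self, out: &mut String) -> std::fmt::Result {
        out.write_str("{\"x\":")?;
        self.x.write_json(out)?;
        out.write_str(",\"y\":")?;
        self.y.write_json(out)?;
        out.write_str(",\"visible\":")?;
        self.visible.write_json(out)?;
        out.write_char('}')
    }
}
```

The impl for Point is the part that Zig's inline for over the struct fields gives you for free, and the part that Rust can only automate by handing the job to a proc macro.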
My Ideal Language
Rust comes very close to being my ideal language, but I'm actually looking for
"the next language" to come after it. I mean that in a very positive way. I think
that when we look back at the early part of this century there will be a dividing
line in software development that sits right where Rust emerged as a real challenger
to the existing order. I believe Rust has kicked off a renaissance of sorts, but
that perhaps Rust itself is just showing the way, and that an even better solution
could be on the horizon. All it's going to take is for a strongly motivated
person or, preferably, group of people to do the work of building such a language,
without repeating the same mistakes. Granted, some of the gripes that I mentioned
are deficiencies in std and the crates ecosystem, and as such aren't technically
reason to write a whole new language. But at this point I just don't see anyone
realistically switching to a std-2.0.
So anyway, let's indulge for a moment my fantasy "post Rust era ideal language".
It would actually be a lot like Rust, but with a little sprinkling of Zig's
comptime goodness. There would be a borrow checker, and it would give great
error messages just like rustc does. We'd have the concept of comptime so no
macros, and it would be obvious that's how generics functioned under the hood.
But there would also be traits, which Zig doesn't have, because that is such a
useful concept. I'm not picky about syntax other than the fact that it has to
behave as consistently as possible. The borrow checker would obviously lead to a
great concurrency and parallelism story just like Rust. Unlike Rust, the std
of this new language would be built from the ground up to be able to do anything
and everything that we think of as C's domain but without ever resorting to
calling into C or assembly. We would get built in functionality for interfacing
with other languages as well as directly talking to the underlying OS kernel.
And finally, I envision a library package ecosystem which not only has namespaces,
but would actually be fully distributed and decentralized - federated if you will.
How that would work I will leave up to the imagination, other than to say
emphatically that there would be no blockchain shenanigans and cryptobros can just
get off my blog already, but the Fediverse and Matrix have shown that distributed
communication can scale, and now we just need to iterate and improve.
In the meantime, you can and should use Rust. In spite of any shortcomings it's still the best language that has come so far in terms of enforcing memory access safety and overall correctness. A lot of the little gripes I've mentioned are indeed pretty uncommon things for the average programmer to have to deal with. If you feel like trying something else (and haven't already done so) then you should also try out Zig, because it's a wonderful language with some fantastic ideas of its own and is actually ahead of Rust when it comes to its suitability to fully replace C.
2023-06-06