Rust criticism from a Rustacean

I've been using Rust for a little over three years now and I absolutely love it. Rust makes good on a lot of its promises. Rust makes highly reliable software, makes concurrent and parallel code much easier to reason about, and will make you a better programmer even on those occasions that you write some code in another language. I could continue heaping praise. But that's not the point of this post. No, this post is about taking an honest look at what I think are a few of Rust's shortcomings. Or at the very least, things that I wish had been done a little differently in the early days and which are now pretty firmly entrenched and thus difficult to fix.

1. `std` calls into libc

Sometimes you want, or need, to call into some really low-level functionality provided by the OS. Rust is a systems programming language after all. You might be surprised just how often the official or semi-blessed solution is calling into libc under the hood. I was when I looked at it. This has a number of implications. It's difficult to build a system completely with Rust because std requires libc. The Redox project had to write their own libc, in Rust, in order to solve that chicken-and-egg problem. They still aren't self-hosting, and a lot of the reason revolves around porting rustc itself, which depends on std (and thus libc).

If you poke around in the source code of any given libc implementation (I recommend either the NetBSD code base or musl) then what you tend to see after a while is that a huge amount of the provided functionality is a thin wrapper around the syscall interface. When you start doing low-level but not quite embedded sorts of programming in Rust, you quickly discover that not all of those system call wrappers are provided by std. In fact, there isn't even a way to make syscalls using std. More on that later.

There have been a couple of abortive attempts to change this by creating a version of std that is freestanding from libc. They have mostly come to nothing because there hasn't been enough community interest to sustain a project of that magnitude. I find this kind of sad. I also want to contrast this with Zig, whose standard library not only doesn't rely on libc but also provides a syscall interface to the programmer.

2. Making syscalls in Rust

There is no official way to make syscalls in Rust, but there are crates on crates.io which can provide this functionality. So let's look at the first one which pops up when one does a search on crates.io, the aptly named syscall crate. The syscall crate was last updated 7 years ago. The repository no longer exists. It supports the following platforms.

x86 Linux
x86_64 Linux
x86_64 FreeBSD
armeabi Linux
x86_64 MacOS

Wow, so no updates in 7 years, and that is a really short list of supported platforms!

Having explored this space previously, I happen to know that the sc crate was forked from the syscall crate a pretty long time ago when the latter was abandoned by its author. It also has a much longer list of supported platforms.

x86 Linux
x86_64 Linux
x86_64 FreeBSD
x86_64 MacOS
armeabi Linux
aarch64 Linux
aarch64 MacOS
mips and mips64 Linux
powerpc and powerpc64 Linux
riscv64 Linux
sparc64 Linux

Ok, so that list is much better. It's more than twice as long. If you're paying attention you'll no doubt have noticed that Linux is the only OS getting much in the way of love here, though. If you're using FreeBSD on anything other than x86_64 you're still out in the cold, and if you care about NetBSD, OpenBSD or Solaris you're just plain SOL. Your only recourse at that point is to either write inline assembly or use the libc crate. I don't consider that a good state of affairs for a systems programming language. Your systems programming language should be capable of interop with other languages but probably shouldn't require calling into C just to provide full functionality, such as interacting with the kernel.
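For the curious, here is roughly what the inline assembly route looks like. Treat it as a sketch only: it assumes x86_64 Linux, where getpid is syscall number 39, and the function name raw_getpid is made up for illustration. On any other target it just falls back to std so that the snippet still compiles.

```rust
/// Ask the kernel for our process ID without going through libc.
/// Assumption: x86_64 Linux, where `getpid` is syscall number 39.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn raw_getpid() -> u32 {
    use std::arch::asm;

    let ret: u64;
    // SAFETY: `getpid` takes no arguments and touches no memory of ours.
    // The `syscall` instruction overwrites rcx and r11, so both are
    // declared as clobbered outputs.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 39u64 => ret, // syscall number in, result out
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack)
        );
    }
    ret as u32
}

/// Fallback for every other target, so the sketch compiles everywhere.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn raw_getpid() -> u32 {
    std::process::id()
}
```

Calling raw_getpid() should give the same number as std::process::id(). Note how much of this is specific to one OS and one architecture, which is exactly why those crates have such short platform lists.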

3. The crates ecosystem is based on GitHub and has a flat namespace

Ok, so part of this calls back into the previous section. Remember the (abandoned) syscall crate? The fact that anyone can publish on crates.io and then subsequently abandon their crate means that over time these situations are coming up more and more. This is something that I am far from the only person criticizing, so please can we have namespaced crates already? Pretty please, with sugar on top? This was a laughably bad design choice right from the start.

Then there's the issue that the crates.io registry is on GitHub, and only GitHub, and in order to publish on crates.io you have to have a GitHub account. For some people this is obviously not an issue, but I'm sorry to be the dick that points out that GitHub is owned by a mega-corporation with a questionable history regarding open source and free software and there are a lot of people who experience varying levels of discomfort around this. I am sure that there are those for whom this state of affairs is a deal breaker. Personally, I'm fine with the Rust organization hosting their code on GitHub, but I think it's a bad situation that in order to get a crates.io account you must have a GitHub account. I know that the mega-corp in question hearts Linux now, big time, but we are humans with long memories after all.

4. Errors in `std` are inconsistent

What am I talking about here? Consider the std::fmt::Display trait.

pub trait Display {
    // Required method
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}

Display takes a formatter as an argument and writes into it. This is a fallible operation, hence the Result<(), Error> return type. Now let's look at some common String handling methods.

let r = "Rust";
// `x` must be declared `mut`, because push_str and push modify it in place
let mut x: String = format!("{r} is inconsistent");
x.push_str(" in handling memory allocation errors");
x.push('.');

Let's unpack that. The format!() macro calls the std::fmt::format function under the hood. Let's look at that code.

#[cfg(not(no_global_oom_handling))]
#[must_use]
#[stable(feature = "rust1", since = "1.0.0")]
#[inline]
pub fn format(args: Arguments<'_>) -> string::String {
    fn format_inner(args: Arguments<'_>) -> string::String {
        let capacity = args.estimated_capacity();
        let mut output = string::String::with_capacity(capacity);
        output.write_fmt(args).expect("a formatting trait implementation returned an error");
        output
    }

    args.as_str().map_or_else(|| format_inner(args), crate::borrow::ToOwned::to_owned)
}

The observant will immediately notice the line that starts with output.write_fmt and ends with expect. This means that whenever you use the format!() macro, there is a potential panic.

The push_str and push methods rely on functionality provided by the underlying Vec, since that is what a String is. As such, they also inherit the behavior of completely ignoring allocation errors and instead having a hidden panic (or, when the allocator actually runs out of memory, an outright abort) buried in the code, which we opt into literally every time we use std in a project.

Now, when I say that Rust is inconsistent here, look at this.

use std::fmt::Write; // brings write_fmt into scope so write! works on a String

let mut r = "Rust".to_string();
write!(r, " is inconsistent")?;
write!(r, " in handling memory allocation errors")?;

This is another way of appending to a String and is treated as a fallible operation (hint: appending to a String is almost always a fallible op). The extra observant will also know that the to_string method comes from the ToString trait, which is implemented automatically for every type that implements Display, and that it works by calling the fmt method that we implement, which is considered fallible. So why is the to_string method considered infallible? Once again, we're just opting in to ignoring the possibility of an underlying panic!().
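To make the inconsistency concrete, here is a small sketch. The Broken type is hypothetical, and its Display impl fails on purpose; the two helper functions are only there for illustration. The same failing fmt call is reported as an error by write!, yet turns into a panic when it goes through to_string.

```rust
use std::fmt::{self, Display, Write};
use std::panic;

/// Hypothetical type whose `Display` impl always fails.
struct Broken;

impl Display for Broken {
    fn fmt(&self, _f: &mut fmt::Formatter<'_>) -> fmt::Result {
        Err(fmt::Error)
    }
}

/// `write!` hands the formatting error back to the caller.
fn write_reports_error() -> bool {
    let mut out = String::new();
    write!(out, "{}", Broken).is_err()
}

/// `to_string` has no way to return the error, so it panics instead.
/// `catch_unwind` is used here only to observe that panic.
fn to_string_panics() -> bool {
    panic::catch_unwind(|| Broken.to_string()).is_err()
}
```

Both helpers return true with the default unwinding panic strategy: one code path gives you a Result, the other gives you a panic, for the exact same underlying failure.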

Unlike the other gripes that I've brought up earlier, this one is so entrenched by now that it's never going to be fixed for common usage. All you can do is live with it and realize that if you are that low on memory, the OS is likely going to be killing processes anyway. That said, I don't like the way Rust is inconsistent here. Rust normally enforces correctness, yet turns around and treats memory as an inexhaustible resource, ignoring or handicapping a lot of potential use cases for the language.
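To be fair, std does have an opt-in escape hatch in the form of String::try_reserve, which reports a failed allocation as an error. It just isn't what anyone reaches for by default. A minimal sketch, where the helper name try_push_str is my own invention and not something std provides:

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: append `extra` to `s`, reporting a failed
/// allocation as an error instead of letting `push_str` handle it.
fn try_push_str(s: &mut String, extra: &str) -> Result<(), TryReserveError> {
    // Ask for the space up front; this is the step that can fail.
    s.try_reserve(extra.len())?;
    // The capacity is already there, so this push does not reallocate.
    s.push_str(extra);
    Ok(())
}
```

You would use it like try_push_str(&mut x, " in handling memory allocation errors")?, which at least puts the failure in the type signature. The catch is that you have to remember to do this everywhere, and nothing in the common String API nudges you toward it.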

5. Macros were a mistake

I'm writing this hot on the heels of the 2023 RustConf fiasco, wherein a very well known and respected person in the field was set to give a talk on improving reflection in Rust, but was notified very late in the game that his talk was to be downgraded from a keynote to a regular talk. This caused much weeping, wailing and gnashing of teeth and did a lot of harm to the community and how we are all viewed by the wider FLOSS community. It also sucks, because that would have likely been a very interesting talk on a subject that I care a lot about in relation to Rust.

Apart from Rust, the other language that has really caught my attention is Zig. One of Zig's outstanding core features is comptime, which is basically code which is evaluated and executed at compile time rather than at runtime. The way that Andrew Kelley has implemented this has not only provided a full replacement for any kind of macro system but has provided other benefits as well. According to Kelley, when he fully embraced comptime then generics "just fell into my lap". He's not kidding, either. The concept of generics is only possible with some sort of compile time reflection in any language, and when you look at Zig code what is amazing is that there is no special syntax required for doing very interesting things at compile time, including generics and pretty much all of the things that we reach for macros to do in other languages, Rust included.

I guarantee that a lot of people disagree with this. I don't care. Macros are a mistake in light of the fact that there is a way to do the same thing without having to have a second DSL to learn on top of the language itself. Zig has proven it without any doubt to myself and to anyone else paying attention. You can, at least in this particular case, have your cake and eat it, too.

Consider this code from the json module in Zig's std. This function takes a value of any type, including your own custom complex types, and uses type reflection to output a JSON string to a stream of any type that meets the function's requirements.

pub fn stringify(
    value: anytype,
    options: StringifyOptions,
    out_stream: anytype,
) !void {
    const T = @TypeOf(value);
    switch (@typeInfo(T)) {
        .Float, .ComptimeFloat => {
            return std.fmt.formatFloatScientific(value, std.fmt.FormatOptions{}, out_stream);
        },
        .Int, .ComptimeInt => {
            return std.fmt.formatIntValue(value, "", std.fmt.FormatOptions{}, out_stream);
        },
        .Bool => {
            return out_stream.writeAll(if (value) "true" else "false");
        },
        .Null => {
            return out_stream.writeAll("null");
        },
        .Optional => {
            if (value) |payload| {
                return try stringify(payload, options, out_stream);
            } else {
                return try stringify(null, options, out_stream);
            }
        },
        .Enum => {
            if (comptime std.meta.trait.hasFn("jsonStringify")(T)) {
                return value.jsonStringify(options, out_stream);
            }

            @compileError("Unable to stringify enum '" ++ @typeName(T) ++ "'");
        },
        .Union => {
            if (comptime std.meta.trait.hasFn("jsonStringify")(T)) {
                return value.jsonStringify(options, out_stream);
            }

            const info = @typeInfo(T).Union;
            if (info.tag_type) |UnionTagType| {
                try out_stream.writeByte('{');
                var child_options = options;
                child_options.whitespace.indent_level += 1;
                inline for (info.fields) |u_field| {
                    if (value == @field(UnionTagType, u_field.name)) {
                        try child_options.whitespace.outputIndent(out_stream);
                        try encodeJsonString(u_field.name, options, out_stream);
                        try out_stream.writeByte(':');
                        if (child_options.whitespace.separator) {
                            try out_stream.writeByte(' ');
                        }
                        if (u_field.type == void) {
                            try out_stream.writeAll("{}");
                        } else {
                            try stringify(@field(value, u_field.name), child_options, out_stream);
                        }
                        break;
                    }
                } else {
                    unreachable; // No active tag?
                }
                try options.whitespace.outputIndent(out_stream);
                try out_stream.writeByte('}');
                return;
            } else {
                @compileError("Unable to stringify untagged union '" ++ @typeName(T) ++ "'");
            }
        },
        .Struct => |S| {
            if (comptime std.meta.trait.hasFn("jsonStringify")(T)) {
                return value.jsonStringify(options, out_stream);
            }

            try out_stream.writeByte(if (S.is_tuple) '[' else '{');
            var field_output = false;
            var child_options = options;
            child_options.whitespace.indent_level += 1;
            inline for (S.fields) |Field| {
                // don't include void fields
                if (Field.type == void) continue;

                var emit_field = true;

                // don't include optional fields that are null when emit_null_optional_fields is set to false
                if (@typeInfo(Field.type) == .Optional) {
                    if (options.emit_null_optional_fields == false) {
                        if (@field(value, Field.name) == null) {
                            emit_field = false;
                        }
                    }
                }

                if (emit_field) {
                    if (!field_output) {
                        field_output = true;
                    } else {
                        try out_stream.writeByte(',');
                    }
                    try child_options.whitespace.outputIndent(out_stream);
                    if (!S.is_tuple) {
                        try encodeJsonString(Field.name, options, out_stream);
                        try out_stream.writeByte(':');
                        if (child_options.whitespace.separator) {
                            try out_stream.writeByte(' ');
                        }
                    }
                    try stringify(@field(value, Field.name), child_options, out_stream);
                }
            }
            if (field_output) {
                try options.whitespace.outputIndent(out_stream);
            }
            try out_stream.writeByte(if (S.is_tuple) ']' else '}');
            return;
        },
        .ErrorSet => return stringify(@as([]const u8, @errorName(value)), options, out_stream),
        .Pointer => |ptr_info| switch (ptr_info.size) {
            .One => switch (@typeInfo(ptr_info.child)) {
                .Array => {
                    const Slice = []const std.meta.Elem(ptr_info.child);
                    return stringify(@as(Slice, value), options, out_stream);
                },
                else => {
                    // TODO: avoid loops?
                    return stringify(value.*, options, out_stream);
                },
            },
            .Many, .Slice => {
                if (ptr_info.size == .Many and ptr_info.sentinel == null)
                    @compileError("unable to stringify type '" ++ @typeName(T) ++ "' without sentinel");
                const slice = if (ptr_info.size == .Many) mem.span(value) else value;

                if (ptr_info.child == u8 and options.string == .String and std.unicode.utf8ValidateSlice(slice)) {
                    try encodeJsonString(slice, options, out_stream);
                    return;
                }

                try out_stream.writeByte('[');
                var child_options = options;
                child_options.whitespace.indent_level += 1;
                for (slice, 0..) |x, i| {
                    if (i != 0) {
                        try out_stream.writeByte(',');
                    }
                    try child_options.whitespace.outputIndent(out_stream);
                    try stringify(x, child_options, out_stream);
                }
                if (slice.len != 0) {
                    try options.whitespace.outputIndent(out_stream);
                }
                try out_stream.writeByte(']');
                return;
            },
            else => @compileError("Unable to stringify type '" ++ @typeName(T) ++ "'"),
        },
        .Array => return stringify(&value, options, out_stream),
        .Vector => |info| {
            const array: [info.len]info.child = value;
            return stringify(&array, options, out_stream);
        },
        else => @compileError("Unable to stringify type '" ++ @typeName(T) ++ "'"),
    }
    unreachable;
}

So what exactly is going on here anyway? Well, @TypeOf(value) is a compiler builtin function which gets the type of value (in Zig the '@' sigil signifies a compiler builtin). Then we get a tagged union describing T with another builtin, @typeInfo, which we can switch on (match is the closest Rust equivalent). What I love about this is that all of this is written in plain Zig, not a DSL or macro language. There is no special syntax required and the code is just as readable as any other Zig code which is not using any type of reflection.

To contrast this, the serde crate has to call into syn to build an AST in order to get type information for serialization. Should you decide to take a peek into the source code of syn you will find a great big ball of macros which tends to make your eyes glaze over and gives me a feeling of wanting to curl up into a ball and hug my knees while rocking back and forth to Pink Floyd's "Comfortably Numb". You shouldn't have to generate an AST in order to accomplish this task, because the compiler has to generate an AST as part of compilation. That's duplicate effort. Further, even those who regularly write proc macros can have a hard time reasoning about what is going on. The fact that macros are so powerful combined with the fact that they are so difficult to understand leads to, IMO, an unsafe situation. This should not be a black art or an arcane science. A language feature as powerful as proc macros should be simple to read and understand without any special knowledge beyond understanding the language itself. The Zig way is indeed the better way.
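To give a feel for that "second DSL", here is a deliberately tiny sketch using a declarative macro rather than a proc macro. The json_struct macro and the Point type are hypothetical names of mine, and it only copes with fields whose types implement Display and print as valid JSON (plain numbers, for instance). It is nowhere near what serde does, but even at this size the matcher and repetition syntax looks nothing like ordinary Rust.

```rust
// Hypothetical macro: declares a struct and a `to_json` method that
// visits every field by name. The `$(...)*` repetitions are macro syntax,
// not regular Rust.
macro_rules! json_struct {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        struct $name {
            $($field: $ty),*
        }

        impl $name {
            fn to_json(&self) -> String {
                let mut parts: Vec<String> = Vec::new();
                $(
                    // stringify! turns the field identifier into "x", "y", ...
                    parts.push(format!("\"{}\":{}", stringify!($field), self.$field));
                )*
                format!("{{{}}}", parts.join(","))
            }
        }
    };
}

// Assumed example type: two numeric fields.
json_struct!(Point { x: i32, y: i32 });
```

With this in place, Point { x: 1, y: 2 }.to_json() produces {"x":1,"y":2}. Compare that with the Zig listing, where the per-field loop is an ordinary inline for over S.fields.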

My Ideal Language

Rust comes very close to being my ideal language, but I'm actually looking for "the next language" to come after it. I mean that in a very positive way. I think that when we look back at the early part of this century there will be a dividing line in software development that sits right where Rust emerged as a real challenger to the existing order. I believe Rust has kicked off a renaissance of sorts, but that perhaps Rust itself is just showing the way, and that an even better solution could be on the horizon. All it's going to take is for a strongly motivated person or, preferably, group of people to do the work of building such a language, without repeating the same mistakes. Granted, some of the gripes that I mentioned are deficiencies in std and the crates ecosystem, and as such aren't technically reason to write a whole new language. But at this point I just don't see anyone realistically switching to a std-2.0.

So anyway, let's indulge for a moment my fantasy "post Rust era ideal language". It would actually be a lot like Rust, but with a little sprinkling of Zig's comptime goodness. There would be a borrow checker, and it would give great error messages just like rustc does. We'd have the concept of comptime so no macros, and it would be obvious that's how generics functioned under the hood. But there would also be traits, which Zig doesn't have, because that is such a useful concept. I'm not picky about syntax other than the fact that it has to behave as consistently as possible. The borrow checker would obviously lead to a great concurrency and parallelism story just like Rust. Unlike Rust, the std of this new language would be built from the ground up to be able to do anything and everything that we think of as C's domain but without ever resorting to calling into C or assembly. We would get built-in functionality for interfacing with other languages as well as directly talking to the underlying OS kernel. And finally, I envision a library package ecosystem which not only has namespaces, but would actually be fully distributed and decentralized - federated if you will. How that would work I will leave up to the imagination, other than to say emphatically that there would be no blockchain shenanigans and cryptobros can just get off my blog already, but the Fediverse and Matrix have shown that distributed communication can scale, and now we just need to iterate and improve.

In the meantime, you can and should use Rust. In spite of any shortcomings it's still the best language that has come so far in terms of enforcing memory access safety and overall correctness. A lot of the little gripes I've mentioned are indeed pretty uncommon things for the average programmer to have to deal with. If you feel like trying something else (and haven't already done so) then you should also try out Zig, because it's a wonderful language with some fantastic ideas of its own and is actually ahead of Rust when it comes to its suitability to fully replace C.

2023-06-06

Jean G3nie