Time

No, not that Pink Floyd song off Dark Side (although that's a great song). I'm planning to smack down some Unix and computer time knowledge in today's post. Let's keep it on the practical side of things and we'll try not to stray too far into the metaphysical.

It's more complex than you might think

It's pretty common as a programmer to have to deal with time, and if you have then you know that it's a much more complex concept than it at first appears. If you haven't strayed into an area where you're dealing with the representation of time, then you've probably never given it much thought.

When you look closely at our current system of timekeeping and examine it as a system, without thinking in terms of the natural world, you start to notice that nothing lines up into nice little buckets for the poor programmer. Sure, it all makes perfect sense when displayed against the framework of the natural world, but take that away and it just looks batshit crazy. Let's break things down into units of time and how they're measured.

  • seconds are divided into decimal bits
  • minutes are 60 seconds
  • hours are 60 minutes
  • days are 24 hours (most of the time, ignoring leap seconds for now)
  • weeks are 7 days
  • months are anywhere from 28-31 days, depending on which month of the year it is and whether it's a leap year or not
  • Years have 12 months and either 365 or 366 days
  • A leap year is a year that is divisible by 4, but not by 100, unless it is also divisible by 400
  • Every 15 degrees around the globe we have a timezone, which is generally one hour off from its neighbor in local time, except for those places which use daylight saving time during the summer, and the middle of the Pacific, where you might travel across a magical boundary line and land one day in the past.

There aren't a lot of decimals or binaries in all of that, and there are some interesting rules to follow. It kind of leaves one in awe of the human brain to think that our mushy grey matter can intuitively grasp such an inconsistent standard, which to a computer must look like someone has been spiking the punch at a junior high prom.

One might, at first glance, think that dealing with timezones (as an example) is not really all that hard. Well, what if it's March 1st at 2AM and you're living in UTC+5? What does that translate to in UTC time? Depends on what year it is.
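To make that March 1st example concrete, here's a minimal sketch. The helper names (isLeapYear, utcFebruaryDay) are made up for illustration and aren't part of the library built later in this post. Subtracting five hours from 02:00 always lands on 21:00 of the previous day, but which day that is depends on whether February has 28 or 29 days that year.

```zig
const std = @import("std");

// The usual rule: divisible by 4, but not by 100, unless it is also
// divisible by 400.
fn isLeapYear(year: i32) bool {
    return @rem(year, 4) == 0 and (@rem(year, 100) != 0 or @rem(year, 400) == 0);
}

// Hypothetical helper: the day of February that 02:00 on March 1st at
// UTC+5 falls on once converted to UTC (the time is always 21:00).
fn utcFebruaryDay(year: i32) u8 {
    return if (isLeapYear(year)) 29 else 28;
}

pub fn main() void {
    const years = [_]i32{ 1900, 2000, 2023, 2024 };
    for (years) |year| {
        std.debug.print("{d}-03-01T02:00+05 is {d}-02-{d}T21:00Z\n", .{
            year, year, utcFebruaryDay(year),
        });
    }
}
```

Note that 1900 and 2000 give different answers even though both are divisible by 100, which is exactly the kind of rule that's easy to forget.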

Enter the Unix epoch

The Unix epoch is nothing more than an agreed upon moment: January 1st, 1970 at 00:00:00 UTC. In Unix we use timestamps to track file creation and modification times. A timestamp is a large signed integer which counts seconds from the Unix epoch. This greatly simplifies timekeeping on Unix systems, as we just have to count seconds. Need to order a bunch of blog posts in an aggregator by time of publication? The easiest way to do it is to convert to an i64 referenced to the epoch. So rather than a complex mathematical system to compare date-times, all we really need is a way to convert from a human readable time representation to an i64 referenced to the epoch, and back again.
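As a rough illustration of the aggregator idea, here's a sketch that orders a few posts by timestamp. The Post struct and sortByPublished function are hypothetical names for this example only, and the insertion sort is written by hand just to keep the example free of any particular standard library sorting API.

```zig
const std = @import("std");

// Hypothetical record for an aggregator entry. `published` holds seconds
// since the Unix epoch.
const Post = struct {
    title: []const u8,
    published: i64,
};

// Plain insertion sort, oldest first. Comparing two posts is nothing more
// than an integer comparison.
fn sortByPublished(posts: []Post) void {
    var i: usize = 1;
    while (i < posts.len) : (i += 1) {
        const current = posts[i];
        var j = i;
        // Shift newer posts one slot to the right
        while (j > 0 and posts[j - 1].published > current.published) : (j -= 1) {
            posts[j] = posts[j - 1];
        }
        posts[j] = current;
    }
}

pub fn main() void {
    var posts = [_]Post{
        .{ .title = "third", .published = 1_700_000_000 },
        .{ .title = "first", .published = 946_684_800 },
        .{ .title = "second", .published = 1_483_228_800 },
    };
    sortByPublished(&posts);
    for (posts) |post| {
        std.debug.print("{d} {s}\n", .{ post.published, post.title });
    }
}
```

No calendar logic is needed anywhere in the sort, which is the whole appeal of the representation.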

What about Leap seconds?

That's a good question. For the unaware, the Earth's rotation isn't perfectly regular, so even with our leap year system our clocks slowly drift out of sync with the planet's movement, and we occasionally add in a leap second to fix the discrepancy. But on Unix systems we fudge it a bit. Rather than having a day with 60 * 60 * 24 seconds plus 1 leap second, we generally just count the last second in the day twice. This does mean, however, that on those days there is a very ambiguous window between 23:59:59 and 00:00:00, where the timestamps cannot be considered an accurate representation for ordering purposes.
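Here's a small sketch of where that ambiguity comes from, assuming the simple "every day is exactly 86400 seconds" arithmetic used throughout this post. The unixSeconds helper is a made-up name, and real systems differ in exactly how they step or smear the clock, so treat this as an illustration of the arithmetic only. The last day of 2016 was day 17166 since the epoch, and it ended with a leap second, written 23:59:60.

```zig
const std = @import("std");

const SECONDS_PER_DAY: i64 = 60 * 60 * 24;

// Hypothetical helper: whole days since the epoch plus a wall-clock time,
// with every day assumed to be exactly 86400 seconds long.
fn unixSeconds(days_since_epoch: i64, hour: i64, minute: i64, second: i64) i64 {
    return days_since_epoch * SECONDS_PER_DAY + hour * 3600 + minute * 60 + second;
}

pub fn main() void {
    // The leap second at the very end of 2016-12-31 ...
    const leap = unixSeconds(17166, 23, 59, 60);
    // ... and midnight at the start of 2017-01-01
    const midnight = unixSeconds(17167, 0, 0, 0);
    std.debug.print("{d} {d}\n", .{ leap, midnight });
}
```

Both calls produce the same integer, 1483228800, so two distinct moments in UTC share one timestamp.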

How does this conversion work, anyway?

I'm glad that you asked. I've implemented datetime containers in Rust a few times now, but I'm a little tired of Rust and want to work in Zig for a bit. While Zig doesn't have all of the guarantees regarding memory safety that Rust has, both languages benefit from having very strong typing including algebraic data types, which goes a surprisingly long way towards turning runtime errors into compile time errors. In fact, if you make a habit of touching the heap as little as possible I think Zig comes damn near Rust in terms of enforcing safety and correctness. So let's indulge for a moment and take a peek at a datetime container in Zig, with methods to convert to and from i64 timestamps. We're going to start with this year nonsense and work down from there to months and days.

const std = @import("std");
const debug = std.debug;
const testing = std.testing;
const SECONDS_PER_DAY = @import("main.zig").SECONDS_PER_DAY;

pub const YearTag = enum(u1) {
    normal,
    leap,

    fn new(year: i32) YearTag {
        return if (@rem(year, 4) == 0 and (@rem(year, 100) != 0 or @rem(year, 400) == 0)) .leap else .normal;
    }
};

This bit of code is just an enum which tells us whether we're in a leap year or not. The new function takes in an i32, the year, and returns either .normal or .leap depending on the math. We're going to use this enum as a tag to create our Year type, which is a tagged union, so that whether a given year is a leap year is encoded in the type. As a quick aside before looking at that, I want to point out that YearTag is an enum(u1), meaning that it is represented by a one bit integer. Combined with other language constructs such as packed structs, Zig allows you to pack the maximum amount of data into the minimum memory footprint. Aside over, let's look at our Year union.


pub const Year = union(YearTag) {
    normal: i32,
    leap: i32,

    const Self = @This();

    pub fn new(year: i32) Self {
        return switch (YearTag.new(year)) {
            .normal => Self{ .normal = year },
            .leap => Self{ .leap = year },
        };
    }
    // ...the rest of the union's methods follow below

Since we want to break some of this math down into bite sized chunks, we're going to use our Year tag to give us a function which returns the number of days in any given year. Since we want to be able to convert to seconds, it's also useful to provide a function which gives the total number of seconds in a given year.

    pub fn days(self: Self) u16 {
        return switch (self) {
            .normal => 365,
            .leap => 366,
        };
    }

    pub fn seconds(self: Self) i64 {
        return @as(i64, self.days()) * SECONDS_PER_DAY;
    }

We're going to round out our little year module with a function which gets the number portion of our data type, functions which give us the next or previous year, and a function which pretty-prints the year for us which we can use in Zig's format strings.

    pub fn get(self: Self) i32 {
        return switch (self) {
            .normal => |year| year,
            .leap => |year| year,
        };
    }

    pub fn next(self: Self) Self {
        return Self.new(self.get() + 1);
    }

    pub fn previous(self: Self) Self {
        return Self.new(self.get() - 1);
    }

    pub fn format(
        self: Self,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;

        const year = self.get();
        if (year > 0) {
            try writer.print("{d:0>4}", .{@intCast(u32, year)});
        } else {
            try writer.print("{d:0>4}", .{year});
        }
    }
};

I want to point out that you could very well use an i32 by itself to represent the year in your datetime container. Doing it this way, by leveraging a strong type system, makes it harder to represent gibberish data. It also takes care of some of the math that we'll be dealing with later, which is going to shorten our conversion functions considerably. Let's move on then, to months.

Months

We have twelve months on our calendar, and we want to represent them in a way that is human readable while also making sense to the processor. We also want to move some of our logic into our month module so that we won't have to deal with it later, just like we did with our year module. The type I'm going to reach for this time is just an enum, not a tagged union, because we don't have an extra numerical component to represent. So, like our Year union, I'm providing methods to return the number of days, number of seconds, next and previous months.

pub const Month = enum(u4) {
    january = 1,
    february = 2,
    march = 3,
    april = 4,
    may = 5,
    june = 6,
    july = 7,
    august = 8,
    september = 9,
    october = 10,
    november = 11,
    december = 12,

    const Self = @This();

    pub fn days(self: Self, year: Year) u5 {
        return switch (@enumToInt(self)) {
            1, 3, 5, 7, 8, 10, 12 => 31,
            2 => switch (year) {
                .normal => 28,
                .leap => 29,
            },
            else => 30,
        };
    }

    pub fn seconds(self: Self, year: Year) u32 {
        return @as(u32, self.days(year)) * SECONDS_PER_DAY;
    }

    pub fn next(self: Self) ?Self {
        const num = @enumToInt(self);
        return if (num < 12) @intToEnum(Self, num + 1) else null;
    }

    pub fn previous(self: Self) ?Self {
        const num = @enumToInt(self);
        return if (num > 1) @intToEnum(Self, num - 1) else null;
    }
};

Now, we're almost ready for our actual DateTime container, but it's probably a good idea to give some thought to how we're going to handle time zones first.

TimeZones in Zig

Let's lay some groundwork. Our TimeZone is going to be expressed as either UTC, or as a positive or a negative offset from UTC. Whenever something can be one of several different things, that says "I'm an enum!", and since we have tagged unions in Zig we can then encode the data and its meaning (the tags) right into the types. Notice that the HoursMinutes struct is not marked pub? We don't really want or need it cluttering up the public API.

pub const TimeZoneTag = enum(u1) {
    utc,
    offset,
};

pub const Sign = enum(u1) {
    positive,
    negative,
};

const HoursMinutes = struct {
    hours: u4,
    minutes: ?u6,
};

pub const Offset = union(Sign) {
    positive: HoursMinutes,
    negative: HoursMinutes,
};

That's pretty nice already, but we can add some logic into Offset which will again shorten our conversion functions later.

pub const Offset = union(Sign) {
    positive: HoursMinutes,
    negative: HoursMinutes,

    const Self = @This();

    pub fn new(hours: i8, minutes: ?u6) ?Self {
        if (hours > 12 or hours < -12) {
            return null;
        }
        if (minutes) |m| {
            if (m > 59) return null;
            if (hours == 0 and m == 0) return null;
        } else if (hours == 0) return null;
        if (hours < 0) {
            const h = @intCast(u4, @as(i8, hours) * -1);
            return Self{ .negative = .{ .hours = h, .minutes = minutes } };
        } else {
            return Self{ .positive = .{ .hours = @intCast(u4, hours), .minutes = minutes } };
        }
    }

    pub fn asSeconds(self: Self) i64 {
        return switch (self) {
            .positive => |ofs| blk: {
                var seconds = @as(i64, ofs.hours) * 3600;
                if (ofs.minutes) |m| seconds += (@as(i64, m) * 60);
                break :blk seconds;
            },
            .negative => |ofs| blk: {
                var seconds = @as(i64, ofs.hours) * 3600;
                if (ofs.minutes) |m| seconds += (@as(i64, m) * 60);
                break :blk seconds * -1;
            },
        };
    }
};

So now with the groundwork laid, we can go from those smaller types to the actual TimeZone container, a tagged union which is either .utc or .offset, where the .offset variant contains its data.


pub const TimeZone = union(TimeZoneTag) {
    utc: void,
    offset: Offset,

    const Self = @This();

    pub fn new(hours: ?i8, minutes: ?u6) ?Self {
        const h = hours orelse 0;
        const m = minutes orelse 0;

        // Zero hours and zero (or no) minutes is just UTC
        if (h == 0 and m == 0) return Self{ .utc = {} };

        // Anything else is an offset. Offset.new returns null when the
        // values are out of range, and we pass that along
        return if (Offset.new(h, minutes)) |ofs| Self{ .offset = ofs } else null;
    }

    pub fn format(
        self: Self,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;

        switch (self) {
            .utc => try writer.writeAll("Z"),
            .offset => |ofs| switch (ofs) {
                .positive => |p| {
                    try writer.print("+{d:0>2}", .{p.hours});
                    if (p.minutes) |m| {
                        try writer.print(":{d:0>2}", .{m});
                    }
                },
                .negative => |n| {
                    try writer.print("-{d:0>2}", .{n.hours});
                    if (n.minutes) |m| {
                        try writer.print(":{d:0>2}", .{m});
                    }
                },
            },
        }
    }
};

Now with all of these smaller container types, which encode a lot of our program logic including bounds (aren't strongly typed languages great?) it's time to go for it and put it all together with days, hours, minutes and seconds and create our DateTime container.

DateTime in Zig

pub const DateTime = struct {
    year: Year,
    month: Month,
    day: u8,
    hour: ?u5,
    minute: ?u6,
    second: ?u6,
    tz: TimeZone,
};

So yeah, we're actually using all of those types here. The previous work wasn't entirely in vain, and we now have a DateTime representation. Note that we're allowing for some variable precision here by making hours, minutes and seconds optionals. But wait, didn't I say that there were going to be conversions to and from i64? Oh yes, I did all right. Let's tackle converting to a timestamp first. We're going to keep a running total of seconds and work through years, months, days, hours etc. The comments in the code should guide you through if it's not already clear.

    fn toTimestampNaive(self: Self) i64 {
        var seconds: i64 = 0;

        // If the year is before 1970, we have to work
        // backwards. For simplicity, we'll count *past* the date
        // to the beginning of the year, then work forwards again.
        if (self.year.get() < 1970) {
            var year = Year.new(1970);
            while (year.get() != self.year.get()) {
                // get the previous year and subtract the total seconds in it
                year = year.previous();
                seconds -= year.seconds();
            }
        } else if (self.year.get() > 1970) {
            var year = Year.new(1970);

            // add all of this year's seconds, then move on to the next year
            while (year.get() != self.year.get()) {
                seconds += year.seconds();
                year = year.next();
            }
        }

        // Until we get to the current month, we'll advance through each
        // month starting with January and add all of its seconds
        var month = Month.january;
        while (month != self.month) {
            seconds += month.seconds(self.year);
            month = month.next().?;
        }

        // The days begin numbering with 1, so on the 5th we have had four full
        // days plus some remainder. So we take self.day - 1 for our calculation
        seconds += @as(i64, self.day - 1) * SECONDS_PER_DAY;
        if (self.hour) |h| {
            seconds += @as(i64, h) * 3600; // 60 * 60 = 3600
        }
        if (self.minute) |m| {
            seconds += @as(i64, m) * 60;
        }
        if (self.second) |s| {
            seconds += s;
        }
        return seconds;
    }

    pub fn toTimestamp(self: Self) i64 {
        var seconds = self.toTimestampNaive();

        // if we have an offset apply it here
        switch (self.tz) {
            .utc => {},
            .offset => |ofs| seconds -= ofs.asSeconds(),
        }
        return seconds;
    }

NOTE: The toTimestampNaive function is going to be used later on when we need to calculate the day of the week. If we were to do this calculation after taking the TimeZone into account, then the result would be accurate for UTC but possibly inaccurate for Local time.
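To sanity check the arithmetic, here's a standalone sketch of the same walk using plain integers instead of the Year and Month types. The function names are invented for this example, and it only handles 1970 and later. It also shows the point of the note above: the naive value for 05:30 local time in a UTC+05:30 zone differs from the real timestamp by exactly the offset.

```zig
const std = @import("std");

const SECONDS_PER_DAY: i64 = 60 * 60 * 24;

fn isLeapYear(year: i32) bool {
    return @rem(year, 4) == 0 and (@rem(year, 100) != 0 or @rem(year, 400) == 0);
}

fn daysInYear(year: i32) i64 {
    return if (isLeapYear(year)) 366 else 365;
}

fn daysInMonth(month: u8, year: i32) i64 {
    return switch (month) {
        1, 3, 5, 7, 8, 10, 12 => 31,
        2 => if (isLeapYear(year)) @as(i64, 29) else @as(i64, 28),
        else => 30,
    };
}

// Whole years first, then whole months, then days and the time of day.
// Only handles dates from 1970 onwards.
fn naiveTimestamp(year: i32, month: u8, day: u8, hour: u8, minute: u8, second: u8) i64 {
    var total_days: i64 = 0;
    var y: i32 = 1970;
    while (y < year) : (y += 1) {
        total_days += daysInYear(y);
    }
    var m: u8 = 1;
    while (m < month) : (m += 1) {
        total_days += daysInMonth(m, year);
    }
    // Days are numbered from 1, so the 1st contributes no full days
    total_days += @as(i64, day) - 1;
    return total_days * SECONDS_PER_DAY +
        @as(i64, hour) * 3600 + @as(i64, minute) * 60 + @as(i64, second);
}

pub fn main() void {
    // Midnight UTC on 2000-01-01
    const utc = naiveTimestamp(2000, 1, 1, 0, 0, 0);
    // 05:30 on the same date as seen in a UTC+05:30 zone
    const local_naive = naiveTimestamp(2000, 1, 1, 5, 30, 0);
    const offset_seconds: i64 = 5 * 3600 + 30 * 60;
    std.debug.print("{d} {d} {d}\n", .{ utc, local_naive, local_naive - offset_seconds });
}
```

The first and third numbers printed are both 946684800, since the two inputs describe the same instant. The middle one, 946704600, is the naive local value.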

For our conversion going the other direction, I'm going to skip the part which deals with negative integers as those timestamps are all well in the past. Just know that for a real library it is something that would have to be dealt with, because we can't assume we're never going to see a date before 1970. We might.

This op is the reverse, so instead of starting with a var seconds = 0 we're going to start with the timestamp itself, then whittle it down until we get to our final DateTime representation.

    pub fn fromTimestamp(ts: i64) Self {
        if (ts < 0) {
            // skipped for brevity; a real library has to handle this
            @panic("timestamps before the epoch are not handled here");
        } else if (ts > 0) {
            var seconds = ts;
            // We start at 1970-01-01 00:00:00 and count forward
            var year = Year.new(1970);

            // Until we reach a point where the number of seconds left is less
            // than what is in the year, we take a year's worth of seconds off
            // and advance through the years
            while (year.seconds() <= seconds) {
                seconds -= year.seconds();
                year = year.next();
            }

            // Do the same with months as we did with years, but we can start
            // with January
            var month = Month.january;
            while (month.seconds(year) <= seconds) {
                seconds -= month.seconds(year);
                month = month.next().?;
            }

            // In Zig, we actually care about the difference between remainder
            // and modulo, so we're using the `@divTrunc` and `@rem` compiler
            // builtins to deal with the fact that we are technically doing math
            // with signed integers, even though this code branch is all positive ints.
            const day = @divTrunc(seconds, SECONDS_PER_DAY) + 1;
            seconds = @rem(seconds, SECONDS_PER_DAY);
            const hour = @divTrunc(seconds, SECONDS_PER_HOUR);
            seconds = @rem(seconds, SECONDS_PER_HOUR);
            const minute = @divTrunc(seconds, 60);
            seconds = @rem(seconds, 60);
            return Self{
                .year = year,
                .month = month,
                .day = @intCast(u8, day),
                .hour = @intCast(u5, hour),
                .minute = @intCast(u6, minute),
                .second = @intCast(u6, seconds),
                .tz = .utc,
            };
        } else {
            // If the timestamp is not less than zero or greater than zero, then
            // by definition that means it's zero. So we return the Unix epoch.
            return Self{
                .year = Year.new(1970),
                .month = .january,
                .day = 1,
                .hour = 0,
                .minute = 0,
                .second = 0,
                .tz = .utc,
            };
        }
    }
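For the curious, here's one way the negative branch that was skipped above might be approached. This is only a sketch under my own assumptions, not the code from the library: the YearDay struct and splitTimestamp function are hypothetical, and it stops at a year plus a day-of-year rather than going all the way down to months. The trick is to use floored division (`@divFloor` and `@mod`) instead of truncating division, so the seconds-within-the-day part stays positive even when the timestamp is negative.

```zig
const std = @import("std");

const SECONDS_PER_DAY: i64 = 60 * 60 * 24;

fn isLeapYear(year: i32) bool {
    return @rem(year, 4) == 0 and (@rem(year, 100) != 0 or @rem(year, 400) == 0);
}

fn daysInYear(year: i32) i64 {
    return if (isLeapYear(year)) 366 else 365;
}

// Hypothetical result type for this sketch: a year, a zero-based day within
// that year, and the seconds elapsed since midnight on that day.
const YearDay = struct {
    year: i32,
    day_of_year: i64,
    second_of_day: i64,
};

fn splitTimestamp(ts: i64) YearDay {
    // Floored division keeps second_of_day in 0..86399 even when ts < 0
    var day_count = @divFloor(ts, SECONDS_PER_DAY);
    const second_of_day = @mod(ts, SECONDS_PER_DAY);

    var year: i32 = 1970;
    // Before the epoch: step back a year at a time until the day count is
    // no longer negative
    while (day_count < 0) {
        year -= 1;
        day_count += daysInYear(year);
    }
    // At or after the epoch: peel off whole years
    while (day_count >= daysInYear(year)) {
        day_count -= daysInYear(year);
        year += 1;
    }
    return YearDay{ .year = year, .day_of_year = day_count, .second_of_day = second_of_day };
}

pub fn main() void {
    // One second before the epoch
    const r = splitTimestamp(-1);
    std.debug.print("{d} {d} {d}\n", .{ r.year, r.day_of_year, r.second_of_day });
}
```

A timestamp of -1 comes out as year 1969, day 364 (December 31st, counting from zero) and second 86399, which is 23:59:59. From there the months could be peeled off the same way as in the positive branch.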

Wrapping up

There are some other things that would be nice to have in a little code library such as this. We want to be able to compare DateTime instances, and we want to be able to send the entire struct into a format string and get something nice and human readable back. We might want to be able to create an instant using now(). And of course, we haven't dealt with days of the week yet.

For weekdays we're just going to create an enum. Actually getting the weekday is pretty easy since we know what day of the week the epoch fell on (Thursday). We just take our constant, SECONDS_PER_DAY, which is 60 * 60 * 24, and we divide the full timestamp by that number, which gives us the total days since the epoch. We then get the remainder of that number divided by 7.

pub const WeekDay = enum(u3) {
    thursday = 0,
    friday,
    saturday,
    sunday,
    monday,
    tuesday,
    wednesday,
};

// further down, in the `DateTime` struct definition

    pub fn weekday(self: Self) WeekDay {
        // First convert to timestamp
        const ts = self.toTimestampNaive();

        // Now get the number of whole days. `@divFloor` rounds towards negative
        // infinity, so dates before 1970 land on the correct day
        const days = @divFloor(ts, SECONDS_PER_DAY);

        // `@mod` always gives us a number from 0-6, even for a negative day
        // count, which we can then convert to a `WeekDay` using the
        // `@intToEnum` Zig compiler builtin.
        return @intToEnum(WeekDay, @mod(days, 7));
    }
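Here's a quick standalone check of that weekday arithmetic against a few dates whose weekday is known. The weekdayOf function is a made-up name that works on a raw timestamp, and it uses a plain switch to pick the enum value. This sketch uses `@divFloor` and `@mod` so that timestamps before the epoch also land on the right day.

```zig
const std = @import("std");

const SECONDS_PER_DAY: i64 = 60 * 60 * 24;

// Ordered so that the epoch's weekday, Thursday, comes first
const WeekDay = enum { thursday, friday, saturday, sunday, monday, tuesday, wednesday };

fn weekdayOf(ts: i64) WeekDay {
    // Whole days since the epoch, rounded towards negative infinity
    const day_count = @divFloor(ts, SECONDS_PER_DAY);
    // @mod is never negative here, so the result is always 0 through 6
    return switch (@mod(day_count, 7)) {
        0 => WeekDay.thursday,
        1 => WeekDay.friday,
        2 => WeekDay.saturday,
        3 => WeekDay.sunday,
        4 => WeekDay.monday,
        5 => WeekDay.tuesday,
        6 => WeekDay.wednesday,
        else => unreachable,
    };
}

pub fn main() void {
    // 2000-01-01T00:00:00Z
    std.debug.print("{s}\n", .{@tagName(weekdayOf(946684800))});
    // 1969-12-31T00:00:00Z, one day before the epoch
    std.debug.print("{s}\n", .{@tagName(weekdayOf(-86400))});
}
```

The first day of 2000 was a Saturday: 10957 days after the epoch, and 10957 leaves a remainder of 2 when divided by 7. The day before the epoch was a Wednesday, which a truncating remainder would turn into -1 rather than 6.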

As for comparison, Zig does not have operator overloading, so we're never going to be able to take two DateTime structs and compare them using the <, > or == operators. But we can do ordering and simulate the comparison operators a bit.

pub const Comparison = enum {
    gt,
    lt,
    eq,
};

// further down, we want to compare two `DateTime` structs

    pub fn compare(self: Self, other: Self) Comparison {
        const a = self.toTimestamp();
        const b = other.toTimestamp();
        return if (a > b) .gt else if (a < b) .lt else .eq;
    }

Note that since we apply any timezone offsets as part of the conversion to i64, we can just do a straight comparison here. A sorting algorithm for a collection of DateTime structs could be built up from this function. I'm leaving that for now as outside of the scope of this article.

For a now function, we can just call the Zig std.time.timestamp() function and parse that into a DateTime struct using our conversion method from earlier.

    pub fn now() Self {
        return Self.fromTimestamp(std.time.timestamp());
    }

And finally, pretty printing in Zig format strings. Now, since we've already implemented the format function on some of our subtypes, we get to re-use that here.

    pub fn format(
        self: Self,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;

        try writer.print("{s}-{d:0>2}-{d:0>2}", .{
            self.year, @enumToInt(self.month), self.day,
        });
        if (self.hour) |h| {
            try writer.print("T{d:0>2}", .{h});
            if (self.minute) |m| {
                try writer.print(":{d:0>2}", .{m});
                if (self.second) |s| {
                    try writer.print(":{d:0>2}", .{s});
                }
            }
        }
        try writer.print("{s}", .{self.tz});
    }

What else?

If you look closely at the format function you'll notice that its output is in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ). One could expand on this to provide formatting functions for all of the other various standards for time representation.

Something else that I left completely out of scope for this article was parsing a string into a DateTime struct. This is something that becomes a bit of a rat's nest when you consider all of the different date and time formats that are in common use. There's disagreement about whether the day or the month should be placed first, depending on what country you were born in. Some businesses use weeks rather than months, which gets complicated by the fact that the first of the year falls on a different weekday each year. There are ordinal dates, where we leave out months and weeks entirely and just count days. And there are a lot of different ways to represent time zones. Probably the best advice would be to limit the scope of any given library to a small subset of the different schemes in common use, lest you blow someone's binary size up.

Code for this article

The full code for this article sits on my Gitea instance here.