Time
No, not that Pink Floyd song off Dark Side (although that's a great song). I'm planning to smack down some Unix and computer time knowledge in today's post. Let's keep it on the practical side of things and we'll try not to stray too far into the metaphysical.
It's more complex than you might think
It's pretty common as a programmer to have to deal with time, and if you have then you know that it's a much more complex concept than it at first appears. If you haven't strayed into an area where you're dealing with the representation of time, then you've probably never given it much thought.
When you look closely at our current system of timekeeping and examine it as a system, without thinking in terms of the natural world, you start to notice that nothing lines up into nice little buckets for the poor programmer. Sure, it all makes perfect sense when displayed against the framework of the natural world, but take that away and it just looks batshit crazy. Let's break things down into units of time and how they're measured.
- seconds are divided into decimal fractions
- minutes are 60 seconds
- hours are 60 minutes
- days are 24 hours (most of the time, ignoring leap seconds for now)
- weeks are 7 days
- months are anywhere from 28-31 days, depending on which month of the year it is and whether it's a leap year or not
- years have 12 months and either 365 or 366 days
- a leap year is a year that is divisible by 4, but not by 100, unless it is also divisible by 400
- every 15 degrees around the globe we have a timezone, which is generally one hour off from its neighbor in local time, except for those places which use daylight saving time during the summer, and the middle of the Pacific, where crossing a magical boundary line can take you one day into the past
There aren't a lot of decimals or binaries in all of that, and there are some interesting rules to follow. It kind of leaves one in awe of the human brain to think that our mushy grey matter can intuitively grasp such an inconsistent standard, which to a computer must look like someone has been spiking the punch at a junior high prom.
One might, at first glance, think that dealing with timezones (as an example) is not really all that hard. Well, what if it's March 1st at 2AM and you're living in UTC+5? What does that translate to in UTC? Subtracting five hours puts you at 9PM on the last day of February, and whether that is the 28th or the 29th actually depends on what year it is.
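To make that concrete, here is a minimal sketch of the March 1st example. The helper name `utcDayOfFebruary` is made up for illustration and only answers this one question; it is not part of the library built later in the post.

```zig
const std = @import("std");

// Gregorian leap-year rule, as described in the list above.
fn isLeapYear(year: i32) bool {
    return @rem(year, 4) == 0 and (@rem(year, 100) != 0 or @rem(year, 400) == 0);
}

// Hypothetical helper for this one example: 02:00 on March 1st at UTC+5
// is 21:00 UTC on the last day of February, so return that day number.
fn utcDayOfFebruary(year: i32) u8 {
    if (isLeapYear(year)) return 29;
    return 28;
}

test "March 1st 02:00 at UTC+5 maps to a year-dependent UTC date" {
    try std.testing.expect(utcDayOfFebruary(2023) == 28);
    try std.testing.expect(utcDayOfFebruary(2024) == 29);
}
```

Running the file with `zig test` exercises the example. Same local time, same offset, different UTC date depending on the year.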
Enter the Unix epoch
The Unix epoch is nothing more than an agreed upon moment: January 1st, 1970 at 00:00:00 UTC. In Unix we use timestamps to track file creation and modification times, and each timestamp is a large signed integer which counts seconds from the Unix epoch. This greatly simplifies timekeeping on Unix systems, as we just have to count seconds. Need to order a bunch of blog posts in an aggregator by time of publication? Easiest way to do it is to convert to an i64 referenced to the epoch. So rather than a complex mathematical system to compare date-times, all we really need is a way to convert from a human readable time representation to an i64 referenced to the epoch, and back again.
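As a rough sketch of the aggregator idea, ordering posts by an epoch timestamp needs nothing more than integer comparison. The `Post` struct and `sortByPublished` function are hypothetical names for this example, and the sort is a hand-rolled insertion sort so the sketch doesn't lean on any particular version of the standard library's sort API.

```zig
const std = @import("std");

// Hypothetical record for an aggregator entry; `published` counts
// seconds since the Unix epoch.
const Post = struct {
    title: []const u8,
    published: i64,
};

// Plain insertion sort, oldest first. Ordering only needs integer comparison.
fn sortByPublished(posts: []Post) void {
    var i: usize = 1;
    while (i < posts.len) : (i += 1) {
        var j = i;
        while (j > 0 and posts[j - 1].published > posts[j].published) : (j -= 1) {
            const tmp = posts[j];
            posts[j] = posts[j - 1];
            posts[j - 1] = tmp;
        }
    }
}

test "posts come out oldest first" {
    var posts = [_]Post{
        Post{ .title = "newest", .published = 1700000000 },
        Post{ .title = "epoch", .published = 0 },
        Post{ .title = "day two", .published = 86400 },
    };
    sortByPublished(&posts);
    try std.testing.expect(posts[0].published == 0);
    try std.testing.expect(posts[1].published == 86400);
    try std.testing.expect(posts[2].published == 1700000000);
}
```

No calendar rules are involved at any point, which is the whole appeal of the representation.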
What about Leap seconds?
That's a good question. For the unaware, leap years alone aren't enough to keep our clocks in sync with the Earth's slightly irregular rotation, so we occasionally add in a leap second to fix the discrepancy. But on Unix systems we fudge it a bit. Rather than having a day with 60 * 60 * 24 seconds plus 1 leap second, we generally just count the last second in the day twice. This does mean, however, that on those days there is a very ambiguous window between 23:59:59 and 00:00:00, where the timestamps cannot be considered an accurate representation for ordering purposes.
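Here is a small sketch of where that ambiguity comes from, assuming plain POSIX-style arithmetic in which every day is exactly 86400 seconds long. The function name `posixSeconds` is made up for this example, and how a real system steps or smears its clock around a leap second varies, so treat this as an illustration of the arithmetic only.

```zig
const std = @import("std");

// POSIX-style arithmetic: every day is treated as exactly 86400 seconds,
// so there is no slot for a 61st second in the final minute.
fn posixSeconds(days_since_epoch: i64, hour: i64, minute: i64, second: i64) i64 {
    return days_since_epoch * 86400 + hour * 3600 + minute * 60 + second;
}

test "a leap second collides with the following midnight" {
    // 2016-12-31 is day 17166 after the epoch; that day ended with a leap second.
    const leap_second = posixSeconds(17166, 23, 59, 60);
    const next_midnight = posixSeconds(17167, 0, 0, 0);
    try std.testing.expect(leap_second == next_midnight);
    try std.testing.expect(next_midnight == 1483228800);
}
```

Two different moments in UTC end up with the same integer, so anything ordered purely by timestamp can't tell them apart.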
How does this conversion work, anyway?
I'm glad that you asked. I've implemented datetime containers in Rust a few times now, but I'm a little tired of Rust and want to work in Zig for a bit. While Zig doesn't have all of the guarantees regarding memory safety that Rust has, both languages benefit from having very strong typing including algebraic data types, which goes a surprisingly long way towards turning runtime errors into compile time errors. In fact, if you make a habit of touching the heap as little as possible I think Zig comes damn near Rust in terms of enforcing safety and correctness. So let's indulge for a moment and take a peek at a datetime container in Zig, with methods to convert to and from i64 timestamps. We're going to start with this year nonsense and work down from there to months and days.
    const std = @import("std");
    const debug = std.debug;
    const testing = std.testing;
    const SECONDS_PER_DAY = @import("main.zig").SECONDS_PER_DAY;

    pub const YearTag = enum(u1) {
        normal,
        leap,

        fn new(year: i32) YearTag {
            return if (@rem(year, 4) == 0 and (@rem(year, 100) != 0 or @rem(year, 400) == 0))
                .leap
            else
                .normal;
        }
    };
This bit of code is just an enum which tells us whether we're in a leap
year or not. The new function takes in an i32, the year, and
returns either .normal or .leap depending on
the math. We're going to use this enum as a tag to create our
Year type, which is a tagged union, so that whether
a given year is a leap year is encoded in the type. As a quick
aside before looking at that, I want to point out that
YearTag is an enum(u1), meaning that it is
represented by a one bit integer. Combined with other
language constructs such as packed structs, Zig allows you to pack the
maximum amount of data into the minimum memory footprint. Aside over,
let's look at our Year union.
    pub const Year = union(YearTag) {
        normal: i32,
        leap: i32,

        const Self = @This();

        pub fn new(year: i32) Self {
            return switch (YearTag.new(year)) {
                .normal => Self{ .normal = year },
                .leap => Self{ .leap = year },
            };
        }

        // The union stays open here; its remaining methods follow below.
Since we want to break some of this math down into bite sized chunks,
we're going to use our Year tag to give us a function
which returns the number of days in any given year. Since we want to be
able to convert to seconds, it's also useful to provide a function
which gives the total number of seconds in a given year.
        pub fn days(self: Self) u16 {
            return switch (self) {
                .normal => 365,
                .leap => 366,
            };
        }

        pub fn seconds(self: Self) i64 {
            return @as(i64, self.days()) * SECONDS_PER_DAY;
        }
We're going to round out our little year module with a function which gets the number portion of our data type, functions which give us the next or previous year, and a function which pretty-prints the year for us which we can use in Zig's format strings.
        pub fn get(self: Self) i32 {
            return switch (self) {
                .normal => |year| year,
                .leap => |year| year,
            };
        }

        pub fn next(self: Self) Self {
            return Self.new(self.get() + 1);
        }

        pub fn previous(self: Self) Self {
            return Self.new(self.get() - 1);
        }

        pub fn format(
            self: Self,
            comptime fmt: []const u8,
            options: std.fmt.FormatOptions,
            writer: anytype,
        ) !void {
            _ = fmt;
            _ = options;
            const year = self.get();
            if (year > 0) {
                try writer.print("{d:0>4}", .{@intCast(u32, year)});
            } else {
                try writer.print("{d:0>4}", .{year});
            }
        }
    };
I want to point out that you could very well use an i32 by itself to represent the year in your datetime container. Doing it this way, by leveraging a strong type system, makes it harder to represent gibberish data. It also takes care of some of the math that we'll be dealing with later, which is going to shorten our conversion functions considerably. Let's move on then, to months.
Months
We have twelve months on our calendar, and we want to represent them in
a way that is human readable while also making sense to the processor.
We also want to move some of our logic into our month module so that we
won't have to deal with it later, just like we did with our year
module. The type I'm going to reach for this time is just an enum, not
a tagged union, because we don't have an extra numerical component to
represent. So, like our Year union, I'm providing methods
to return the number of days, number of seconds, next and previous
months.
    pub const Month = enum(u4) {
        january = 1,
        february = 2,
        march = 3,
        april = 4,
        may = 5,
        june = 6,
        july = 7,
        august = 8,
        september = 9,
        october = 10,
        november = 11,
        december = 12,

        const Self = @This();

        pub fn days(self: Self, year: Year) u5 {
            return switch (@enumToInt(self)) {
                1, 3, 5, 7, 8, 10, 12 => 31,
                2 => switch (year) {
                    .normal => 28,
                    .leap => 29,
                },
                else => 30,
            };
        }

        pub fn seconds(self: Self, year: Year) u32 {
            return @as(u32, self.days(year)) * SECONDS_PER_DAY;
        }

        pub fn next(self: Self) ?Self {
            const num = @enumToInt(self);
            return if (num < 12) @intToEnum(Self, num + 1) else null;
        }

        pub fn previous(self: Self) ?Self {
            const num = @enumToInt(self);
            return if (num > 1) @intToEnum(Self, num - 1) else null;
        }
    };
Now, we're almost ready for our actual DateTime container,
but it's probably a good idea to give some thought to how we're going
to handle time zones first.
TimeZones in Zig
Let's lay some groundwork. Our TimeZone is going to be expressed as
either UTC, or as a positive or a negative offset from UTC. Whenever
something can be one of several different things, that says "I'm an
enum!", and since we have tagged unions in Zig we can then encode the
data and its meaning (the tags) right into the types. Notice that the
HoursMinutes struct is not marked pub? We don't really
want or need it cluttering up the public API.
    pub const TimeZoneTag = enum(u1) {
        utc,
        offset,
    };

    pub const Sign = enum(u1) {
        positive,
        negative,
    };

    const HoursMinutes = struct {
        hours: u4,
        minutes: ?u6,
    };

    pub const Offset = union(Sign) {
        positive: HoursMinutes,
        negative: HoursMinutes,
    };
That's pretty nice already, but we can add some logic into
Offset which will again shorten our conversion functions
later.
    pub const Offset = union(Sign) {
        positive: HoursMinutes,
        negative: HoursMinutes,

        const Self = @This();

        pub fn new(hours: i8, minutes: ?u6) ?Self {
            if (hours > 12 or hours < -12) {
                return null;
            }
            if (minutes) |m| {
                if (m > 59) return null;
                if (hours == 0 and m == 0) return null;
            } else if (hours == 0) return null;
            if (hours < 0) {
                const h = @intCast(u4, @as(i8, hours) * -1);
                return Self{ .negative = .{ .hours = h, .minutes = minutes } };
            } else {
                return Self{ .positive = .{ .hours = @intCast(u4, hours), .minutes = minutes } };
            }
        }

        pub fn asSeconds(self: Self) i64 {
            return switch (self) {
                .positive => |ofs| blk: {
                    var seconds = @as(i64, ofs.hours) * 3600;
                    if (ofs.minutes) |m| seconds += (@as(i64, m) * 60);
                    break :blk seconds;
                },
                .negative => |ofs| blk: {
                    var seconds = @as(i64, ofs.hours) * 3600;
                    if (ofs.minutes) |m| seconds += (@as(i64, m) * 60);
                    break :blk seconds * -1;
                },
            };
        }
    };
So now with the groundwork laid, we can go from those smaller types to
the actual TimeZone container, a tagged union which is
either .utc or .offset, where the
.offset variant contains its data.
    pub const TimeZone = union(TimeZoneTag) {
        utc: void,
        offset: Offset,

        const Self = @This();

        pub fn new(hours: ?i8, minutes: ?u6) ?Self {
            const h = hours orelse 0;
            if (h == 0) {
                // No hours: either plain UTC or a minutes-only offset.
                // Invalid minutes give null instead of a crash.
                const m = minutes orelse return Self{ .utc = {} };
                if (m == 0) return Self{ .utc = {} };
                return if (Offset.new(0, m)) |ofs| Self{ .offset = ofs } else null;
            }
            return if (Offset.new(h, minutes)) |ofs| Self{ .offset = ofs } else null;
        }

        pub fn format(
            self: Self,
            comptime fmt: []const u8,
            options: std.fmt.FormatOptions,
            writer: anytype,
        ) !void {
            _ = fmt;
            _ = options;
            switch (self) {
                .utc => try writer.writeAll("Z"),
                .offset => |ofs| switch (ofs) {
                    .positive => |p| {
                        try writer.print("+{d:0>2}", .{p.hours});
                        if (p.minutes) |m| {
                            try writer.print(":{d:0>2}", .{m});
                        }
                    },
                    .negative => |n| {
                        try writer.print("-{d:0>2}", .{n.hours});
                        if (n.minutes) |m| {
                            try writer.print(":{d:0>2}", .{m});
                        }
                    },
                },
            }
        }
    };
Now with all of these smaller container types, which encode a lot of
our program logic including bounds (aren't strongly typed languages
great?) it's time to go for it and put it all together with days,
hours, minutes and seconds and create our DateTime
container.
DateTime in Zig
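    pub const DateTime = struct {
        year: Year,
        month: Month,
        day: u8,
        hour: ?u5,
        minute: ?u6,
        second: ?u6,
        tz: TimeZone,

        // The methods shown in the rest of this article live inside this
        // struct and refer to it as `Self`.
        const Self = @This();
    };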
So yeah, we're actually using all of those types here. The previous
work wasn't entirely in vain, and we now have a DateTime
representation. Note that we're allowing for some variable precision
here by making hours, minutes and seconds optionals. But wait, didn't I
say that there were going to be conversions to and from i64? Oh yes, I
did all right. Let's tackle converting to a timestamp first.
We're going to keep a running total of seconds and work through years,
months, days, hours etc. The comments in the code should guide you
through if it's not already clear.
    fn toTimestampNaive(self: Self) i64 {
        var seconds: i64 = 0;
        // If the year is before 1970, we have to work
        // backwards. For simplicity, we'll count *past* the date
        // to the beginning of the year, then work forwards again.
        if (self.year.get() < 1970) {
            var year = Year.new(1970);
            while (year.get() != self.year.get()) {
                // get the previous year and subtract the total seconds in it
                year = year.previous();
                seconds -= year.seconds();
            }
        } else if (self.year.get() > 1970) {
            var year = Year.new(1970);
            // add all of this year's seconds, then move on to the next year
            while (year.get() != self.year.get()) {
                seconds += year.seconds();
                year = year.next();
            }
        }
        // Until we get to the current month, we'll advance through each
        // month starting with January and add all of its seconds
        var month = Month.january;
        while (month != self.month) {
            seconds += month.seconds(self.year);
            month = month.next().?;
        }
        // The days begin numbering with 1, so on the 5th we have had four full
        // days plus some remainder. So we take self.day - 1 for our calculation
        seconds += @as(i64, self.day - 1) * SECONDS_PER_DAY;
        if (self.hour) |h| {
            seconds += @as(i64, h) * 3600; // 60 * 60 = 3600
        }
        if (self.minute) |m| {
            seconds += @as(i64, m) * 60;
        }
        if (self.second) |s| {
            seconds += s;
        }
        return seconds;
    }

    pub fn toTimestamp(self: Self) i64 {
        var seconds = self.toTimestampNaive();
        // if we have an offset apply it here
        // (getOffset is not shown in this article; it is assumed to return
        // the Offset when tz is .offset, and null when tz is .utc)
        if (self.getOffset()) |ofs| seconds -= ofs.asSeconds();
        return seconds;
    }
NOTE: The toTimestampNaive function is going to be used later on when we need to calculate the day of the week. If we were to do this calculation after taking the TimeZone into account, then the result would be accurate for UTC but possibly inaccurate for local time.
For our conversion going the other direction, I'm going to skip the part which deals with negative integers as those timestamps are all well in the past. Just know that for a real library it is something that would have to be dealt with, because we can't assume we're never going to see a date before 1970. We might.
This operation is the reverse, so instead of starting with `var seconds = 0`
we're going to start with the timestamp itself, then whittle
it down until we get to our final DateTime representation.
    pub fn fromTimestamp(ts: i64) Self {
        if (ts < 0) {
            // Skipped for brevity. A real library has to handle this case;
            // here we just bail out loudly.
            @panic("fromTimestamp: timestamps before the epoch are not handled");
        } else if (ts > 0) {
            var seconds = ts;
            // We start at 1970-01-01 00:00:00 and count forward
            var year = Year.new(1970);
            // Until we reach a point where the number of seconds left is less
            // than what is in the year, we take a year's worth of seconds off
            // and advance through the years. The comparison is `<=` so that a
            // timestamp landing exactly on January 1st rolls into the new year.
            while (year.seconds() <= seconds) {
                seconds -= year.seconds();
                year = year.next();
            }
            // Do the same with months as we did with years, but we can start
            // with January
            var month = Month.january;
            while (month.seconds(year) <= seconds) {
                seconds -= month.seconds(year);
                month = month.next().?;
            }
            // In Zig, we actually care about the difference between remainder
            // and modulo, so we're using `@divTrunc` and `@rem` compiler builtins
            // to deal with the fact that we are technically doing math with signed
            // integers, even though this code branch is all positive ints.
            const day = @divTrunc(seconds, SECONDS_PER_DAY) + 1;
            seconds = @rem(seconds, SECONDS_PER_DAY);
            const hour = @divTrunc(seconds, SECONDS_PER_HOUR);
            seconds = @rem(seconds, SECONDS_PER_HOUR);
            const minute = @divTrunc(seconds, 60);
            seconds = @rem(seconds, 60);
            return Self{
                .year = year,
                .month = month,
                .day = @intCast(u8, day),
                .hour = @intCast(u5, hour),
                .minute = @intCast(u6, minute),
                .second = @intCast(u6, seconds),
                .tz = .utc,
            };
        } else {
            // If the timestamp is not less than zero or greater than zero, then
            // by definition that means it's zero. So we return the Unix epoch.
            return Self{
                .year = Year.new(1970),
                .month = .january,
                .day = 1,
                .hour = 0,
                .minute = 0,
                .second = 0,
                .tz = .utc,
            };
        }
    }
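The boundaries are the part of this conversion most worth testing: a timestamp that lands exactly on the first second of a year or a month. Below is a stripped-down sketch of the same whittling idea using plain integers and whole days rather than the Year and Month types, so it can be compiled on its own. Every name in it (`Ymd`, `dateFromTimestamp` and the helpers) is hypothetical, and it assumes a non-negative timestamp in UTC.

```zig
const std = @import("std");

const SECS_PER_DAY: i64 = 86400;

fn isLeap(year: i32) bool {
    return @rem(year, 4) == 0 and (@rem(year, 100) != 0 or @rem(year, 400) == 0);
}

fn daysInYear(year: i32) i64 {
    if (isLeap(year)) return 366;
    return 365;
}

fn daysInMonth(year: i32, month: u8) i64 {
    if (month == 2) {
        if (isLeap(year)) return 29;
        return 28;
    }
    return switch (month) {
        4, 6, 9, 11 => 30,
        else => 31,
    };
}

// Hypothetical result type for this sketch only.
const Ymd = struct { year: i32, month: u8, day: i64 };

// Calendar date (UTC) for a timestamp; assumes ts >= 0.
fn dateFromTimestamp(ts: i64) Ymd {
    var remaining_days = @divFloor(ts, SECS_PER_DAY);
    var year: i32 = 1970;
    // `>=` matters: exactly one year's worth of days means January 1st
    // of the following year, not "day 366" of this one.
    while (remaining_days >= daysInYear(year)) {
        remaining_days -= daysInYear(year);
        year += 1;
    }
    var month: u8 = 1;
    while (remaining_days >= daysInMonth(year, month)) {
        remaining_days -= daysInMonth(year, month);
        month += 1;
    }
    return Ymd{ .year = year, .month = month, .day = remaining_days + 1 };
}

test "first second of 1971" {
    const d = dateFromTimestamp(365 * SECS_PER_DAY);
    try std.testing.expect(d.year == 1971 and d.month == 1 and d.day == 1);
}
```

Working in whole days first keeps the loops short; the hours, minutes and seconds fall out of the remainder exactly as in the function above.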
Wrapping up
There are some other things that would be nice to have in a little code
library such as this. We want to be able to compare
DateTime instances, and we want to be able to send the
entire struct into a format string and get something nice and human
readable back. We might want to be able to create an instant using
now(). And of course, we haven't dealt with days of the
week yet.
For weekdays we're just going to create an enum. Actually getting the
weekday is pretty easy since we know what day of the week the epoch
fell on (Thursday). We just take our constant,
SECONDS_PER_DAY, which is 60 * 60 * 24, and we divide the
full timestamp by that number, which gives us the total days since the
epoch. We then get the remainder of that number divided by 7.
    pub const WeekDay = enum(u3) {
        thursday = 0,
        friday,
        saturday,
        sunday,
        monday,
        tuesday,
        wednesday,
    };

    // further down, in the `DateTime` struct definition
    pub fn weekday(self: Self) WeekDay {
        // First convert to timestamp
        const ts = self.toTimestampNaive();
        // Now get the number of days, ignoring any remainder. `@divFloor`
        // rounds down, which keeps dates before 1970 on the correct day.
        const days = @divFloor(ts, SECONDS_PER_DAY);
        // By taking the modulo here, we'll get a number from 0-6 (never
        // negative), which we can then convert to a `WeekDay` using the
        // `@intToEnum` Zig compiler builtin.
        return @intToEnum(WeekDay, @intCast(u3, @mod(days, 7)));
    }
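The arithmetic is easy to check in isolation. This is a small sketch that works directly on a timestamp and returns a name instead of the WeekDay enum; `weekdayName` is a hypothetical function for illustration. It uses floor division and modulo so that timestamps before the epoch land on the right day as well.

```zig
const std = @import("std");

// Weekday name for a Unix timestamp. Day 0 (1970-01-01) was a Thursday.
fn weekdayName(ts: i64) []const u8 {
    // Floor division so that -1 (one second before the epoch) is day -1,
    // not day 0.
    const days_since_epoch = @divFloor(ts, 86400);
    // @mod never returns a negative result for a positive divisor.
    return switch (@mod(days_since_epoch, 7)) {
        0 => "Thursday",
        1 => "Friday",
        2 => "Saturday",
        3 => "Sunday",
        4 => "Monday",
        5 => "Tuesday",
        6 => "Wednesday",
        else => unreachable,
    };
}

test "the epoch fell on a Thursday" {
    try std.testing.expect(std.mem.eql(u8, weekdayName(0), "Thursday"));
    // One second before the epoch is still Wednesday, 1969-12-31.
    try std.testing.expect(std.mem.eql(u8, weekdayName(-1), "Wednesday"));
}
```

With truncating division and remainder, the second assertion would come back as Thursday, which is the kind of off-by-one that only shows up for dates before 1970.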
As for comparison, Zig does not have operator overloading, so we're
never going to be able to take two DateTime structs and
compare them using `<`, `>` or
`==` operators. But we can do ordering and simulate the
comparison operators a bit.
    pub const Comparison = enum {
        gt,
        lt,
        eq,
    };

    // further down, we want to compare two `DateTime` structs
    pub fn compare(self: Self, other: Self) Comparison {
        const a = self.toTimestamp();
        const b = other.toTimestamp();
        return if (a > b) .gt else if (a < b) .lt else .eq;
    }
Note that since we apply any timezone offsets as part of the conversion
to i64, we can just do a straight comparison here. A sorting algorithm
for a collection of DateTime structs could be built up
from this function. I'm leaving that for now as outside of the scope of
this article.
For a now function, we can just call the Zig
std.time.timestamp() function and parse that into a
DateTime struct using our conversion method from earlier.
    pub fn now() Self {
        return Self.fromTimestamp(std.time.timestamp());
    }
And finally, pretty printing in Zig format strings. Now, since we've
already implemented the format function on some of our
subtypes, we get to re-use that here.
    pub fn format(
        self: Self,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;
        try writer.print("{s}-{d:0>2}-{d:0>2}", .{
            self.year, @enumToInt(self.month), self.day,
        });
        if (self.hour) |h| {
            try writer.print("T{d:0>2}", .{h});
            if (self.minute) |m| {
                try writer.print(":{d:0>2}", .{m});
                if (self.second) |s| {
                    try writer.print(":{d:0>2}", .{s});
                }
            }
        }
        try writer.print("{s}", .{self.tz});
    }
What else?
If you look closely at the format function you'll notice
that its output is in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ). One could
expand on this to provide formatting functions for all of the other
various standards for time representation.
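For reference, the zero-padded shape of that output can be sketched on its own with `std.fmt.bufPrint` and plain unsigned integers. The function name `formatIso` and its flat parameter list are assumptions made for this example; it always writes a full UTC timestamp and does not handle the optional fields or offsets that the DateTime version does.

```zig
const std = @import("std");

// Writes YYYY-MM-DDTHH:MM:SSZ into `buf` and returns the written slice.
// Unsigned types are used so no sign character ends up in the padding.
fn formatIso(
    buf: []u8,
    year: u32,
    month: u8,
    day: u8,
    hour: u8,
    minute: u8,
    second: u8,
) ![]u8 {
    return std.fmt.bufPrint(
        buf,
        "{d:0>4}-{d:0>2}-{d:0>2}T{d:0>2}:{d:0>2}:{d:0>2}Z",
        .{ year, month, day, hour, minute, second },
    );
}

test "the epoch in ISO 8601" {
    var buf: [32]u8 = undefined;
    const out = try formatIso(&buf, 1970, 1, 1, 0, 0, 0);
    try std.testing.expectEqualStrings("1970-01-01T00:00:00Z", out);
}
```

Other formats would mostly be a matter of swapping the format string and the order of the arguments.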
Something else that I left completely out of scope for this article was
parsing a string into a DateTime struct. This is something
that becomes a bit of a rat's nest when you consider all of the
different date and time formats that are in common use. There's
disagreement about whether the day or the month should be placed first,
depending on what country you were born in. Some businesses use weeks
rather than months, which gets complicated by the fact that the first
of the year falls on a different weekday each year. There are ordinal
dates, where we leave out months and weeks entirely and just count
days. And there are a lot of different ways to represent time
zones. Probably the best advice would be to limit the scope of any
given library to a small subset of the different schemes in common use,
lest you blow someone's binary size up.
Code for this article
The full code for this article sits on my Gitea instance here.