Data Structures as Code: The Joys of Meta-Programming

While working on a major rewrite of the Datetime Julia library for Dates and DateTimes, increased performance was a key goal. And though the original implementation was already fast, Julia developers tend to be greedy, so naturally I wanted more.

One of the known bottlenecks at the time, was how leap seconds were being dealt with. The DateTime type accounted for leap seconds like any other second, and naturally this complicated the internals a bit. See, the internal type is just stored as a machine instant, a monotonically increasing number from zero.

julia> dt = datetime(2014,6,19)  
2014-06-19T00:00:00 UTC

julia> int(dt)  

So the tricky part, comes in the algorithm to calculate this machine instant given the parts of a date (year, month, day, etc.). In a leap-seconds-don't-exist kind of world, this is trivial:

totalms = ms + 1000*(s + 60mi + 3600h + 86400*totaldays(y,m,d))  

The hard part, is when trying to convert the above into leap-seconds-do-exist world.

Without going into too much detail, the solution basically involves using two cached arrays, one in the leap second timeline specifying the instant a leap second occurred; while the other is on a non-leap second timeline specifying the instant right before a leap second would occur (yes, it totally makes me think of alternate timelines).

The bottleneck problem boils down to the fact that every time you create or show a DateTime, you have to do a lookup into these leap second arrays to figure out how many leap seconds need to be corrected in either setting the machine instant, or displaying the right date parts.

The natural thing here is to employ binary search, which Base Julia provides an excellent implmentation of in the searchsorted family of functions. Basically, we calculate our totalms from above, then do binary search to figure out when the previous leap second instant occured and how many leap seconds it represents and need to be corrected for our calculated instant (in this case, we could use the searchsortedlast function to get the last leap second instant before our calculated instant). Simple enough, right? End of story? Move on?

But let's step back a minute and think about the nature of the problem. A major insight is the fact that our leap second arrays are static--nothing is being added or removed apart from library releases and when the IERS decides things are getting too out of whack with earth's rotation. So how could we use this knowledge to improve performance? Well, if we take our searchsortedlast binary search to the extreme, it would ideally look like:

const LEAPSECONDS = [...]  
function setleaps(totalms)  
    if totalms < 62624707200000  # our middle value of LEAPSECONDS
        if totalms < 62388144000000 # cut 1st half in half again
            if totalms < 62230377600000 # last cut needed
                return 1  # only one leap second has occured
                return 2  # 2 leap seconds

Basically, a hand-written searchsortedlast binary search tree using our static array values that elimates all recursive/function call overhead and is close to the metal. But who wants to write all that out by hand? And when our leap second arrays do get updated, we'll have to manually rebalance our tree, and hand-write it all over again.....yuck. But wait, I seem to recall a famous PR by Steven Johnson where he used a macro for hand-unrolling array coefficients that other libraries wouldn't dream of trying to maintain. Perhaps we too can leverage Julia's meta-programming capabilities to do our dirty work.

# Recursively build binary search tree w/ known lookup values
# A is sorted array of lookup values
# R is return values for each index of A
# i.e. R[1] is returned for values < A[1], R[2] for < A[2], etc.
function searchsortedlasttree(A,R,var_name)  
    l = length(A)
    mid = iseven(l) ? l>>>1 : (l>>>1)+1
    # Handle base case
    if mid == 1
        if l == 1
            return :($var_name < $(A[1]) ? $(R[1]) : $(R[2]))
        else # l == 2
            return :($var_name < $(A[1]) ? $(R[1]) : 
                     $var_name < $(A[2]) ? $(R[2]) : $(R[3]))
    iftree = Expr(:if)
    iftree.args = Array(Any,3)
    iftree.args[1] = :($var_name < $(A[mid])) # condition
    iftree.args[2] = searchsortedlasttree(A[1:mid-1],R[1:mid],var_name)
    iftree.args[3] = searchsortedlasttree(A[mid+1:end],R[mid+1:end],var_name)
    return iftree

macro leapstree(a,r,var_name)  
    # var_name is the variable name that will be passed in thru the parent function
    A = eval(a)  # safe to eval here because 'a' is known at compile time
    R = eval(r)  # same here
    ret = Expr(:block)  # we'll store everything in a big block expression
    # First we manually handle before and after the ends of our leaps array
    push!(ret.args,:($var_name < $(A[1]) && return 0))
    push!(ret.args,:($var_name >= $(A[end]) && return $(endof(A)*1000)))
    # Now we call the recursive searchsortedlasttree to get the rest
    return ret

const SETLEAPS = [62214480000000,62230377600000,62261913600000, 62293449600000,62324985600000,62356608000000,62388144000000, 62419680000000,62451216000000,62498476800000,62530012800000, 62561548800000,62624707200000,62703676800000,62766835200000, 62798371200000,62845632000000,62877168000000,62908704000000, 62956137600000,63003398400000,63050832000000,63271756800000, 63366451200000,63476784000000]

function setleaps(ms)  
    # @leapstree as a macro ensures everything gets expanded
    # at compile time

I tried to add useful comments so you can follow what's going on above, but basically we have our setleaps function that will determine how many leap seconds need to be corrected given a totalms value. setleaps is built at compile time by expanding the macro leapstree which manually builds a binary search tree given the SETLEAPS array of leap second instants and the return values for each slot. The binary search tree is built primarily by searchsortedlasttree, which is similar to the searchsortedlast implementation, except instead of returning values, it returns expressions. It is also built recursively to handle any depth of tree.

Hmmm....seems like a bit of hassle, does it really improve things much? Well, for one thing, the above code is robust, I've written it once and don't have to do anything more except tack on the next leap second to SETLEAPS the next time one is announced. Our manually built binary search tree will be rebalanced automatically. What about performance?

julia> @time [searchsortedlast(SETLEAPS, 63366451200000) for i = 1:10000000];  
elapsed time: 0.50351136 seconds (80000048 bytes allocated)

julia> @time [setleaps(63366451200000) for i = 1:10000000];  
elapsed time: 0.067922885 seconds (80000048 bytes allocated)  

Boom! Almost 10x!

Anyway, I thought this was an interesting application of meta-programming in Julia where we're basically taking static data and generating an operation on a data structure in the code itself. A recent thread in the Julia-Users forum asked about the long-term hopes of Julia achieving true C/Fortran performance (currently usually 1.2-2x). I think another strong point to the argument of Julia vs. C/Fortran is what I've shown above. Powerful language design choices like strong meta-programming functionality allows you to do things in Julia that would be almost impossible in many other languages, but that also allow for unique performance gains.

Stay tuned for more adventures in Julia land and maybe for my next post, I'll talk a little more about Julia's code introspection tools and how they can be so darn handy while developing. As a teaser, here's the generated code for our setleaps method above:

In  [62]: @code_lowered setleaps(62561548800000)

Out [62]: 1-element Array{Any,1}:  
 :($(Expr(:lambda, {:ms}, {{},{{:ms,:Any,0}},{}}, :(begin  # In[30], line 43:
        unless ms < 62214480000000 goto 0
        return 0
        unless ms >= 63476784000000 goto 1
        return 25000
        unless ms < 62624707200000 goto 15
        unless ms < 62388144000000 goto 8
        unless ms < 62293449600000 goto 4
        unless ms < 62230377600000 goto 2
        return 1000
        unless ms < 62261913600000 goto 3
        return 2000
        return 3000
        goto 7
        unless ms < 62324985600000 goto 5
        return 4000
        unless ms < 62356608000000 goto 6
        return 5000
        return 6000
        goto 14
        unless ms < 62498476800000 goto 11
        unless ms < 62419680000000 goto 9
        return 7000
        unless ms < 62451216000000 goto 10
        return 8000
        return 9000
        goto 14
        unless ms < 62530012800000 goto 12
        return 10000
        unless ms < 62561548800000 goto 13
        return 11000
        return 12000
        goto 28
        unless ms < 62908704000000 goto 22
        unless ms < 62798371200000 goto 18
        unless ms < 62703676800000 goto 16
        return 13000
        unless ms < 62766835200000 goto 17
        return 14000
        return 15000
        goto 21
        unless ms < 62845632000000 goto 19
        return 16000
        unless ms < 62877168000000 goto 20
        return 17000
        return 18000
        goto 28
        unless ms < 63050832000000 goto 25
        unless ms < 62956137600000 goto 23
        return 19000
        unless ms < 63003398400000 goto 24
        return 20000
        return 21000
        goto 28
        unless ms < 63271756800000 goto 26
        return 22000
        unless ms < 63366451200000 goto 27
        return 23000
        return 24000