Data Structures as Code: The Joys of Meta-Programming
While working on a major rewrite of the Datetime Julia library for Dates and DateTimes, increased performance was a key goal. And though the original implementation was already fast, Julia developers tend to be greedy, so naturally I wanted more.
One of the known bottlenecks at the time, was how leap seconds were being dealt with. The DateTime
type accounted for leap seconds like any other second, and naturally this complicated the internals a bit. See, the internal type is just stored as a machine instant, a monotonically increasing number from zero.
julia> dt = datetime(2014,6,19)
2014-06-19T00:00:00 UTC
julia> int(dt)
63538819225000
So the tricky part, comes in the algorithm to calculate this machine instant given the parts of a date (year, month, day, etc.). In a leap-seconds-don't-exist kind of world, this is trivial:
totalms = ms + 1000*(s + 60mi + 3600h + 86400*totaldays(y,m,d))
The hard part, is when trying to convert the above into leap-seconds-do-exist world.
Without going into too much detail, the solution basically involves using two cached arrays, one in the leap second timeline specifying the instant a leap second occurred; while the other is on a non-leap second timeline specifying the instant right before a leap second would occur (yes, it totally makes me think of alternate timelines).
The bottleneck problem boils down to the fact that every time you create or show a DateTime
, you have to do a lookup into these leap second arrays to figure out how many leap seconds need to be corrected in either setting the machine instant, or displaying the right date parts.
The natural thing here is to employ binary search, which Base Julia provides an excellent implmentation of in the searchsorted family of functions. Basically, we calculate our totalms
from above, then do binary search to figure out when the previous leap second instant occured and how many leap seconds it represents and need to be corrected for our calculated instant (in this case, we could use the searchsortedlast
function to get the last leap second instant before our calculated instant). Simple enough, right? End of story? Move on?
But let's step back a minute and think about the nature of the problem. A major insight is the fact that our leap second arrays are static--nothing is being added or removed apart from library releases and when the IERS decides things are getting too out of whack with earth's rotation. So how could we use this knowledge to improve performance? Well, if we take our searchsortedlast
binary search to the extreme, it would ideally look like:
const LEAPSECONDS = [...]
function setleaps(totalms)
if totalms < 62624707200000 # our middle value of LEAPSECONDS
if totalms < 62388144000000 # cut 1st half in half again
if totalms < 62230377600000 # last cut needed
return 1 # only one leap second has occured
else
return 2 # 2 leap seconds
end
else
...
else
...
end
Basically, a hand-written searchsortedlast
binary search tree using our static array values that elimates all recursive/function call overhead and is close to the metal. But who wants to write all that out by hand? And when our leap second arrays do get updated, we'll have to manually rebalance our tree, and hand-write it all over again.....yuck. But wait, I seem to recall a famous PR by Steven Johnson where he used a macro for hand-unrolling array coefficients that other libraries wouldn't dream of trying to maintain. Perhaps we too can leverage Julia's meta-programming capabilities to do our dirty work.
# Recursively build binary search tree w/ known lookup values
# A is sorted array of lookup values
# R is return values for each index of A
# i.e. R[1] is returned for values < A[1], R[2] for < A[2], etc.
function searchsortedlasttree(A,R,var_name)
l = length(A)
mid = iseven(l) ? l>>>1 : (l>>>1)+1
# Handle base case
if mid == 1
if l == 1
return :($var_name < $(A[1]) ? $(R[1]) : $(R[2]))
else # l == 2
return :($var_name < $(A[1]) ? $(R[1]) :
$var_name < $(A[2]) ? $(R[2]) : $(R[3]))
end
end
iftree = Expr(:if)
iftree.args = Array(Any,3)
iftree.args[1] = :($var_name < $(A[mid])) # condition
iftree.args[2] = searchsortedlasttree(A[1:mid-1],R[1:mid],var_name)
iftree.args[3] = searchsortedlasttree(A[mid+1:end],R[mid+1:end],var_name)
return iftree
end
macro leapstree(a,r,var_name)
# var_name is the variable name that will be passed in thru the parent function
A = eval(a) # safe to eval here because 'a' is known at compile time
R = eval(r) # same here
ret = Expr(:block) # we'll store everything in a big block expression
# First we manually handle before and after the ends of our leaps array
push!(ret.args,:($var_name < $(A[1]) && return 0))
push!(ret.args,:($var_name >= $(A[end]) && return $(endof(A)*1000)))
# Now we call the recursive searchsortedlasttree to get the rest
push!(ret.args,searchsortedlasttree(A[2:(endof(A)-1)],R,var_name))
return ret
end
const SETLEAPS = [62214480000000,62230377600000,62261913600000, 62293449600000,62324985600000,62356608000000,62388144000000, 62419680000000,62451216000000,62498476800000,62530012800000, 62561548800000,62624707200000,62703676800000,62766835200000, 62798371200000,62845632000000,62877168000000,62908704000000, 62956137600000,63003398400000,63050832000000,63271756800000, 63366451200000,63476784000000]
function setleaps(ms)
# @leapstree as a macro ensures everything gets expanded
# at compile time
@leapstree(SETLEAPS,[1000:1000:((endof(SETLEAPS)-1)*1000)],ms)
end
I tried to add useful comments so you can follow what's going on above, but basically we have our setleaps
function that will determine how many leap seconds need to be corrected given a totalms
value. setleaps
is built at compile time by expanding the macro leapstree
which manually builds a binary search tree given the SETLEAPS
array of leap second instants and the return values for each slot. The binary search tree is built primarily by searchsortedlasttree
, which is similar to the searchsortedlast implementation, except instead of returning values, it returns expressions. It is also built recursively to handle any depth of tree.
Hmmm....seems like a bit of hassle, does it really improve things much? Well, for one thing, the above code is robust, I've written it once and don't have to do anything more except tack on the next leap second to SETLEAPS
the next time one is announced. Our manually built binary search tree will be rebalanced automatically. What about performance?
julia> @time [searchsortedlast(SETLEAPS, 63366451200000) for i = 1:10000000];
elapsed time: 0.50351136 seconds (80000048 bytes allocated)
julia> @time [setleaps(63366451200000) for i = 1:10000000];
elapsed time: 0.067922885 seconds (80000048 bytes allocated)
Boom! Almost 10x!
Anyway, I thought this was an interesting application of meta-programming in Julia where we're basically taking static data and generating an operation on a data structure in the code itself. A recent thread in the Julia-Users forum asked about the long-term hopes of Julia achieving true C/Fortran performance (currently usually 1.2-2x). I think another strong point to the argument of Julia vs. C/Fortran is what I've shown above. Powerful language design choices like strong meta-programming functionality allows you to do things in Julia that would be almost impossible in many other languages, but that also allow for unique performance gains.
Stay tuned for more adventures in Julia land and maybe for my next post, I'll talk a little more about Julia's code introspection tools and how they can be so darn handy while developing. As a teaser, here's the generated code for our setleaps
method above:
In [62]: @code_lowered setleaps(62561548800000)
Out [62]: 1-element Array{Any,1}:
:($(Expr(:lambda, {:ms}, {{},{{:ms,:Any,0}},{}}, :(begin # In[30], line 43:
unless ms < 62214480000000 goto 0
return 0
0:
unless ms >= 63476784000000 goto 1
return 25000
1:
unless ms < 62624707200000 goto 15
unless ms < 62388144000000 goto 8
unless ms < 62293449600000 goto 4
unless ms < 62230377600000 goto 2
return 1000
2:
unless ms < 62261913600000 goto 3
return 2000
3:
return 3000
goto 7
4:
unless ms < 62324985600000 goto 5
return 4000
5:
unless ms < 62356608000000 goto 6
return 5000
6:
return 6000
7:
goto 14
8:
unless ms < 62498476800000 goto 11
unless ms < 62419680000000 goto 9
return 7000
9:
unless ms < 62451216000000 goto 10
return 8000
10:
return 9000
goto 14
11:
unless ms < 62530012800000 goto 12
return 10000
12:
unless ms < 62561548800000 goto 13
return 11000
13:
return 12000
14:
goto 28
15:
unless ms < 62908704000000 goto 22
unless ms < 62798371200000 goto 18
unless ms < 62703676800000 goto 16
return 13000
16:
unless ms < 62766835200000 goto 17
return 14000
17:
return 15000
goto 21
18:
unless ms < 62845632000000 goto 19
return 16000
19:
unless ms < 62877168000000 goto 20
return 17000
20:
return 18000
21:
goto 28
22:
unless ms < 63050832000000 goto 25
unless ms < 62956137600000 goto 23
return 19000
23:
unless ms < 63003398400000 goto 24
return 20000
24:
return 21000
goto 28
25:
unless ms < 63271756800000 goto 26
return 22000
26:
unless ms < 63366451200000 goto 27
return 23000
27:
return 24000
28:
end))))