There are two hard problems in software development: packaging, and deployment.
If you write code, sooner or later you’ll probably need to:
It turns out this is kind of a hard problem, and new solutions are being invented all the time:
| Misleading Google Search Results | Results |
|---|---|
| deployment methods | 108,000,000 |
| package managers | 9,080,000 |
Should I use virtual machines? Containers? Do I need configuration management tools? Should I be looking into hosted services in The Cloud? There are no easy answers to these questions, and it takes a lot of time and practice to become familiar with their tradeoffs and choose an appropriate strategy for your project.
But regardless of what you choose, there’s one thing they almost always have in common: packages.
Somewhere along the line your deployment strategy will be improved by being able to install something (build or runtime dependencies, your own code, or even automation tools themselves) if it’s already been packaged up in a way that’s reproducible and reliable.
We’ll show you how you can incorporate the Nix package manager into your workflow over two days, with minimal disruption and a small investment of time. You can start taking advantage of its benefits almost immediately (such as installing multiple packages at once that have conflicting versions or dependencies), while still using it alongside any package managers you’re already using.
If you haven’t spent time with Nix, you’re probably wondering why you need a new package manager. We already have apt/yum/homebrew/etc., all with their own approaches, and the whole situation starts to feel a little like…
So why bother learning Nix?
Because it is different! It’s based on functional programming concepts and a model that affords it several advantages. Thankfully, the basic features and concepts have already been well covered elsewhere.
I’m hoping that if you’ve made it this far, you already have some interest in it. If you’re skeptical about getting started right away, I recommend spending some time reading the above links to get a better sense of what Nix offers.
The “two days” recommendation is a suggestion for pacing so you don’t need to dive too deep down the Nix rabbit hole on the first day you try it out, but the actual steps we’ll go over could be run through much faster if desired.
Good news! It’s not going to take a day to install Nix (the quickstart install takes a few minutes at most), but we’ll go at a slower pace here so you can spend time learning the basic tools and some of the concepts too.
One of Nix’s defining features is that the packages it builds do not depend on global install directories (`/bin`, `/usr`, `/lib`, etc.); instead, packages are placed in the `/nix/store`. This makes it easy to use alongside existing package managers, because it will not influence or depend on your globally installed packages.1
For Linux or Mac OS X users, the official installation instructions are available on http://nixos.org/nix/. Here’s the short version as of January 2015:
$ curl https://nixos.org/nix/install | sh
$ source ~/.nix-profile/etc/profile.d/nix.sh
The first step will set up the `/nix/store` and install utilities like `nix-env` that you will use to manage Nix. If you’re concerned about relying on `curl` for the install, you can read the installation chapter of the manual for further options.

The second step will source a shell script that exports your `$NIX_PATH` and modifies your user’s `$PATH` so it can find utilities installed by Nix.
To search for packages, you can use `nix-env -q` and grep to filter the results. Here’s a quick example that runs a query (flag `-q`) for packages available (flag `-a`) on your platform, including each package’s attribute path (flag `-P`). We’ll grep for the `cowsay` package, because who wouldn’t want `cowsay`:
nix-env -qaP | grep -i cowsay
nixpkgs.cowsay cowsay-3.03
Now you can install either by name (from the right-hand column in our search results):
nix-env -i cowsay-3.03
Or by attribute path (as shown in the left-hand column of our earlier search results). Note that we have to add the flag `-A` to indicate we’re installing by attribute:
nix-env -iA nixpkgs.cowsay
Congratulations! You’ve installed your first package with Nix. If you aren’t sure what to do next, try out `nix-env -i nix-repl`. This will install the `nix-repl` utility, which lets you write Nix expressions and interact with Nix in a shell. Examples and getting-started instructions for `nix-repl` are available here.
You’re now free to install packages without breaking system packages, without obscure failures due to changed or missing global dependencies2, and without dependency hell.3
There are a lot of Nix features that have improved my development workflow, and it’s very hard to pick just one to cover here. But time and time again, one of the most useful for me has been `myEnvFun`, which also shows how we can go beyond typical definitions of “package” to solve common development problems.
Note: the “Fun” in `myEnvFun` is for functional. The Nix and NixOS communities make no claims or guarantees of enjoyment derived from using it.
One of the (many) complications in software development is identifying and isolating all of the tools you need to work on a particular project. Not every tool is project-specific: I usually want `tmux` and my favorite editor available regardless of what project I’m working on. But you might have projects that require conflicting versions of software, like two or more Haskell projects using two or more versions of the compiler `ghc`.
What we’d like to do is define and codify these different environments as package sets containing all the tools we need, preferably giving us some quick and easy way to switch between them.
We can do this through a special file, `~/.nixpkgs/config.nix`, which may contain package overrides you’ve specified for your user. Here’s how you can create the file for the first time if you don’t already have one:
mkdir -p ~/.nixpkgs
touch ~/.nixpkgs/config.nix
Next we’ll use the built-in `packageOverrides` to define one or more new `myEnvFun` environments. The example below is written in the Nix language. We won’t explain all of the syntax here, but we’re defining two new packages that can be installed: one that will let us use `ghc` at version 7.6.3, and one at 7.8.3.
# ~/.nixpkgs/config.nix
{
  # ~/.nixpkgs/config.nix lets us override the Nix package set
  # using packageOverrides. In this case we extend it by adding
  # new packages using myEnvFun.
  packageOverrides = pkgs : with pkgs; {
    ghc76 = pkgs.myEnvFun {
      name = "ghc76";
      buildInputs = [ ghc.ghc763 ];
    };
    ghc78 = pkgs.myEnvFun {
      name = "ghc78";
      buildInputs = [ ghc.ghc783 ];
    };
  };
}
Here’s how we can install the environments from our snippet above:
# nix-env will look for ~/.nixpkgs/config.nix and, if it exists, use the package
# overrides you've defined there
nix-env -i env-ghc76
nix-env -i env-ghc78
Once they’re installed, there’s no need to reinstall them unless you uninstall them or make a change (for instance, adding a new package to `buildInputs`).
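For instance, here’s a sketch (assuming `tmux` is available in your package set) of what extending the `ghc76` environment from the earlier snippet might look like; after editing, re-run `nix-env -i env-ghc76` to rebuild it:

```nix
# sketch: an updated environment in ~/.nixpkgs/config.nix
ghc76 = pkgs.myEnvFun {
  name = "ghc76";
  # tmux added alongside ghc; re-run `nix-env -i env-ghc76` to pick it up
  buildInputs = [ ghc.ghc763 tmux ];
};
```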
Let’s load up our new `ghc76` environment first:
$ load-env-ghc76
env-ghc76 loaded
ghc76:[vagrant@nixos:~]$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude>
Leaving GHCi.
ghc76:[vagrant@nixos:~]$ exit
When you’re done you can exit back to your normal shell, which won’t have `ghci` installed (unless you specifically installed it for your user). Want to try out the environment with ghc `7.8.3` instead?
$ load-env-ghc78
env-ghc78 loaded
ghc78:[vagrant@nixos:~]$ ghci
GHCi, version 7.8.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude>
Leaving GHCi.
ghc78:[vagrant@nixos:~]$ exit
In practice, taking a few minutes to define these package sets has proven to be a straightforward and reliable way to keep project dependencies isolated. Perhaps best of all, you can keep your `config.nix` in a repo somewhere and use it on any machine where you need to reproduce those environments.
Want to start writing your own? Here are some tips:

- Define a new environment in `~/.nixpkgs/config.nix` with the packages you need, e.g. `buildInputs = [ git tmux ]`
- Install it with `nix-env -i env-dev`
- Run `load-env-dev` to load the environment

Want to see it in action? Here’s a fancy animated gif demonstrating the environment switching.
The Nix/NixOS community is growing, and they’ve been developing solutions and novel approaches to a great many packaging and deployment problems. There’s Nix the language (in order to write your own packages you should learn how to write Nix expressions), NixOS the Linux distribution (which lets you write NixOS modules, providing a config management-like layer), and a whole lot more.
Here are some resources for getting started:
1. Note that this only applies to where software is installed. If you install `cowsay` via both `apt-get` and `nix-env`, then your user’s `$PATH` will determine which one is used.
2. Unless you’re on OS X, where builds still require some globals that may change or cause breakage when upgrading to newer versions of OS X. There’s a ##nix-darwin channel on freenode working to address this if you want to contribute.
3. If you’re using the Nix unstable channels there are other kinds of build failures you may encounter, like unintentional backwards incompatibilities in upgraded packages (e.g. foo only works because of a bug in bar; bar is upgraded with a bug fix; foo stops building until the package maintainers can address it).
But guess what?
They’re doing it. Wrong.
A crack team of compiler inventors has been in stealth mode for literally months preparing a new way to TDD without having to make up units all the time.
The Idris compiler (industry buzzword for a linting tool) is so advanced you can:
Have you ever written code to access an array index and you just knew it wouldn’t fail, but you still had to account for that possibility? In the early days of computing (circa 2009 when node.js was created) everyone tried to make up for this uncertainty by writing unit tests.
But the Idris compiler doesn’t mess around: you can just say it won’t fail, so it won’t.
This is made possible through the magic of dependent types, a type of type even more dynamic than dynamic types, because they let types depend on values. Types are no longer mindless declarations like Int or String or Whatever in dependent typing. You can have particular values, non-empty containers, and more complex relationships all inside the type signature.
Behold, examples:
-- tell the compiler that concatenating vectors of size n and m makes a new
-- vector of size n + m. The compiler checks the cases for you!
concatVectors : Vect n a -> Vect m a -> Vect (n + m) a
-- repeat values of some type "a" `n` many times in a vector. The size of the
-- returned vector is guaranteed to be the size of `n`
replicate : (n: Nat) -> a -> Vect n a
-- no out of bounds access here! you can only look up an index in a vector of
-- size `n` if you call it with a number between 0 and `n`
index : Fin n -> Vect n a -> a
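To see how those signatures play out, here’s a small usage sketch (the values are illustrative, not from the original post):

```idris
-- a vector whose length (3) is part of its type
xs : Vect 3 Int
xs = replicate 3 7

-- 0 is a valid index into a Vect 3, so this compiles
first : Int
first = index 0 xs

-- index 3 xs would be rejected at compile time:
-- 3 cannot be made into a value of type Fin 3
```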
Best of all: the dynamic checks only happen once before your program ever runs!
This works through a process known as mathematical proofing, where the compiler knows enough about your code to ensure coverage instead of just guessing at it with a handful of tests. If you try to express something the compiler doesn’t know how to check already, you can switch to an interactive theorem proving mode, letting you dynamically solve problems before the code ever runs.
Let’s recap. Idris and dependent types make it possible to:
Imagine you’re building a next-gen full-stack web app where users can earn and redeem special tokens whenever they recommend your app to a friend. You want to make a stack-like structure that lets you track the number of recommendations and the number of redemptions, but always ensure the number of redemptions is less than or equal to the number of recommendations (in Idris, the type for a relation `n <= m` is `LTE n m`).
First you define some data:
data User = MkUser String
data Redeem = MkRedeem Int
data Earn = Recommend User
Action : Type
Action = Either Redeem Earn
data History : Type where
MkHist : (user : User) ->
(redeemed : Nat) ->
(offset : Nat) ->
(earned : Nat) ->
LTE (redeemed + offset) earned ->
Vect (redeemed + earned) Action ->
History
We know that our users can always recommend the app to friends, so let’s write a function to update their history when they make a recommendation:
recommendApp : History -> User -> History
recommendApp (MkHist u r o e p v) friend =
MkHist u r (S o) (S e) p v'
where a : Action
a = Right $ Recommend friend
v' : Vect (r + (S e)) Action
v' = rewrite (sym $ plusSuccRightSucc r e) in a :: v
Anytime you see something of type `Nat` (a natural number: a whole number greater than or equal to 0), you can take the successor of that number with `S`. For any natural number `n`, `S n` is equivalent to `n + 1`.
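For reference, `Nat` itself is defined inductively; a rough sketch of its definition:

```idris
-- every natural number is either zero (Z) or the successor (S) of another
data Nat = Z | S Nat
```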
But when you run this, Idris finds an issue!
Can't unify
LTE (plus r o) e
with
LTE (plus r (S o)) (S e)
Specifically:
Can't unify
e
with
S e
It’s telling us that having proved `(r + o) <= e` isn’t the same as proving `(r + o + 1) <= e + 1`. Notice how Idris came up with this test all on its own even though we didn’t write any units!
But if we have a valid `LTE` relationship then it’s pretty clear you can add one to each side and show the relationship still holds.
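That intuition is exactly what the definition of `LTE` encodes; roughly (constructor names follow the version of Idris used here):

```idris
data LTE : Nat -> Nat -> Type where
  -- zero is less than or equal to any natural number
  lteZero : LTE Z m
  -- if n <= m, then (n + 1) <= (m + 1)
  lteSucc : LTE n m -> LTE (S n) (S m)
```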
This is where dynamic testing comes in. Except instead of writing a bunch of unit tests in a separate file somewhere you can just put a variable with a question mark right in your code to show we have no idea what we’re doing. Idris calls variables prefixed with a question mark “metavariables”, and by convention we use ?wtf, ?notagain, or ?sendhelp.
recommendApp : History -> User -> History
recommendApp (MkHist u r o e p v) friend =
MkHist u r (S o) (S e) ?wtf v'
where a : Action
a = Right $ Recommend friend
v' : Vect (r + (S e)) Action
v' = rewrite (sym $ plusSuccRightSucc r e) in a :: v
Now if you load this up in the Idris interpreter and enter the command `:p wtf`, it will tell you what it is you’re trying to do, even if you were just making things up. We’ll also type `intros` to have it take all of the arguments as givens and show us what we’re solving for (the goal):
-main.wtf> intros
---------- Other goals: ----------
{hole6},{hole5},{hole4},{hole3},{hole2},{hole1},{hole0}
---------- Assumptions: ----------
u : User
r : Nat
o : Nat
e : Nat
p : LTE (plus r o) e
v : Vect (plus r e) (Either Redeem Earn)
friend : User
---------- Goal: ----------
{hole7} : LTE (plus r (S o)) (S e)
If we know that `p = LTE (plus r o) e` is a given, one way to solve the goal is to show that `LTE (plus r (S o)) (S e)` can be rewritten as `p`. But it’s kind of hard to do that without first breaking down `plus r (S o)`, so let’s rewrite it in the form `S (r + o)` instead. There’s a built-in proof called `plusSuccRightSucc` that lets us do just that, so we’ll use the `rewrite` tactic:
-main.wtf> rewrite (plusSuccRightSucc r o)
---------- Other goals: ----------
{hole7},{hole6},{hole5},{hole4},{hole3},{hole2},{hole1},{hole0}
---------- Assumptions: ----------
u : User
r : Nat
o : Nat
e : Nat
p : LTE (plus r o) e
v : Vect (plus r e) (Either Redeem Earn)
friend : User
---------- Goal: ----------
{hole8} : LTE (S (plus r o)) (S e)
Notice how the goal has been updated for us based on the rewrite.
Now if only we had a way to prove that `LTE n m` implies `LTE (S n) (S m)`, we could solve for this. Good news! The very definition of `LTE` contains a constructor `lteSucc` that proves just this. We’ll use the `mrefine` tactic to rewrite the relationship for us (unlike `rewrite`, `mrefine` uses pattern matching, so we don’t need to supply the variables explicitly):
-main.wtf> mrefine lteSucc
---------- Assumptions: ----------
u : User
r : Nat
o : Nat
e : Nat
p : LTE (plus r o) e
v : Vect (plus r e) (Either Redeem Earn)
friend : User
---------- Goal: ----------
{__pi_arg516} : LTE (plus r o) e
If the goal you’re solving for is in the same form as one of the assumptions, you can use the `trivial` tactic to complete the proof, and `qed` to see the results:
-main.wtf> trivial
wtf: No more goals.
-main.wtf> qed
Proof completed!
main.wtf = proof
intros
rewrite (plusSuccRightSucc r o)
mrefine lteSucc
trivial
We need this proof in our source file, but having to copy and paste is the kind of thing we did in the early 2010s, and that doesn’t cut it anymore. After entering `qed` for a solved proof, you can use `:addproof` to have it automatically appended to your source file.
So far so good! But we said users can only redeem tokens if they have made enough recommendations to other users, and that’s something we can only know at runtime. Idris might be magic, but even Idris can’t predict the future.
We might first think to write a function with type `History -> Redeem -> History`, but it’s impossible to redeem a token if a user doesn’t have enough recommendations, and those values are only known at runtime. So let’s try `History -> Redeem -> Maybe History` instead, and it’ll look a little something like:
redeemToken : History -> Redeem -> Maybe History
redeemToken (MkHist _ _ Z _ _ _) _ = Nothing
redeemToken (MkHist u r (S o) e p v) token =
Just $ MkHist u (S r) o e ?redeemPrf (Left token :: v)
Remember that weird-looking offset value we carry around in the `History` type? It’s time to put it to use! If we didn’t have an offset, we’d only ever know that we had a number of redeemed tokens less than or equal to the number of earned tokens, and `r <= e` isn’t enough information to prove `r + 1 <= e`.
The offset lets us rewrite everything as `r + o <= e`. If `o` is 0 (the natural number `Z`) then the problem reduces to `r <= e` and can’t be solved. But if `o` is greater than 0 then we can always rewrite `r + (o + 1)` as `(r + 1) + o`. This lets us increase the count for redeemed tokens and the size of our history vector, while always enforcing that we’ll never have more redeemed tokens than earned tokens.
Writing the `?redeemPrf` is a fun exercise, or you can see a full, working example of the code in this gist.
Writing correct software and only having to check things once means efficiency, and efficiency means success. And money. Mostly money.
The cost of efficiency is having to know what you want to write before you write it, and our findings have shown this is a useful property in software development despite conventional wisdom.
In the preceding example we showed that you can make a data structure correct by construction: if you can’t show you’re doing it right, you can’t construct an instance of it. Imagine all the hours we just saved from writing TDD-driven tests! Idris just let us interactively write one big test that only has to run once before your program runs, and all the test cases are covered from there on out.
This might be a silly example, but imagine a world where crypto libraries can’t fail due to bounds checks.
Imagine it.
Get started with this amazing new technology by installing the Haskell Platform (if you don’t have GHC and cabal-install on your system already), and running the commands:
cabal update
cabal install idris
You can read more detailed instructions for different operating systems on the Idris wiki.
Now you can work through the tutorial on the docs page to learn more!
Things didn’t go quite as smoothly as expected.
But before we get to that…
Hoogle is easy to `cabal install` and get started with locally. For many use cases, all you need is to install it and populate it with data using either:
# creates databases for many common libs
hoogle data
Or:
# creates databases for a whole lot of libs
hoogle data all
If you’re using sandboxes, you may have to specify the location of Hoogle with `.cabal-sandbox/bin/hoogle data` (even though I had that instance of Hoogle higher up on my search path, I ran into a quirk in my environment where it couldn’t find the cabal dirs it needed unless I specified the relative path).
From there you can start searching at the command-line:
$ hoogle "(a -> b) -> [a] -> [b]"
Prelude map :: (a -> b) -> [a] -> [b]
Or you can run `hoogle server -p 1234` to serve the web version on localhost at port 1234 (or the port of your choice).
GHCi is more than just a Haskell interpreter: you can also use it to issue shell commands. If you haven’t done this before, try it out! You can prefix shell commands with `:!`. For instance, `:!pwd` will print the current working directory in GHCi.
This means you can also call Hoogle from within GHCi, assuming it’s on your path:
Prelude> :! hoogle "[a] -> Int"
Prelude length :: [a] -> Int
This works, but it’s a little clunky to have to quote the search term. There’s a Hoogle entry on the HaskellWiki with a tip for getting around this. You can add this to your `.ghci` file (in your cabal sandbox folder, project folder, or as described here):
-- .ghci
:def hoogle \x -> return $ ":!hoogle \"" ++ x ++ "\""
:def doc \x -> return $ ":!hoogle --info \"" ++ x ++ "\""
Now you can call them handily within GHCi:
*Main> :hoogle head
Prelude head :: [a] -> a
Data.List head :: [a] -> a
...
*Main> :doc head
Prelude head :: [a] -> a
Extract the first element of a list, which must be non-empty.
From package base
head :: [a] -> a
Especially when you’re first learning a library, it can also be helpful to limit search results to that library. For instance, if you want to find all of the Hakyll functions that make use of `Compiler`, use `+hakyll` to search only that package:
Prelude> :hoogle +hakyll Compiler
Hakyll.Core.Compiler data Compiler a
Hakyll.Core.Compiler module Hakyll.Core.Compiler
...
The next step in my journey to make the most of Hoogle was finding a way to search my current project while working on it. The high-level process for creating a Hoogle database is to use `haddock` (commonly via `cabal haddock`) to generate a text file suitable for consumption by Hoogle, convert the text file to a `.hoo` Hoogle database, and combine it with an existing Hoogle database.
Let’s break it down. First, cabal has a haddock command that is very convenient to use when working with sandboxes. The `--hoogle` flag will generate a text-file database, and `--all` says to generate one for everything in the package in the current working directory (you could also specify any of `--executables`, `--tests`, or `--benchmarks`). If you’re writing a library, you don’t need to specify `--all`, but it’s useful when you want to be able to search everything in your current project:
cabal haddock --hoogle --all
Now we can use Hoogle’s `convert` command to create a `.hoo` file from the text database. The text file should be somewhere in the current working directory under dist/doc/html:
hoogle convert dist/doc/html/path/to/your/package/docs.txt
Lastly you can combine it with the `default.hoo` database (typically somewhere in your global, user, or sandbox `cabal/share` folder):
hoogle combine default.hoo dist/doc/html/path/to/your/package/docs.hoo
My original goal was making it easy to generate a database with all the packages in a cabal sandbox. It turned out to be challenging for a few reasons, one of which is that cabal installing packages (sandboxed or not) doesn’t create the `.txt` or `.hoo` files needed by Hoogle. Some quick research shows that adding this type of functionality isn’t a new issue.
This is by no means a trivial addition to `cabal`, but it’s arguably the cleanest solution to the problem.
However, if you want a quick hack in the meantime, the basic idea is making use of:

- `ghc-pkg` to get a list of sandboxed packages
- `cabal get` to fetch each package’s source code
- `cabal haddock` to generate `.txt` databases for each
- `hoogle convert` to create the `.hoo` databases
- `hoogle combine` to merge the databases into a single `default.hoo`
# get an easy to parse list of packages pinned at their installed versions
ghc-pkg list --package-db=".cabal-sandbox/<architecture>-ghc-<version>-packages.conf.d/" --simple-output
# then for each package:
cabal get <package> -d <destination directory>
cabal haddock --hoogle --haddock-options='<package>/Setup.hs'
hoogle convert <package>/dist/doc/html/<package>.txt
hoogle combine path/to/default.hoo <package>/dist/doc/html/<package>.hoo
This approach is problematic because not all installed packages are libraries (in which case `cabal haddock` will generate errors and not exit cleanly), the location and name of the setup file may vary, and having to `cabal get` a lot of packages can be an expensive and time-consuming operation.
Overall I’ve found it much easier to start with `hoogle data all` and then add my own package, rather than try to automate database creation for sandboxed libraries. You get greater search capabilities (sometimes with too many results, but it’s easy to limit the search by module), and it doesn’t stop you from building your own databases as needed.
Some links I found indispensable in learning the various ways one can Hoogle:
Oh, that’s right. We don’t have to imagine.
But now imagine that, among all the noise, all the tools and frameworks created for specialized use cases but recommended for all, lies something useful. Something with the potential to change how you describe relationships in code so you can make it correct by construction, not just assumed correct by testing.
A powerful way to achieve this is by using a rich type system, like those in Haskell, Scala, and OCaml.
It’s important to understand that these languages have their own issues. No programming language is the best language. None will be the language to end all languages. The more time you spend using one of them for more and more complicated use cases, the more likely you are to run into different kinds of limitations.
But that’s OK, because this post isn’t about those languages. It’s about leveraging type systems to write better code. For the few people who like to point out that these languages are experimental, or untested, or too academic, however, I’d like you to keep this in mind:
There’s a group specializing in all kinds of leadership and skill building techniques that uses the term Fool’s Choice to describe dilemmas where you see a binary choice (either/or) instead of a multitude of options.
When it comes to people not wanting static types, this is the line of reasoning I see:
The options are sometimes seen as limited, cumbersome types or no types at all.
So if you do understand static types through the lens of Java, or C/C++, or similar languages, then I have a favor to ask. Imagine that everything you know about static types is wrong. Imagine that what you’ve learned about them has nothing to do with actual static types, but only the specific, broken implementations of them that most of us are exposed to.
Do that, and I can tell you what static types are really about.
It’s tempting to think of types as a way to declare the contents of a variable. If I say the variable foo is an integer, you know the variable foo is an integer. That might be helpful in some sense, but in dynamic languages you don’t need someone to tell you that “foo = 5” means foo is an integer. You’re not going to write tests asserting that foo is an integer, and indeed, you probably aren’t even going to think of it in those terms. You don’t need to, after all.
But that’s not very interesting. If you only see types as something that help you declare the obvious and prevent simple bugs that you can check by eye or make assertions about, of course you’ll see no need for them. And in that case, the productivity you get from writing in a language like python will absolutely trump that of Java.
Set yourself free from making meaningless declarations! Reduce the size of your code! Simplify refactoring!
So if we don’t spend our dynamic language time testing types, what do we test? Let’s say you write a library function in python that takes any iterator and writes the contents to file. In the true spirit of python, you don’t care what “type” of object someone passes in; anything that allows iteration is fine, and of course you’d never want to limit yourself to only iterating over integers, or strings, or whatever.
Now ask yourself: how do you make sure someone using your library calls your function with an iterable?
In the above example, it’s absolutely essential to your library’s functionality that someone only ever passes in an iterable, and you have no way of making sure they do that. If they pass in something that doesn’t allow iteration, everything explodes. You can decide to hope for the best (and when hope fails they’ll see a built-in exception that might be confusing), or explicitly check that they pass in an iterator and raise a more meaningful exception.
But if you want to make sure your program behaves as expected, you’ll need to test it against both iterators and non-iterators to ensure the behavior is correct.
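As a sketch of that defensive check in Python (the function name and error message here are illustrative, not from the original post):

```python
def write_lines(path, items):
    """Write each element of any iterable to `path`, one per line."""
    # Explicitly verify the argument is iterable up front, so callers get
    # a meaningful error instead of a confusing built-in exception later.
    try:
        iterator = iter(items)
    except TypeError:
        raise TypeError("write_lines requires an iterable, got %r" % type(items))
    with open(path, "w") as f:
        for item in iterator:
            f.write("%s\n" % (item,))
```

Note that to trust this function, you still need tests for both branches: one with an iterable and one with a non-iterable.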
What would be ideal here is a way to describe the behavior of your program at the type level. Not to declare an exacting, exhaustive list of types that your function accepts, but a whole class of types that can be used as iterators.
In Haskell, it’d look a little like this:
writeLines :: Iterator a => a -> WriteFileAction
This reads as “we have a function named writeLines that takes an iterator of any arbitrary type a, and produces an action that writes to a file.”
This example is important: when you hear functional programming enthusiasts saying type systems reduce testing, this is the kind of thing we mean. You’ve just described a behavior that prevents anyone from trying to write to file with your library unless they pass in an actual iterator. It’s correct by construction: try to call it with a non-iterator and it won’t compile. You don’t need extra logic or tests to account for that possibility.
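As a concrete sketch of that signature in real Haskell (using the standard `Foldable` class to play the role of the hypothetical `Iterator`, and a hypothetical function name):

```haskell
import Data.Foldable (toList)

-- Accepts any Foldable container of Strings. A caller simply cannot
-- pass a non-container: the program won't compile, so no runtime
-- "is this iterable?" check or test is needed.
writeAllLines :: Foldable t => FilePath -> t String -> IO ()
writeAllLines path xs = writeFile path (unlines (toList xs))
```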
There’s a big scary word we functional folk like to pass around called parametricity. It has a very specific meaning and is covered in many research papers, but for our introductory purposes here we can say it’s something that helps you reason about what a function can or can’t do by understanding how its properties hold true for more than one type.
Let’s look at our example one more time:
writeLines :: Iterator a => a -> WriteFileAction
The syntax might be unfamiliar, but the definition tells us that we can ensure the function is only called with an iterator. We don’t care what that iterator is.
But this tells us something else, too: if we don’t know what the iterator is, how it iterates, or what it contains, this function can’t do anything except iterate.
When your goal is reasoning about code and understanding it, it’s hard to overstate how huge this subtle implication really is. If we write a function that holds true for all iterators, we can’t do any non-iterator things to it!
For instance, if someone calls it with a string iterator, we can’t manipulate the strings. How could we? If we wrote something specific to strings, the type would be Iterator String -> WriteFileAction.
This gives us an unprecedented level of safety by ensuring the function will only be able to make use of the iterator interface. When you are refactoring a large program this makes it very easy to pull out sections of code and replace them, because you know exactly what the code could or couldn’t do.
Compare that to code that can do anything at any time, like raise exceptions or manipulate shared state. I’ve even seen dynamic code that would iterate over collections, check if the contents were a particular “type”, like a string, and tack something on to them. When those special cases can occur anywhere in an untyped language, you always need to be on guard for them.
When we write code, especially code any other human (including your future self) will need at some undetermined future point in time, we want some way to tell that human how the code works.
If you have a rich type system, you’re halfway there already. Docs get out of date, test specifications are ill-specified, but types are forever. You get the types right or your program doesn’t compile. Want to write fewer tests? Write more types. You only need tests to the extent that you don’t have types.
And the hidden benefit to adopting this mentality (thinking in terms of correct by construction) is it forces you to consider those same cases in untyped code. It makes you all too aware of how quickly a piece of untyped code can fail if someone passes in objects that don’t fulfill some expected interface or behavior.
If you start thinking in terms of the behaviors you want to describe and enforce, it quickly gets you in the right state of mind for asking what happens when that behavior can’t be enforced. You can use that wariness to convey to your users how it should work, you can state your expectations in the API and narrative docs, and you can convey what will happen when that expectation isn’t met (an exception that gets raised, a function that returns some none or null type, etc).
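For instance, a function's type can state up front what happens when the expectation isn't met. This sketch and its names are mine, using only base's Text.Read:

```haskell
import Text.Read (readMaybe)

-- The Maybe in the result type documents the failure mode for callers:
-- bad input yields Nothing, never an exception.
parsePort :: String -> Maybe Int
parsePort s = case readMaybe s of
  Just n | n > 0 && n < 65536 -> Just n
  _                           -> Nothing
```

The signature alone tells a reader both the happy path and the failure behavior, with no narrative docs required.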
So my challenge to you is this: if you have not spent an extensive amount of time working with a rich, expressive type system, spend some time learning you a Haskell. Learn it for fun, learn it for something new, learn it to expand your mind, learn it for The Real World. Whatever works best for you. But make an effort to understand just how expressive you can be in writing code and how you can cut down on tests with nicer types.
But QuickCheck does more than help us write tests: it offers an efficient, rich API for randomly generating data. We’re going to show how you can generate a CSV file with potentially millions of fake user records. The main use case is populating a database with loads of data for interactive testing, but this method is also useful for testing outside programs and bulk data jobs.
This post is written in literate Haskell, so let’s get our obligatory top-level imports out of the way before we get too far along:
> {-# LANGUAGE OverloadedStrings #-}
>
> import Data.Time
> import Data.Char (chr)
> import Test.QuickCheck
> import Control.Applicative
> import Data.Vector (Vector, (!))
> import qualified Data.Vector as V
> import Data.Text (Text)
> import qualified Data.Text as T
> import qualified Data.Text.IO as TIO
> import System.Environment (getArgs)
> import Text.Read (readMaybe)
We’ll start with a basic user profile definition, similar to what you’ll find on many social media sites:
> data UserProfile = UserProfile {
> firstName :: Text
> , lastName :: Text
> , email :: Email
> , password :: Text
> , gender :: Gender
> , birthday :: Birthday
> } deriving Show
>
> -- helper for rendering a UserProfile as text
> -- (passwords will be quoted, and generated without "" marks or control chars)
> profileText :: UserProfile -> Text
> profileText profile = T.intercalate "," [
> firstName profile
> , lastName profile
> , emailToText $ email profile
> , T.concat ["\"", password profile, "\""]
> , T.pack . show $ gender profile
> , T.pack . show $ birthday profile
> ]
Note: the use of a binary gender definition here is to emulate the type of profile I’ve tested against, but it’s also exclusionary and a poor UI decision (read this for some alternatives and reasons not to use it).
Next we’ll create our custom Email, Gender, and Birthday types:
> data Email = Email {
> local :: Text
> , domain :: Text
> } deriving Show
>
> emailToText :: Email -> Text
> emailToText e = T.concat [local e, "@", domain e]
>
> data Gender = Female | Male
> deriving Show
>
> data Birthday = Birthday {
> year :: Integer
> , month :: Int
> , day :: Int
> }
>
> -- display birthdays in the format YYYY-MM-DD
> instance Show Birthday where
> show bday = show $ fromGregorian (year bday) (month bday) (day bday)
QuickCheck has an Arbitrary typeclass that you can use for defining how to randomly generate a piece of data for a given type. Arbitrary instances only require you to supply a definition of arbitrary (Gen a).
Here we’ll define a Gender instance using elements ([a] -> Gen a):
> instance Arbitrary Gender where
> arbitrary = elements [Female, Male]
Now we’d like to do the same for birthdays. Using the Data.Time library, we can represent dates as modified Julian days. Here I’ve arbitrarily chosen to generate birthdays between day 25,000 (1927-04-30) and day 55,000 (2009-06-18) inclusive, along with a helper function for converting the integer day to a Birthday.
> instance Arbitrary Birthday where
> arbitrary = birthdayFromInteger <$> choose (25000, 55000)
>
> birthdayFromInteger :: Integer -> Birthday
> birthdayFromInteger i = let (y, m, d) = toGregorian (ModifiedJulianDay i) in
> Birthday { year = y, month = m, day = d }
QuickCheck makes the choice for us using choose (Random a => (a, a) -> Gen a), and we use fmap (<$>) to apply our helper function of Integer -> Birthday.
Next we’d like to generate passwords, but there’s a potential issue: we’ve defined names and passwords to all be of type Text. How can we define a single instance of Arbitrary Text to cover all of these cases?
There are several ways to approach this problem, and in a real application you could make a strong argument for creating new data types (or newtypes) for each of these fields. But in our example, the simplest answer is to not define an instance of Arbitrary for the name and password fields. The arbitrary function has type Gen a, and we can write our own functions of this type without Arbitrary:
> -- creates a text password of random length from the characters A-z, 0-9, and:
> -- #$%&'()*+,-./:;<=>?@[\]^_`{|}~
> genPassword :: Gen Text
> genPassword = T.pack <$> listOf1 validChars
> where validChars = chr <$> choose (35, 126)
By design we won’t generate passwords containing quotation marks or other characters that would require escaping. This is done purely to keep this example short and make our job easier when we eventually print results in a minimal CSV format. If you find yourself writing a full-fledged program for generating CSV data, I recommend using cassava.
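As a quick sanity check on exactly which characters choose (35, 126) covers (the helper name here is mine):

```haskell
import Data.Char (chr)

-- Every character genPassword can draw from: '#' (35) through '~' (126).
-- Note that '"' is codepoint 34, just outside the range, which is why the
-- generated passwords never need escaping in our minimal CSV output.
passwordAlphabet :: String
passwordAlphabet = map chr [35 .. 126]
```
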
Any programmer will tell you that naming is hard. So let’s cheat: the US government offers lists of first and last names from 1990 census data.
I’ve cleaned up that data so names are in Title Case, one name per line, in files named: female_first_names, male_first_names, and last_names. There are fewer than 90,000 names total in all the files so we can easily store them in memory, and we’d like to access any element by index in constant time. This is a job for Data.Vector!
This means we’ll need a function of Vector Text -> Gen Text to choose a random name from a vector of names, so let’s create some helper functions:
> nameFromVector :: Vector Text -> Gen Text
> nameFromVector v = (v !) <$> choose (0, upperBound)
> where upperBound = V.length v - 1
>
> vectorFromFile :: FilePath -> IO (Vector Text)
> vectorFromFile path = V.fromList . T.lines <$> TIO.readFile path
>
> nameGenFromFile :: FilePath -> IO (Gen Text)
> nameGenFromFile path = nameFromVector <$> vectorFromFile path
And since we’ll need to pass around multiple generators, we can capture them in a new data structure (saving us from passing around three different generators to every function that needs them):
> data NameGenerators = NameGenerators {
> femaleFirstNames :: Gen Text
> , maleFirstNames :: Gen Text
> , lastNames :: Gen Text
> }
And finally, our function for loading all of the NameGenerators:
> allNameGenerators :: IO NameGenerators
> allNameGenerators = NameGenerators <$> nameGenFromFile "female_first_names"
> <*> nameGenFromFile "male_first_names"
> <*> nameGenFromFile "last_names"
Hardcoding filepaths isn’t exactly a Best Practice™, but in this case if a file isn’t found, we want the program to fail hard, and the default “<filepath>: openFile: does not exist (No such file or directory)” error message is sufficient.
QuickCheck is very good at generating random data, so the challenge with generating email addresses is not what to generate, but what not to generate. If you’re clicking interactively through a test site and every email looks like “r36oEx04C4d8l9q6q38V3xMu@Vj4WWrRcZdpCsKy904Dhz65Uy0.com” it’s a little discomfiting.
For the domain portion of the email address, we’ll prepare a small list of popular domains and made-up weights to decide how frequently each should occur (we’ll see how to make use of these values soon):
> emailDomains :: [(Int, Gen Text)]
> emailDomains = map (\ (i, t) -> (i, pure t)) [
> (50, "yahoo.com")
> , (40, "hotmail.com")
> , (30, "aol.com")
> , (20, "gmail.com")
> , (10, "sbcglobal.net")
> , (8, "yahoo.co.uk")
> , (6, "yahoo.ca")
> ]
We could automate building a list like this from a file containing many more domains and actual frequencies if we really wanted to match historical data or real world usage in a particular context.
Next we’d like to create a couple of functions to generate the local part of an email address in different ways. We’ll start with two plausible forms, <first initial><last name> and <last name><digits>:
> -- initialWithLast "Foo" "Bar" would produce a generator returning "fbar"
> initialWithLast :: Text -> Text -> Gen Text
> initialWithLast fName lName = pure $ initial `T.cons` rest
> where initial = T.head . T.toLower $ fName
> rest = T.toLower lName
>
> -- lastWithNumber "Bar" will return barXX (XX for any two digits 11-99)
> lastWithNumber :: Text -> Gen Text
> lastWithNumber lName = T.append namePart <$> numberPart
> where namePart = T.toLower lName
> numberPart = T.pack . show <$> numId
> numId = choose (11, 99) :: Gen Int
We can put it all together using QuickCheck’s oneof ([Gen a] -> Gen a) to randomly choose from the above functions for the local part, and frequency ([(Int, Gen a)] -> Gen a) to select domains from our weighted list:
> genEmail :: Text -> Text -> Gen Email
> genEmail f l = Email <$> oneof [initialWithLast f l, lastWithNumber l]
> <*> frequency emailDomains
These examples are only meant to be illustrative, and while the email addresses will look somewhat convincing, there won’t be much variation. You can always extend the list of strategies with as many email patterns as you can think of: first name with last initial, nick names, foods, random dictionary words, incorporating the user’s birth year in any of the other patterns, etc.
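For example, the birth-year pattern might look like this (a sketch using String instead of the post's Text, to keep it self-contained; the function name is mine):

```haskell
import Data.Char (toLower)

-- hypothetical extra local-part strategy: lowercased first name + birth year
nameWithYear :: String -> Integer -> String
nameWithYear fName year = map toLower fName ++ show year
```

In the post's setting you would wrap the result with T.pack and pure to get a Gen Text, the same way initialWithLast does, and then add it to the oneof list.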
We finally have all of the generators we need to create a complete user profile:
> genUserProfile :: NameGenerators -> Gen UserProfile
> genUserProfile nameGens = do
> gender <- arbitrary
> bDay <- arbitrary
> fName <- case gender of
> Female -> femaleFirstNames nameGens
> Male -> maleFirstNames nameGens
> lName <- lastNames nameGens
> email <- genEmail fName lName
> password <- genPassword `suchThat` ((>5) . T.length)
> return $ UserProfile fName lName email password gender bDay
Note that we create a new password generator on the fly using the suchThat modifier (Gen a -> (a -> Bool) -> Gen a) with our original generator. We could have placed this constraint in the genPassword definition, but this example shows how you can easily create modified generators for particular use cases.
QuickCheck is mostly designed to help you test generated data, not generate data for arbitrary uses (hah, hah). But even though it doesn’t export tools for working with the internals of Gen directly, it does export a function called sample' that always generates a list of 11 results in the IO monad. We can pair this with concat and the vectorOf generator to create as many elements as we want, as long as we want a multiple of 11. In case we don’t, we’ll apply take to ensure we only extract the requested number of elements:
> generate :: Int -> Gen a -> IO [a]
> generate n gen = take n . concat <$> (sample' . vectorOf count) gen
> where count = ceiling $ fromIntegral n / 11.0
If this looks like a hack, well, sure. It is. The sample' function exists for debugging purposes and isn’t a perfect fit here, but it’s the only exported function we have to work with that will give us Gen a -> IO [a].
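A quick sanity check on that arithmetic (the helper name is mine): ceiling (n / 11) batches, each repeated across sample's 11 runs, always yield at least n elements.

```haskell
-- Mirrors the count calculation in generate: how many elements each
-- vectorOf batch needs so that 11 sampled batches cover n elements.
batchCount :: Int -> Int
batchCount n = ceiling (fromIntegral n / 11.0 :: Double)
```
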
We can round out the program with some basic command-line arg handling (allowing a user to specify the number of records to generate), and a main method for printing data in our CSV-compatible but not exactly robust format.
> countDefault :: Int
> countDefault = 100
>
> -- tries to read the first command-line arg as an Int (the number of records
> -- to generate), otherwise uses the default.
> handleArgs :: [String] -> Int
> handleArgs [] = countDefault
> handleArgs (x:_) = case readMaybe x :: Maybe Int of
> Just n -> n
> Nothing -> countDefault
>
> main :: IO ()
> main = do
> count <- handleArgs <$> getArgs
> profileGen <- genUserProfile <$> allNameGenerators
> profiles <- generate count profileGen
> TIO.putStrLn "first,last,email,password,gender,birthday"
> mapM_ (TIO.putStrLn . profileText) profiles
Here’s a snippet from the arbitraryfun cabal file if you’d like to use this as an executable:
executable arbitraryfun
hs-source-dirs: src
main-is: Main.lhs
default-language: Haskell2010
build-depends: base >= 4.6
, QuickCheck >= 2.6
, time >= 1.4
, text >= 1.1
, vector >= 0.10
Keep in mind you’ll also need to:
And after all of our work, here’s what we get on a sample run:
$ arbitraryfun 10
first,last,email,password,gender,birthday
Kathey,Hodgeman,hodgeman94@hotmail.com,"%.=kn3",Female,1947-11-15
Lorri,Weyland,weyland73@yahoo.com,"v/.;}?",Female,1990-02-06
Celena,Kali,ckali@yahoo.com,"pg(VjsR",Female,1981-10-14
Blaine,Mellema,mellema21@sbcglobal.net,"l{Um:-b6k",Male,1990-07-02
Bud,Potempa,potempa27@gmail.com,"JB:*]*>",Male,1993-01-28
Aletha,Schoenecker,aschoenecker@yahoo.com,"#A%6lUf",Female,1998-10-13
Connie,Romesburg,cromesburg@yahoo.com,"$Y$>iEl>e",Male,1950-01-27
Ione,Primus,primus66@hotmail.com,"B[9^K+qnj<f9'",Female,1993-05-10
Sylvia,Magorina,smagorina@yahoo.com,"^+#p1l+",Female,2007-01-13
Fermin,Lampey,flampey@sbcglobal.net,"pq@f<v8m*",Male,1929-07-11
This is by no means a robust program, but we’ve put enough constraints on the generated data that you should be able to view it in a spreadsheet or use it with many CSV import tools. In a completely non-rigorous benchmark this program was able to generate about 40,000 records in a second, and thanks to lazy Haskell magic, QuickCheck, and Data.Text, it also showed a low, constant memory usage even when generating 10 million records and piping them to a file (a process that took less than 4 minutes).
The motivation for isolating environments is straightforward: if you’re working on more than one project at a time, the projects may have conflicting dependencies, and managing all of them at the system level is a nightmare.
Throw in large libraries and web frameworks at different versions and you have a recipe for dependency hell. Trying to resolve and accommodate every new build error at the system level is enough to make anyone superstitious, performing archaic rituals and beseeching the mighty build gods before daring to run their next cabal command.
Even if you manage to make it work, it leaves your system environment in a state that might not be easy to reproduce, making it harder to troubleshoot build issues others might experience with your software. After having been through this enough times on my own (and across enough languages), I finally realized that sandboxes shouldn’t be the exception during development: they should be the default.2
Before we work through a quick example, here are the versions I’m using:
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
$ cabal --version
cabal-install version 1.18.0.1
using version 1.18.0 of the Cabal library
My blog is created with Hakyll, a Haskell library with many dependencies. If you wanted to learn Hakyll and use my code as a starting point, it’s entirely possible (and likely) that something in the chain of dependencies will conflict with Haskell libraries you have installed at the system level.
Here’s how you can build chromaticleaves in a sandbox to avoid these issues:
$ git clone git@github.com:ericrasmussen/chromaticleaves.git
$ cd chromaticleaves
$ cabal sandbox init
Writing a default package environment file to
/path/to/chromaticleaves/cabal.sandbox.config
Creating a new sandbox at /path/to/chromaticleaves/.cabal-sandbox
Now that we’re in a sandbox, the next step is installing all the dependencies from chromaticleaves.cabal (it’s a deceptively short list, but Hakyll will pull in many other dependencies):
$ cabal install --only-dependencies
Hopefully everything will install fine, but you may still see some missing system dependencies or other issues depending on your OS. The output from the install command should provide details.
Once you’ve got that sorted out, you can install the site binary with:
$ cabal install
This will create the executable .cabal-sandbox/bin/site that you can use to launch the site locally with “site preview”, rebuild after changes with “site rebuild”, and anything else from Hakyll’s The Basics tutorial.
Lastly, you can even jump into a fully loaded GHCi session using:
$ cabal repl
Which will start GHCi with all of the top level functions from the chromaticleaves main source file.
When you cabal install anything in your sandbox (including any executables from the software you’re developing), they’re placed in your/sandbox/.cabal-sandbox/bin. It’s convenient to add this relative path to your system’s $PATH variable:
.cabal-sandbox/bin
Preferably, add it before your user cabal bin and other bin folders. Specifying it as a relative path means that when your current working directory contains a sandbox, any binaries installed there take precedence.
However, note that this only works for executables installed with “cabal install” in your sandbox. There’s also a “cabal build” command that creates dist files in meaningfully named subfolders. The command will work just fine, but note that the simple relative path we used above won’t pick up any binaries installed that way.
If you followed along on the above blog building example, then going to the chromaticleaves directory should automatically place the sandboxed site on your path. You can verify with:
$ which site
.cabal-sandbox/bin/site
Initializing a cabal sandbox will add a hidden folder and a config file to your current working directory. If you manage your project with version control, you should add these to your ignore/boring files:
.cabal-sandbox/
cabal.sandbox.config
There are many common commands and other usage patterns not covered here. The best thorough introduction to using cabal sandboxes is An Introduction to Cabal sandboxes.
It’s a must-read if you plan on using them, and you should also keep the official Cabal User Guide handy as a reference.
1. hsenv only works on *nix systems but has the advantage of fully sandboxing ghc, ghci, and cabal, instead of relying on their system versions and only sandboxing build dependencies.
2. Of course, there are other solutions to this problem: jails, containers, VMs, buying a new laptop for each project, etc.
Heist now comes in two flavors: interpreted and compiled. The former has been around longer, is more flexible, and has a very accessible API. It is plenty fast for many use cases, but inefficient because it requires traversing templates node by node each time they’re rendered.
Compiled Heist takes a different approach: it compiles as much of the templates down to ByteStrings as possible, letting you fill in runtime values only where you need them. The result is a staggering performance gain, with some compiled templates rendering at more than 3000x the speed of their interpreted equivalents.1
The price you pay for these huge gains in performance is having to specify and load all of your compiled splices, once, at the top level of your application.
Take a moment to let that sink in: all of your top level splices need to be pre-defined and available at the time your application loads. Unlike interpreted Heist, you can’t bind local splices to a template at render time. When you render a compiled template in a Snap Handler, the only splices it can use are those you defined in your HeistConfig.
If you’re only familiar with interpreted splices, you might be wondering how this inversion of control affects us. Specifically, two questions come to mind:
The first problem can be solved with the notion of a RuntimeSplice, which you can think of as a computation that will be evaluated at runtime each time it’s needed, letting you perform the IO and logic you need for accessing databases, reading from files, etc.
We can reuse nodes by declaring any compiled splices we need within the top level splice. You can think of it as nesting splices, or inner splices, or binders full of splices, or… nevermind. Let’s just work through an example.
Here’s a sample template where the <allTutorials> node contains nodes representing one table row for a single tutorial. We’d like to be able to repeat those nodes once for each tutorial in a list of tutorials:
<table>
<thead>
<tr>
<th>Title</th>
<th>Author</th>
</tr>
</thead>
<tbody>
<allTutorials>
<tr>
<td>
<a href="${tutorialURL}"><tutorialTitle/></a>
</td>
<td>
<tutorialAuthor/>
</td>
</tr>
</allTutorials>
</tbody>
</table>
We’ll get started by defining a simple tutorial type:
data Tutorial = Tutorial {
title :: Text
, url :: Text
, author :: Text
}
Now, remember we mentioned being able to defer computations until runtime? To keep things simple we’re going to return a constant list of Tutorials as the result of a RuntimeSplice computation, but in a real world app you could query a database or obtain the list from another source:
tutorialsRuntime :: Monad n => RuntimeSplice n [Tutorial]
tutorialsRuntime = return [ Tutorial "title1" "url1" "author1"
, Tutorial "title2" "url2" "author2"
]
Here’s where things get interesting: there is virtually no API for working directly with RuntimeSplices, so we can’t easily inspect the underlying runtime value and bind the result to a node name. Instead, we’re going to create Splices containing a function that can do this for us. Note that in the examples below, Heist.Compiled is imported as C.
splicesFromTutorial :: Monad n => Splices (RuntimeSplice n Tutorial -> C.Splice n)
splicesFromTutorial = mapS (C.pureSplice . C.textSplice) $ do
"tutorialTitle" ## title
"tutorialURL" ## url
"tutorialAuthor" ## author
Remember that title, url, and author are functions defined in our Tutorial type. So our do block contains a value of type Splices (Tutorial -> Text). We then map over those splices to create pure splices from each.
If this all sounds a little heavy, don’t panic! It takes some time working with functions in the Heist.Compiled module to build fluency. No amount of explanation is going to make the reason for this immediately clear; it’s simply one way we can leverage the higher level compiled splice functions we have available to us.
But you should make an effort to follow the types as we go, even if only in the abstract. Here are the type signatures for the Heist functions we used above:
textSplice :: (a -> Text) -> a -> Builder
pureSplice :: Monad n => (a -> Builder) -> RuntimeSplice n a -> Splice n
mapS :: (a -> b) -> Splices a -> Splices b
In our case, the splices first contain a function of Tutorial -> Text, which is passed to textSplice, giving us a function of Tutorial -> Builder, which is what pureSplice expects as its first argument.
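If it helps, here is a toy, self-contained analogue of that composition (all types simplified: String stands in for both Text and Builder, and Identity for RuntimeSplice; the names are mine, not Heist's):

```haskell
import Data.Functor.Identity (Identity (..))

data Tutorial = Tutorial { title :: String, url :: String, author :: String }

-- like textSplice: turn a field accessor into a renderer
toyTextSplice :: (a -> String) -> a -> String
toyTextSplice f = f

-- like pureSplice: lift a renderer to consume a runtime-wrapped value
toyPureSplice :: (a -> String) -> Identity a -> String
toyPureSplice f = f . runIdentity

-- the composition mapped over the splices: accessor in, runtime renderer out
renderField :: (Tutorial -> String) -> Identity Tutorial -> String
renderField = toyPureSplice . toyTextSplice
```

The shape is the same: a plain record accessor goes in one end, and a function expecting a runtime-wrapped Tutorial comes out the other.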
The end result is a series of splices where node names map to functions of RuntimeSplice n Tutorial -> C.Splice n. Compiled Heist gives us a few options for working with splices containing functions of this type. Here’s how we can map over a list of runtime tutorials and create a single compiled splice containing all of the rendered tutorial splices:
renderTutorials :: Monad n => RuntimeSplice n [Tutorial] -> C.Splice n
renderTutorials = C.manyWithSplices C.runChildren splicesFromTutorial
For posterity, here are the type signatures for the supporting Heist.Compiled functions used above:
runChildren :: Monad n => Splice n
manyWithSplices :: Monad n
=> Splice n
-> Splices (RuntimeSplice n a -> Splice n)
-> RuntimeSplice n [a]
-> Splice n
It’s a lot to take in, but follow through step by step to see that everything lines up.
Now we have a way to process a runtime computation returning a list of tutorials, create individual tutorial splices for each tutorial, and return it as a single compiled splice. This is a very important point that gets to the core of compiled Heist: we can reuse splices (and thus nodes in a template) however we want, as long as we compile them down to a single splice this way.
We can then create top level splices that will map the outer <allTutorials> node to this compiled splice:
allTutorialSplices :: Monad n => Splices (C.Splice n)
allTutorialSplices =
"allTutorials" ## (renderTutorials tutorialsRuntime)
Once we have the fully compiled splices, we can add them to our HeistConfig so they will be available to our templates when rendered:
app :: SnapletInit App App
app = makeSnaplet "app" "A snap demo application." Nothing $ do
h <- nestSnaplet "" heist $ heistInit "templates"
-- add the compiled splices to our HeistConfig
addConfig h $ mempty { hcCompiledSplices = allTutorialSplices }
-- the rest of your SnapletInit
At this point all that remains is rendering the template in a Snap handler:
tutorialHandler :: Handler App App ()
tutorialHandler = cRender "tutorials"
Notice again that unlike interpreted splices, we don’t (and can’t!) provide local splices specific to this template. When our handler renders the template, those splices will be automatically found in our HeistConfig.
The above walkthrough will hopefully give you enough insight to get started, but check out the snap-heist-examples repo for a complete working version with all of the required imports, other examples, and a cabal file listing the library versions used here.
It’d be nice if I could tell you to start with interpreted splices on your next project and only move to compiled splices when you need extra speed. I’m all for keeping things simple and avoiding premature optimization, and interpreted splices are plenty fast for many use cases.2
What gives me pause is that compiled splices give you a dramatic performance improvement without much extra effort, provided you plan for them in the beginning. This extra effort isn’t a bad thing either: it forces you to really think through how you obtain data and expose it to templates at the application level, whereas interpreted splices make it a little easier to play fast and loose with splices that can change locally depending on the template and particular view.
Compiled splices only introduce one major caveat: Heist won’t stop you from declaring splices with the same node name, and it will happily overwrite duplicate values.3 Let’s say you make two different compiled splices for a “userName” node used in separate templates, and put both in your Heist config. One of them will be silently overwritten, and the value it returns could be used in both templates.
I can think of a lot of ways this could be very dangerous (say, accidentally displaying every user’s account on an individual user profile page because you used the same node name for both). I do not think this is a likely accident, but you should definitely take precautions to ensure your Heist config doesn’t contain any surprises. Hopefully at some point in the future we’ll get a way to specify compiled splices for particular templates so we can explicitly control this behavior.
I updated my snap-heist-examples repo with comparable compiled versions of the original interpreted examples. It’s not a bad place to start if you want to see Heist used in the context of a Snap application, and it should be relatively straightforward to clone the repo and build the app locally if you need a playground for learning Snap and Heist.
Here are some additional resources for learning more:4
1. Details available in the original announcement.
2. We often talk about speed in relative terms as if it’s meaningful, but it’s not. Unless you benchmark and know what your expected load is, you really can’t rule out interpreted splices on the grounds that they “aren’t fast enough” for you, even though it’s tempting.
3. The SpliceAPI module exports a “#!” combinator that is similar to “##” but throws an error if there is a duplicate.
4. If you write a Heist tutorial and would like to add it to the list, open an issue or send a pull request.
But first: what’s in a template?
My experience with server-side templates has been heavily influenced by popular python libraries like Mako, Jinja 2, and Chameleon.
The former two fall firmly into the programmable category, meaning you can use a specialized syntax in the templates to express programming logic along with your markup:
<!-- mako example: displaying a table of active users -->
<table>
% for user in users:
% if user.active:
<tr>
<td>${user.name}</td>
<td>${user.email}</td>
</tr>
% endif
% endfor
</table>
Which, if you’re both programmer and designer, works pretty well most of the time. However, some find this approach… distasteful. An alternative is a TAL (Template Attribute Language) like Chameleon, where you embed logic in tag attributes so you can still enforce proper markup:
<!-- chameleon example: displaying a table of active users -->
<table>
<tal:repeat="user users">
<tr tal:condition="user.active">
<td>${user.name}</td>
<td>${user.email}</td>
</tr>
</tal:repeat>
</table>
This gives us cleaner markup, but the logic is still embedded in the template.
Heist takes an even more extreme view: no control flow or logic in the templates. This may not be entirely accurate (it does have a couple of basic constructs built into the templating language, such as bind and apply), but compared to our other examples it’s a whole new world of template purity.
Our previous user example might look like this:
<!-- heist example: displaying a table of active users -->
<table>
<activeUsers>
<tr>
<td><userName/></td>
<td><userEmail/></td>
</tr>
</activeUsers>
</table>
We can now write plain old Haskell code to fetch the active users, create splices for each user’s name and email, and bind the result to the <activeUsers> node.
One helpful way of viewing interpreted Heist is that it’s not so much a templating engine as a library for manipulating templates.2 An API for taking a template apart node by node and putting it back together again, optionally splicing in dynamically generated elements or text. In fact, a Heist template is literally a list of nodes:
-- a Node is an element in a Document from the Text.XmlHtml library
type Template = [Node]
Here’s the supporting code to bring it all together:
-- binds a list of splices to <activeUsers> (assumes we pass in active users)
activeUsersSplices :: [User] -> Splices (SnapletISplice App)
activeUsersSplices users = "activeUsers" ## (bindUsers users)
-- maps over a list of users to create splices for each
bindUsers :: [User] -> SnapletISplice App
bindUsers = I.mapSplices $ I.runChildrenWith . userSplices
-- creates the <userName/> and <userEmail/> splices for an individual user
userSplices :: Monad n => User -> Splices (I.Splice n)
userSplices (User name email) = do
"userName" ## I.textSplice name
"userEmail" ## I.textSplice email
Once you embrace this view of Heist as functions for manipulating templates, the next task is learning the libraries. Here’s the high level breakdown:
Library | Use
---|---
Heist.Interpreted | API for splices interpreted at runtime
Heist.Compiled | a slightly more complicated API for (more efficient) compiled splices
Heist.SpliceAPI | handy syntactic sugar for working with splices
Snap.Snaplet.Heist | convenience functions for accessing Heist state in a Snap application
Choosing a template engine is kind of like choosing a text editor: everyone’s sure their approach is best, and sooner or later you’ll be dragged into silly arguments.
With programmable template engines, people are often quick to mention how we need a clean separation of concerns to keep business logic from ruining our pristine templates, and won’t you think of all the poor designers out there who just want to work with valid markup.
They sometimes neglect to mention that for some teams, expressing logic in templates is a benefit (it can clarify intent, and may be preferred when programmers are solely responsible for integrating markup), or that designer preferences vary. I have worked with designers that only hand off static assets and require the developers to handle 100% of the integration, and I’ve worked with designers that take the time to learn enough of your chosen framework and templating system to work with it.3
The bottom line is we’re discussing matters of taste and preference, so there is no right answer. It depends on the context and how well it’s going to work for everyone involved on the project.
If your preference is designer-friendly templating systems free of dangerous magic and unclean business logic, Heist is the go-to Haskell template library for you. But it’s not my own typical use case, and it’s not the grounds on which I’d recommend choosing it.
The payoff for me turned out to be much more subtle: you can write more Haskell. You don’t have to find a way to express what you want in a specialized template language. You can take full advantage of the language, its type system, GHCi, your tricked-out text editor, etc.
This approach can require more code if you’re used to the convenience of programmable templates, but it also forces you to be more conscious about how you’re manipulating data and exposing it to templates. And at the end of the day, you’re writing Haskell: if you find yourself writing boilerplate, there’s probably an abstraction you can use to DRY it up.
I have a bad habit of making my learning process public. In this case, I worked through some control flow basics in Heist (using interpreted splices), and wanted to share. You can view the snap-heist-examples repo to see standalone Snap handlers that demonstrate different ways to repeat or conditionally include text and templates.
Contributions or issues/ideas are very welcome.
1. In lieu of saying “down the rabbit hole” again, a phrase I repeat far too often. I expect I’ll continue to use more and more obscure variations on that theme. Steel yourselves.
2. It’s actually more powerful and subtle than that: you can use the library to implement your own domain-specific markup languages.
3. If you know of any actual studies on designer preferences, please send details to eric @ chromatic leaves dot com.
Death’s final studio album is one of the all-time metal classics. On The Sound of Perseverance, Death took their brand of technical progressive thrash death metal to the next level. This is death metal evolved.
Most tracks open with sparse instrumentation. Deep bass lines, drum solos, lead guitar melodies, welcoming you to a new soundscape before the rest of the band appears. Before the thick guitar and bass riffs take over, the drums pound relentlessly, the scratchy vocals and lyrics invade your consciousness. Soon all of the instruments are building and pushing and driving to create an oppressive atmosphere, making it harder to breathe, harder to think, a deadly wall of sound compelling and propelling you deeper into their vision.
Just when you can’t take any more, it stops. A heroic melody swoops in to the rescue, letting you soar through the world they’ve created. Soon you’re surrounded by glitchy high-speed guitar solos, unexpected drum fills played with impossible accuracy, bass and rhythm guitars carrying you further and further inward until there’s nothing left but you and the music. But you have only a moment to reflect before another chaotic shift in tempo leaves you stranded in the fray, the frenzy of chugging riffs and blast beat drums.
Usually when we talk about progressive metal we mean metal with classically influenced melodies and harmonies, sweep picking and scales, and the many other ways skilled musicians have learned to show off their skills. You won’t get that here. You’ll get the sound of a band that evolved naturally, forever in debt to the heavy metal and thrash that preceded it, but carving ahead well into uncharted territory.
Death’s magic is making you a part of their journey. It’s tragic that they never got the attention they deserved1, and far more so that lead guitarist/songwriter Chuck Schuldiner passed away at 34. If you missed out on The Sound of Perseverance for any reason before, now’s the time to get it. This metal is just as relevant today as it was in 1998.
1. According to the Wikipedia entry, The Sound of Perseverance originally saw about 34,000 copies sold in the US, vs. hundreds of thousands of record sales for popular Cannibal Corpse, Deicide, and Morbid Angel albums.
Many inflammatory posts and Twitter arguments have framed these camps as Types versus Tests, but the two aren’t mutually exclusive. If you’ve read my post on Making Code Reasonable, you may correctly guess that I prefer types, but I write tests (albeit for different purposes) either way.
Proponents of dynamic languages1 are frequently taught to solve problems in ways that can only be checked with tests, and this particular style of problem solving is one of the fundamental disconnects between the types and NoTypes crowds. Ask one of these people (including me!) how often they’ve had to write extra unit tests to make up for the lack of a good type system, and you’re likely to be met with a confused look and a “why, never!”
It’s true that you won’t find many tests in Python or JavaScript where the programmers are explicitly inspecting the types of objects and secretly wishing they had static typing, but this is missing the point. The benefit of static typing isn’t about enforcing the kinds of simple relationships that you wouldn’t test anyway, but expressing richer interactions that you can check with the compiler instead of a test suite.
Recently I had a somewhat frivolous project idea: a command-line program to help me learn the Dominic System (a technique for increasing memory skills). An explanation of the Dominic System and the program, hsmemoryquiz, are available on GitHub. We’ll look at some of the benefits of elevating data and abstractions to the type level.
The Dominic system is based on a mapping of the digits 0-9 to the letters O, A, B, C, D, E, S, G, H, and N. This foundation gives you the building blocks for working with all possible pairs of digits (00-99) and pairs of letters (OO-NN).
In many languages it would be practical (and expected) for you to model this data with the primitives for integers and characters. But if you enjoy obsessing over failure points in your program, this is unacceptable, because it means that every function or method using these values would need to account for the possibility of numbers outside the range 0-9.
Short of hideous, sprawling code with maddening error checking at every turn, it’s much more practical to define entry points for validating input before passing it to the underlying functions. You can then narrow the scope of your tests to those entry points and hope for the best.
But if we step back for a moment, we should be asking whether or not we need the full power of integers, characters, strings, and all of the libraries and built-ins capable of manipulating them.
Spoiler alert: we don’t!
We can create new data types that contain only the values we need:
data Digit = Zero | One | Two | Three | Four | Five | Six | Seven | Eight | Nine
  deriving (Eq, Enum)

data DigitPair = DigitPair Digit Digit
  deriving Eq

data Letter = A | B | C | D | E | S | G | H | N | O
  deriving (Show, Eq, Enum)

data LetterPair = LetterPair Letter Letter
  deriving Eq
In the Dominic system there is an exact mapping of Digits to Letters, and in the Letter module of hsmemoryquiz we’ll need a way to create Letters from Digits. We can write a function with the following signature:
fromDigit :: Digit -> Letter
This simple declaration gives us powerful reasoning tools:
- It only accepts a Digit, so there is no out-of-range input to validate.
- It must return a Letter; the type leaves no way to signal failure.
- It is pure: the type shows it cannot perform I/O or touch other state.
Now we can operate with complete confidence2 that the function does what we expect, and does so without affecting other parts of our system. We’ve succeeded in pushing the need for validation further out, allowing us to write a more robust core that doesn’t need to consider the possibility of bad input (and if anyone tries, the program won’t compile).
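One way fromDigit could be written (a sketch; the actual hsmemoryquiz implementation may differ) is an exhaustive pattern match, which the compiler can check for completeness:

```haskell
data Digit = Zero | One | Two | Three | Four | Five | Six | Seven | Eight | Nine
  deriving (Eq, Enum)

data Letter = A | B | C | D | E | S | G | H | N | O
  deriving (Show, Eq, Enum)

-- Exhaustive by construction: GHC warns if a Digit case is missing,
-- and no out-of-range input can exist in the first place.
fromDigit :: Digit -> Letter
fromDigit Zero  = O
fromDigit One   = A
fromDigit Two   = B
fromDigit Three = C
fromDigit Four  = D
fromDigit Five  = E
fromDigit Six   = S
fromDigit Seven = G
fromDigit Eight = H
fromDigit Nine  = N

main :: IO ()
main = print (map fromDigit [Zero ..])  -- the full 0-9 mapping
```

Because the domain is closed, there is nothing left to unit-test here beyond the mapping itself.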
Inevitably we will need to face the outside world, and types afford us many tools for combating bad input. In imperative languages, it’s common to ignore certain kinds of troublesome input and instead throw exceptions when things go awry. This is a pattern that is convenient to write, but complicates the flow of our programs. There is an added mental overhead in having to know which exceptions may be thrown and where they may or may not be caught.
Often we can obviate the need for exceptions by returning values that indicate some failure condition instead. The problem here is that if you have many values that work this way, you can end up with long, complicated code blocks. Let’s look at an example in python where any of the arguments may be a legitimate value or None:
def build_registry(foo, bar, baz):
    if foo is not None:
        if bar is not None:
            if baz is not None:
                return Registry(foo, bar, baz)
    return None
Now you can see why exceptions are so appealing here! It’s much simpler to try to make an instance of Registry and ask for forgiveness (in the form of a try/except block) than it is to constantly validate input. In many languages and frameworks the notion of an empty or bad value may vary as well, requiring you to sometimes check for null, undefined, empty strings, lists with a length of 0, etc.
What we’re really missing in these languages is a way to express values that may be more than one type. In Haskell we can achieve this with algebraic data types. One of the canonical examples is:
data Either a b = Left a | Right b
We can use this to unambiguously signify error conditions with the Left constructor and valid values with the Right. This would even allow us to define a concrete type Either String String and reliably differentiate the two cases without resorting to string matching, checking for null values, or checking for an empty string.
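To make that concrete, here's a hedged sketch revisiting the build_registry example with Either. The Registry type and error strings are illustrative, not from hsmemoryquiz:

```haskell
-- Hypothetical Registry mirroring the earlier Python example.
data Registry = Registry String String String
  deriving (Show, Eq)

-- Each argument is either an error message (Left) or a value (Right).
-- Either's Applicative instance stops at the first Left, replacing the
-- nested "is not None" checks with a single expression.
buildRegistry :: Either String String -> Either String String
              -> Either String String -> Either String Registry
buildRegistry foo bar baz = Registry <$> foo <*> bar <*> baz

main :: IO ()
main = do
  print (buildRegistry (Right "a") (Right "b") (Right "c"))
  print (buildRegistry (Right "a") (Left "missing bar") (Right "c"))
```

The failure case also carries a reason with it, which the bare `None` in the Python version cannot do.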
More importantly, we can use this as a basis for richer types that carry the notion of success or failure cases with them, rather than requiring the use of exceptions. Here’s an example from the Game module in hsmemoryquiz that runs a continuous quiz game:
playGame :: Quiz ()
playGame = do
  assoc <- nextAssociation
  res   <- playRound assoc
  case res of
    Continue -> playGame
    Stop     -> return ()
The Quiz monad stack includes ErrorT, which means that any time we run a computation in the Quiz monad (in this case, the first two lines in the do block), the value returned may be either an error or a valid value. There’s no need to alter the flow of the program or nest a long series of conditionals, because the types extracted from Quiz computations already carry that notion of failure with them. If the nextAssociation function is unsuccessful (i.e. it returns ErrorT’s Left case), then the playRound line will not be evaluated, and the entire block will evaluate to that Left case.
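A simplified model of that short-circuiting, using plain Either rather than the real Quiz/ErrorT stack (the names here are illustrative, not from hsmemoryquiz):

```haskell
-- Simulate a failed Quiz computation.
nextNumber :: Either String Int
nextNumber = Left "no associations left"

-- Never reached in this demo.
playRoundWith :: Int -> Either String Bool
playRoundWith n = Right (n > 0)

-- In the Either monad, the first Left aborts the rest of the do block
-- and becomes the value of the whole expression.
demo :: Either String Bool
demo = do
  n <- nextNumber   -- produces Left, so the next line is skipped
  playRoundWith n

main :: IO ()
main = print demo   -- prints Left "no associations left"
```

ErrorT layers this same behavior over other effects, which is why playGame needs no explicit error checks.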
The function that runs the game can then pattern match on the final value to differentiate the two cases:
runGame :: Registry -> IO ()
runGame registry = do
  putStrLn "Welcome! Quit at any time with \":q\" or by pressing ctrl-c"
  res <- runQuiz registry newQuizState playGame
  case res of
    (Left e, q)  -> putStrLn $ formatError e q
    (Right _, q) -> putStrLn $ formatSuccess q
Although I am very certain all of you will want to dedicate hundreds of hours to learning obscure memory techniques and practicing them with my program, the real motivation behind hsmemoryquiz was creating a fairly straightforward example of a Haskell command-line utility with several nice touches.
There are of course plenty of great resources out there for learning Haskell, and this isn’t intended to be a canonical example of How to Write Haskell; there are much better and more interesting Haskell programs3.
But many full-featured utilities and programs are not written with beginners in mind. If you find yourself writing a lot of smaller utilities or single-file Haskell examples but haven’t quite taken the next step, I hope this will help you on your way.
1. “Dynamic” being a somewhat contentious term, used here to roughly mean “types that are checked at runtime”
2. Modulo the usual caveats (unsafePerformIO, error, non-termination)
3. A short list of programs that have inspired me: xmonad, hlint, hoogle, hakyll