Whiskey Data Sleuthing, with Help from Reddit
David Wishart, the Nate Silver of whiskey tasting.
The data file on the University of Strathclyde page was completely unsourced, leaving a lot of open questions:
- Who compiled the data and when?
- How were the flavor categories chosen, and how were the scores tabulated?
- What exactly do the flavor categories mean?
- Are the scores an average for all the output of a given distillery, or are they for one representative bottle?
- Why is the rather sketchy/spammy whiskyclassified.com listed as the only reference?
I went to /r/scotch with my questions, and within an hour they set me on the right path. Redditor “howheels” did some domain research and found that whiskyclassified.com changed hands and entered its current spammy incarnation in April 2013. Prior to that it was a promotional site for a book, Whisky Classified: Choosing Single Malts by Flavour. Written by David Wishart of the University of Saint Andrews, the book had its most recent printing in February 2012.
You can see the original site for Wishart’s book using the Internet Archive. The most current version, however, seems to have migrated to the University of Saint Andrews website, where among other things you can find a fairly detailed methodology for how the flavor scores were arrived at. Bingo!
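The Internet Archive lookup can also be scripted. Below is a rough sketch that asks the Wayback Machine's availability endpoint (`https://archive.org/wayback/available`) for the snapshot closest to a given date. It assumes that endpoint's `url`/`timestamp` query parameters and its `archived_snapshots.closest` response shape. The helper names are my own, and the January 2013 date is just an arbitrary pick from before the April 2013 change of hands.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint (assumed response shape: see closest_snapshot).
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(site: str, timestamp: str) -> str:
    """Build an availability query for `site` near `timestamp` (YYYYMMDD[hhmmss])."""
    return f"{WAYBACK_API}?{urlencode({'url': site, 'timestamp': timestamp})}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of a parsed availability response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(site: str, timestamp: str) -> Optional[str]:
    """Query the live API (needs network access) and return the closest snapshot URL."""
    with urlopen(availability_url(site, timestamp)) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `find_snapshot("whiskyclassified.com", "20130101")` should hand back an archived copy from before the domain turned spammy, assuming one was captured.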
I’ll quote at length, because it’s interesting stuff. Wishart started with tasting notes from ten different previously published books. The man was an aggregator before aggregating was cool - a Nate Silver of whiskey tasting.
Most distilleries produce several brands that are differentiated by length of time in cask, special conditioning or finishing, e.g. to impart flavours such as oak, sherry, port or Madeira to the whisky. As our objective was to develop a classification of malts that are readily available to consumers, we felt we should select a benchmark malt whisky from each distillery. We firstly excluded rare malts and any premium brands that are specially aged, cask conditioned or finished. We also decided not to cover distilleries that had been demolished or are not currently in production.
Not all of our 10 authors reviewed the same distillation from each distillery, as some limit their tasting notes to house style only (e.g. Milroy (1995)). Where more than one distillation is produced we selected the most widely available brand, usually of 10-15 years maturation in cask. New distilleries that currently offer young malts (Arran and Drumguish) were included for future reference, as they evolve. Vatted malts (blends of pure malts), and malt whiskies produced in Ireland, Japan, New Zealand and Wales were excluded. We thus arrived at 86 single malt whiskies of around 10-15 years maturation, most of which are widely available in the U.K.
This is key: the scores are based on one representative, commonly available whiskey from each distillery, not an average across the distillery’s whole range. This obviously elides any differences between a distillery’s products, but on the other hand you don’t have to worry about the score for any given distillery reflecting some arcane bottling that you’ll never hope to find.
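The selection rules in that quote boil down to a filter followed by a one-per-distillery pick. Here’s a minimal sketch of that logic. The `Bottling` fields, especially the `availability` score, are hypothetical stand-ins, since the methodology describes its criteria in prose rather than as data. Vatted malts and non-Scottish whiskies are simply assumed to be absent from the input.

```python
from dataclasses import dataclass


@dataclass
class Bottling:
    distillery: str
    name: str
    age: int            # years in cask
    special: bool       # rare, premium, specially aged, cask-conditioned or finished
    active: bool        # distillery still standing and in production
    availability: int   # hypothetical "how easy is it to buy" score; higher is easier


def pick_benchmarks(bottlings: list[Bottling]) -> dict[str, Bottling]:
    """Pick one benchmark bottling per distillery.

    Special bottlings and defunct distilleries are dropped; of what is left, the
    most widely available bottling wins (per Wishart, usually a 10-15 year old).
    """
    picks: dict[str, Bottling] = {}
    for b in bottlings:
        if b.special or not b.active:
            continue
        current = picks.get(b.distillery)
        if current is None or b.availability > current.availability:
            picks[b.distillery] = b
    return picks
```

The result maps each distillery to a single bottling, which is exactly why the scores can’t tell you anything about the rest of a distillery’s range.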
A vocabulary of 500 aromatic and taste descriptors was thus compiled from the tasting notes in the 8 books. These were grouped into 12 broad aromatic features: Body (Light-Heavy), Sweetness (Dry-Sweet), Smoky (Peaty), Medicinal (Salty), Feinty (Sulphury), Honey (Vanilla), Spicy (Woody), Winey (Sherry), Nutty (Oaky-Creamy), Malty (Cerealy), Fruity (Estery) and Floral (Herbal).
The 12 flavor categories are condensed from 500 different descriptors used by the original authors (not sure why he says 8 books here?). This might have been more of an art than a science - one man’s ‘smoky’ is another’s ‘peaty’ - but a necessary one.
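To make the condensing step a bit more concrete, here’s a toy version of mapping free-text descriptors onto the 12 features. The synonym table uses only the feature names and their parenthetical alternates from Wishart’s list. His real 500-word vocabulary and his actual scoring rules aren’t reproduced here, so the simple counting is a hypothetical illustration, not his method.

```python
import re
from collections import Counter

# Hypothetical synonym table built only from the feature names and their
# parenthetical alternates in Wishart's list; the real vocabulary had ~500 terms.
# Body (Light-Heavy) and Sweetness (Dry-Sweet) are scales, so this toy only
# counts mentions of the "heavy" and "sweet" ends.
DESCRIPTOR_TO_FEATURE = {
    "heavy": "Body",
    "sweet": "Sweetness",
    "smoky": "Smoky", "peaty": "Smoky",
    "medicinal": "Medicinal", "salty": "Medicinal",
    "feinty": "Feinty", "sulphury": "Feinty",
    "honey": "Honey", "vanilla": "Honey",
    "spicy": "Spicy", "woody": "Spicy",
    "winey": "Winey", "sherry": "Winey",
    "nutty": "Nutty", "oaky": "Nutty", "creamy": "Nutty",
    "malty": "Malty", "cerealy": "Malty",
    "fruity": "Fruity", "estery": "Fruity",
    "floral": "Floral", "herbal": "Floral",
}

FEATURES = ["Body", "Sweetness", "Smoky", "Medicinal", "Feinty", "Honey",
            "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"]


def tally_features(note: str) -> dict[str, int]:
    """Count how many descriptors in a tasting note fall under each broad feature."""
    words = re.findall(r"[a-z]+", note.lower())
    counts = Counter(DESCRIPTOR_TO_FEATURE[w] for w in words if w in DESCRIPTOR_TO_FEATURE)
    return {feature: counts.get(feature, 0) for feature in FEATURES}
```

With a table like this, "peaty" and "smoky" both land in the same bucket, which is the whole point (and the whole judgment call) of the grouping.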
As in the Revolution Analytics blog post, Wishart uses cluster analysis to arrive at 10 different flavor groupings. There’s a lot more detail on his methodology page that I won’t get into - be sure to give it a read.
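For a feel of what a clustering step does with these scores, here’s a bare-bones k-means over 12-score flavor vectors. This is a swapped-in technique for illustration, not necessarily the procedure Wishart used (his methodology page has the details), and any vectors you feed it in testing should be made up rather than read as his scores.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on equal-length vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random (seeded for repeatability).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we've converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Running it on 86 twelve-score tuples with `k=10` would mimic the shape of Wishart’s problem, though k-means results can shift with the starting points, so it’s worth trying a few seeds.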
So it looks like we’ve answered our initial questions about the data. Finally, if you poke around the site you’ll find a link to a downloadable Windows program Wishart wrote called Whiskey Analyst. Meant primarily as a way to record tasting notes on individual whiskies, it also contains a text file of Wishart’s notes and scores. The data are in an idiosyncratic format (one column, 11,000 rows, with groups of rows for each whiskey), so I can’t be certain, but it looks like there might be a lot more detail here than in the original dataset: multiple products for each distillery, rather than one representative sample. This would definitely be a fun Processing/R project for someone to go through and convert to a standard .csv - have at it!
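To get whoever takes this on started, here’s a sketch of the reshaping step. I haven’t worked out the file’s exact layout, so this assumes (hypothetically) that every whiskey occupies the same fixed number of rows and that you already know what each row position means. If the real file uses variable-length groups or marker lines instead, `group_rows` is the part to rewrite.

```python
import csv
import io


def group_rows(lines, group_size):
    """Chunk a single column of values into consecutive groups of `group_size` rows.

    Assumes (hypothetically) that every whiskey occupies exactly the same number
    of rows; raises if the column doesn't divide evenly.
    """
    values = [line.rstrip("\r\n") for line in lines]
    if len(values) % group_size:
        raise ValueError(f"{len(values)} rows is not a multiple of {group_size}")
    return [values[i:i + group_size] for i in range(0, len(values), group_size)]


def to_csv(lines, header):
    """Reshape the one-column data into CSV text, one whiskey per row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(group_rows(lines, len(header)))
    return out.getvalue()
```

Since a text-mode file object yields lines, `to_csv(open(path), header)` works directly once you’ve settled on a header; the `header` list itself is something you’d have to reconstruct from the file.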