Oct 072013

When taking on an analytics project, or designing a reporting system (dashboard or otherwise) a core component to superior execution is to properly understand the question(s) the vehicle is expected to answer.  This may seem like an obvious statement, but it is amazing how often the metrics of focus are done so for convenience rather than impact. Additionally, dashboards and reports are often (at least initially) put together by individuals with little training in design and business reporting. A monkey can make a graph, but it takes a bit of thought and planning to make something impactful.  I would argue that the state of business intelligence in general suffers from this issue – people undervalue the opportunities for using data to make great business decisions because they have learned that the data available to them is not useful for doing so. Instead of insisting that the metrics and reports be useful for business decisions, they instead write off the full potential of the data and go back to the inefficiencies of traditional gut based decision making. What they fail to realize or are not made aware of, is the wide variety of data available that is not being used. Empowering decision makers to utilize data is a core purpose of an analyst.

While assigning fault for the poor state of BI affairs isn’t particularly helpful, it’s worth noting that it is a systemic issue based on the past delivery of inferior metrics and reports coupled with limited decision making timeframes. This can also be compounded by the general ignorance of what data is available. The analyst’s job is to right these wrongs and must retrain the organization around how data is approached, assembled, and utilized for business decisions. This reorganization begins with the most basic step in the analytics chain: Defining the right questions.

The reason that the question definition is key is because all further analytical planning actions stem from it. Question definition, while the job of the analyst, requires input from the consumers of the information. Interviewing those who will use the analysts’ output is key to deciding what any given analytical product will contain including how many metrics are needed, in what format, and how frequently they will need to be updated.  The unsophisticated organization derives metrics by default, in a post-hoc manner based around whatever data has happened to be collected. This approach is not likely to have the impact that a more carefully planned set of information tailored to the business needs will.

Additionally, some decision makers will believe they know exactly what data they want and need, and it is important that the analyst probe and make sure this is correct. Finding out what a particular metric will be used for, or why a specific piece of information is useful can uncover unintended interpretation mistakes (e.g. when a unique-id is not indicative of a unique user due to cookie expiration).  It is safe to say that while business owners often do not understand the data used to create particular metrics, they often have strong opinions about what the metrics mean. It is the job of the analyst to educate and redefine based around these shortcomings. Furthermore, the analyst should be aware of the myriad of data sources that are available for creating metrics from, helping to aid the business owner through discovery. This is a major reason it is critical to get the BI consumer/business decision maker to talk about what the questions they are trying to answer are rather than to expect them to rattle off a full list of needed metrics. The analyst defines the metrics based on the needs of the business owner. It is crucial that the analyst take an active, participatory role in the design stage rather than a passive “fast food order” style of design. You are a data educator – act like one.

In closing, there are a number of additional steps to defining the mechanism necessary for answering the questions settled upon. Some questions require multiple metrics, new data collection, derivative metrics (e.g. time between purchases is derived from last purchase timestamp minus previous purchase timestamp), or a re-statement of the question due to limitations in available data. This layer of the design lies in between the initial question definition and the visual display of data, but again is key to successful execution of an analytic output. The analyst is an artist, a teacher, a designer, a data counselor, and a storyteller as well as being the one who knows the data. You can’t design an output mechanism if you don’t know what the input question is.

Question –> translates to data –> summarized by metrics –> interpreted by analysis –> converted to display –> answers or informs question

Jul 302012

A friend sent me a great blog post (see #1 in the list at the end of this post) around testing that has been buzzing around (and should be read and debated if you care about such things even a little bit). The post introduces (as in “brings to our attention” not as in “invents”) a method of easily coding an epsilon greedy strategy for testing web (or whatever) optimization and claims that it is superior to the well-established standby of  A/B testing (oooh, thems fightin words!) This post has inspired a number of responses by folks who run A/B tests, folks who optimize and test websites, and computer nerds interested in arguing about this type of stuff in general.

The normal array of folks weigh in – the engineers who almost understand* the experimental/statistical approach to A/B testing, statistician purists who sing about the superiority of A/B testing, and the practical coders who just ask for something that is simple, intuitive, and works towards the end goal of making money on their site. It’s the interplay between these folks that I found interesting (and entertaining). Full disclosure – I’m not a master statistician but have plenty of experience in experimental design, A/B testing, and general statistics. I am not by any stretch a website optimizer or an engineer.

At the end of the day, for the majority of those who are interested in testing – and I’m not talking about Google or Bing but rather the rest of the world, they want something that works very well and that converges on an optimal or close to optimal solution quickly. Maybe we should even get fuzzy with the term “optimal” by saying it means exacting maximum price/conversion/experience AND painlessness/stability/implementation ease. The main arguments against the A/B testing framework is that while it is a well-established, experimentally sound methodology, it takes time to execute and collect the data, requires knowledge of statistics to accurately interpret (not to mention know how long to run the test, how to set up the test, and to understand why you don’t just “watch the numbers until it gets significant”) and needs to finish before you can tweak and deploy your “winning” solution. The epsilon greedy algorithm is relatively self tuning based on the rules given to it, making it get closer to an optimization (assuming a static universe) relatively quickly**. One big argument against the epsilon greedy strategy is that it can mistakenly optimize based on a temporal correlate or something similar (e.g. your site shows breakfast and dinner ads at 8 a.m. – guess which one gets more clicks? That ad is then considered optimal and keeps getting shown past noon, into the evening until, finally, the dinner ads get enough clicks to flip it – but way later than is optimal). Maybe some good strategies are to reset/retest every X hours, or to decay older clicks against newer ones for a faster “flip” when such a pattern emerges.

My take is that if you don’t care about the statistical rigor or experimental soundness angles to optimization – and if you aren’t particularly keen on analyzing every tweak, covariate, and adjustment you make to your website (again, excluding Google and the big boys here who definitely do care), then the epsilon greedy algo is worth implementing. That is not a dig by any means – sometimes you care and sometimes you want this thing to get rolling as fast as possible, you will never revisit it, etc. If you are trying to sell a service to scientists or stats nerds, need the experimental rigor, need to optimize for the long term, expect to run tests at all times, and want to use as much data as possible to craft the ultimate presentation models (or whatever it is you are testing for) then you should probably be using the slower-but-steady A/B testing approach or something similar. As with most things – use the tool that meets your goals, but figure out your goals ahead of time for optimal tool selection.

In the end, I feel like the debate around the method to use consists of folks who are discussing and exploring some of the pros and cons of each approach without enumerating the actual uses. They mistakenly assume that both are used for exactly the same reason. While this is, at the broadest level (optimizing between choices) is true, the actual reasons behind the use of one over the other is vastly different. Hatchet versus lathe – both cut wood, but which one does it better?

* I say “almost” because in the discussions, many of them fail to point out simple errors others are making in assumptions. If they were statistically savvy engineers they would say things like “you should reset your test metrics every time you make a change, and this is true whether you use the epsilon greedy strategy or A/B testing”.

** I’m ignoring the cases where the groups have disgustingly similar conversion rates.

Here are some articles and reference pieces for your perusal:

  1. Original post “20 lines of code that will beat A/B testing every time
  2. Wikipedia article on Multi Armed Bandit problem  and the concept of the epsilon greedy algorithm
  3. Blog on why Bandit algos are better than A/B testing
  4. Discussion and debate around the original article on y-combinator
  5. Blog on some hard-knocks and learning around A/B testing in general
  6. Blog summarizing a cool article by Microsoft Bing folks on weird outcomes experienced with online testing:
  7. Actual MSFT article on weird A/B outcomes explained (PDF)
Apr 032011

I work in the mobile industry, and as a business analyst, I read a number of articles each week on mobile trends and predictions. These articles may discuss handset popularity, distribution of phones by country, data consumption, web usage, application purchases, or any other of a million broad or niche topics. One thing that I have not seen a lot of is independent collections of information. Often the stats I see are either teasers by analysts trying to get you to buy their reports, or marketing documents written to sell someone (not me) some product to help their mobile business. In doing a little research for a whitepaper I’m writing, I ran into this awesome collection of data by the guys at mobithinking on mobile trends that was released in February of 2011 (< 2 months ago)!

Not only is it a super long list describing several types of trends around consumption, change, applications, platforms, etc. They include many links to the original sources of data and even multiple sources for the same factoids (sometimes the sources agree and sometimes they do not). Lastly, mobithinking folks make a number of comments about the data they report, adding their expert opinions around the facts and figures they present.

I found this webpage to be an awesome collection of useful information for understanding where the mobile market is heading, what the predictions for the future are (revenues, OS wise, etc.), and I will be using this as a go-to when I need future factoids to round out certain business arguments or to set the stage for presentations.

The most interesting bit I saw (and I did not read everything….yet) was on the estimation of when US mobile carriers, if they don’t change course, will cease being profitable (original article available as the “executive summary” which at 8 pages long seems about 7 too many for most executives, here). Spoiler alert: Unprofitability is coming more quickly than you probably think it is.