Dec 02, 2013

While the information gathered about how a product or service functions and how users interact with it can be priceless to multiple organizational parties making key business decisions, data is often treated as a secondary by-product (exhaust), subordinate to basic product functionality. In my experience this occurs for a few reasons:

  • Data collection is not necessary for a product or service to function
  • Data collection has no positive effect on real-time product performance
  • Data collection often reflects a complex mixture of needs from multiple internal parties, muddying ownership and responsibility


As such, due to market pressure and a lack of data advocacy, products and services can launch and gain initial traction with no data collection mechanisms whatsoever. This creates a false hierarchy regarding the place of data-oriented development within the product. In an agile framework especially, this can be a severe impediment to getting a robust data collection system in place: unless data collection (and its dependencies) is perceived as having equal footing with high-ranking product features, its work is perpetually ranked below the line when man-hours are assigned.


At heart, I am a philosopher, and I encourage business owners to humor me with their own philosophies (or visions) around the product they are creating or the organization they are in charge of. This exploration allows me to understand their overarching positions and expectations, and gives me opportunities to design data strategies (i.e. collection, analysis) toward meeting those stated expectations and, most importantly, the unstated future expectations. It is this approach that lets me meaningfully rank-order system requirements, anticipate pivots, and produce analytic output in anticipation of its necessity.


System-level thinking around a company vision drives the expectations and needs of business intelligence, and also gives solid reasoning around what is needed in each phase of a product lifecycle. A product derives from expectations about how to solve an identified problem in the market. The role of data lies in providing information regarding the extent to which the problem exists, whether it is addressed (solved or lessened) by the product's functions, and whether the nature of the problem changes over time. By defining data needs at the vision level, systems can be designed (greenfield, retrofit, refactor, or otherwise) with the flexibility to grow comfortably into the realization of the vision as the product matures. A focus on providing narrow intelligence for an initial product, or relying too heavily on a reporting template, is a sure path to mediocrity and sub-par output.


The vision, when properly evaluated, gives key insight into the minimum viable BI product, the importance of particular kinds of data over others, the importance of data quality for each data source, the extent to which a product must be instrumented, key milestones in the creation of a mature data system, and a number of other basic needs for a successful system of analytics. In other words, the enunciation of the vision allows an appropriate data strategy to be created and implemented alongside product features. This gives multiple data consumers the opportunity to engage with data throughout the product lifecycle, promotes data-centric thinking amongst owners, and helps engineers and other builders design systems that include robust data pipelines. It changes the conversation between a data team and engineering from “do you want me to collect that?” to “how do you want me to collect that?”
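To make “instrumented” a bit more concrete, here is a minimal sketch of product-side event emission; the schema fields, action names, and file-based transport are illustrative assumptions on my part, not a prescription:

    import json
    import time
    import uuid

    # Minimal event emitter sketch. The schema and transport are assumptions;
    # a real pipeline would ship events to a queue or collector service.
    def emit_event(user_id, action, properties=None):
        event = {
            "event_id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "user_id": user_id,
            "action": action,
            "properties": properties or {},  # flexible bag for future needs
        }
        with open("events.log", "a") as f:
            f.write(json.dumps(event) + "\n")

    # Hypothetical usage: instrument a signup flow.
    emit_event("user-42", "signup_completed", {"plan": "free"})

The flexible properties bag is the design point: a schema with room to grow lets the data system mature alongside the vision rather than being rebuilt for each new question.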

Oct 07, 2013

When taking on an analytics project or designing a reporting system (dashboard or otherwise), a core component of superior execution is to properly understand the question(s) the vehicle is expected to answer. This may seem like an obvious statement, but it is amazing how often metrics are chosen for convenience rather than impact. Additionally, dashboards and reports are often (at least initially) put together by individuals with little training in design and business reporting. A monkey can make a graph, but it takes a bit of thought and planning to make something impactful. I would argue that the state of business intelligence in general suffers from this issue – people undervalue the opportunities for using data to make great business decisions because they have learned that the data available to them is not useful for doing so. Instead of insisting that the metrics and reports be [more]

May 13, 2013

There are a number of often unspoken jobs that the analyst must perform in order to be useful, helpful, and thereby successful. These include being a question definer, a translator, a data expert, a storyteller, and an anticipator of post-analysis questions. Without approaching a business question (which should inform a specific business action) carefully, the risk of error or underwhelming findings is greatly increased. Definer of THE QUESTION – When I think about analysis I always start with “what is the question we are trying to answer?” The question to be answered is never as simple as whatever my boss has asked me; data is messy, the question is complex, and more often than not, the initial question is wrong. By wrong, I mean that it is too general, makes assumptions about the answer that may be wrong, or just does not make sense from a business point of [more]

May 06, 2013

In business, as in life, there are a number of unknowns that we constantly have to notice, interpret, and react to. A superior strategy includes prediction, or some range of expectations, for key actions (e.g. product releases, announcements), the accuracy of which is the result of factors like good intuition, observation, and pattern recognition. As such, the strategically minded individual should be collecting information from multiple sources in order to make the transition from expectation to outcome as seamless as possible, as often as possible. There will be colossal misses, due to misinterpretation of given data, failure to collect appropriate data, or a high degree of chance in the actual outcome. Example: The cult film Donnie Darko was released to major theaters around the time of the September 11 attacks. Part of its failure as a major release has been attributed, post hoc, to the particularly sensitive aspect of [more]

Jan 02, 2013

The data buzz-phrase of the current century, “Big Data”, is often approached as a magical construct that one might lash themselves to and, like Odin to Yggdrasil, walk away with great knowledge after a time – maybe just by being near it. The idea is that using this toolset is THE way to extract value from your data. I’m not the first to say it, but this is similar to how relational databases have been sold for years, only now the promise extends out to unstructured and semi-structured data. Pro tip – you still have to manipulate the data to get anything worthwhile from it, and that assumes you collected the right stuff to begin with. It’s unfortunate that a lot of people in the organizational position to make investments in data infrastructures, technologies, and tools get stuck playing a game of Mad Libs instead of figuring [more]

Nov 25, 2012

So Nate Silver is the stats nerd of the year for his great (or lucky, if you hate science) methodology of poll aggregation and the poll weighting algorithm he employed to predict the outcome of the recent national elections. Congratulations, Nate: if I didn’t live in a country with Byzantine banking laws, I would have made a tidy sum using your legwork (among others’ – I firmly believe in leveraging the wisdom of crowds of experts) to invest in “Obama to win” via the event-based market InTrade. I haven’t been able to find any apologies from the demonizers who suggested Nate was just another political hack (like them?) who was rooting for the wrong team and trying to hide it behind some sort of magical thinking in the guise of science, but I can’t say I looked too hard. While the disappointing part of the whole [more]

Jul 30, 2012

A friend sent me a great blog post (see #1 in the list at the end of this post) about testing that has been buzzing around (and should be read and debated if you care about such things even a little bit). The post introduces (as in “brings to our attention”, not as in “invents”) a method of easily coding an epsilon-greedy strategy for web (or whatever) optimization testing, and claims that it is superior to the well-established standby of A/B testing (oooh, thems fightin’ words!). This post has inspired a number of responses by folks who run A/B tests, folks who optimize and test websites, and computer nerds interested in arguing about this type of stuff in general. The normal array of folks weigh in – the engineers who almost understand* the experimental/statistical approach to A/B testing, statistician purists who sing about the superiority of A/B testing, and [more]
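For anyone who hasn’t seen the approach, the strategy itself fits in a few lines. Here is a minimal sketch (the arm names, reward scheme, and 10% exploration rate are my own illustrative choices, not taken from the post in question):

    import random

    # Epsilon-greedy sketch: explore a random arm with probability epsilon,
    # otherwise exploit the best-performing arm seen so far.
    class EpsilonGreedy:
        def __init__(self, arms, epsilon=0.1):
            self.epsilon = epsilon
            self.counts = {arm: 0 for arm in arms}
            self.values = {arm: 0.0 for arm in arms}  # running mean reward

        def choose(self):
            if random.random() < self.epsilon:
                return random.choice(list(self.counts))  # explore
            return max(self.values, key=self.values.get)  # exploit

        def update(self, arm, reward):
            # Incrementally update the running mean for the chosen arm.
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    # Hypothetical usage: two page variants, reward 1 on click, 0 otherwise.
    bandit = EpsilonGreedy(["A", "B"])
    arm = bandit.choose()
    bandit.update(arm, reward=1)

The appeal is obvious: traffic shifts toward the winner automatically instead of waiting for a test to conclude, which is precisely where the argument with the A/B testing camp begins.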

Apr 13, 2012

In dealing with thoughtful people who have either never taken a stats class or have not been students of the discipline for many years, I have often run across a few interesting false premises or areas of ignorance. One worth mentioning in passing (and this falls into the paradigm of a little knowledge being a bad thing) is the belief that to run any kind of experiment you can only manipulate one variable. This belief falls squarely in the “I’m going to tell you something right now to help you understand a general concept, but later you will be told to forget it because it is false” camp, but since most folks only take that first stats class they never get to the “it is false… and here is why”. For those of you in that camp who are waiting for me to explain how this insane proclamation can be [more]
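To preview where that explanation goes: a factorial design manipulates several variables at once and still recovers each one’s effect, because every observation contributes to every estimate. A toy sketch (the factors, effect sizes, and sample counts are all invented for illustration):

    import itertools
    import random

    # Simulated 2x2 factorial experiment: two variables manipulated at once.
    def run_trial(button_color, headline):
        # Hypothetical response: each factor contributes independently, plus noise.
        rate = 0.10
        rate += 0.03 if button_color == "red" else 0.0
        rate += 0.02 if headline == "short" else 0.0
        return rate + random.gauss(0, 0.01)

    conditions = list(itertools.product(["red", "blue"], ["short", "long"]))
    means = {c: sum(run_trial(*c) for _ in range(1000)) / 1000 for c in conditions}

    # Main effect of color: average across headline levels, so the estimate
    # uses every observation -- the efficiency gain over one-at-a-time tests.
    red = (means[("red", "short")] + means[("red", "long")]) / 2
    blue = (means[("blue", "short")] + means[("blue", "long")]) / 2
    print(f"estimated main effect of color: {red - blue:.3f}")  # ~0.03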

Dec 18, 2011

When I was in graduate school hustling to get my dissertation data in order, it took me over a year to collect everything I needed. Granted, the research took place through both an alumni survey and a social-services organization, each posing its own set of challenges. Once I had the data, it was entered into spreadsheets and loaded into SPSS and AMOS, where I ran my analyses after a bit of cleaning. Today in my work we have an A/B testing platform that allows certain experiments to be run by changing configuration files in real time on the live system. For those of you out of the loop, A/B testing is the term used for running split-test experiments (traditionally in marketing). For me it’s a tool to manipulate a very small percentage of user experiences in order to measure the impact of potential system-wide changes (visual or [more]
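A common way such a platform works (an assumption on my part, not a description of our actual system) is to hash each user into a stable bucket, with a config value controlling what share of buckets receives the treatment:

    import hashlib

    # Hypothetical config: a 1% treatment share that can be changed at runtime.
    CONFIG = {"experiment": "new_checkout", "treatment_pct": 1.0}

    def assign(user_id, config):
        # Hash user + experiment name to a stable value in [0, 100). The same
        # user always lands in the same bucket, so editing the config percentage
        # ramps the experiment up or down without a code deploy.
        key = f"{config['experiment']}:{user_id}".encode()
        bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10000 / 100.0
        return "treatment" if bucket < config["treatment_pct"] else "control"

    print(assign("user-42", CONFIG))  # stable across calls and deploys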

Sep 26, 2011

First off, read this. So Netflix says to the SEC that churn is not important to them. Except that they didn’t actually say that. They said “the churn metric is a less reliable measure of business performance, specifically consumer acceptance of the service,” meaning that the metric, for them, is broken and therefore should not be used to compare them to others in the marketplace. The cynic would respond “what are you hiding?”, but the truth is that they are correct: in their business, churn is so different that trying to compare it across companies is a disservice to the naive public. The information would be misconstrued and therefore should not be revealed. I am generally of the mind that you let the consumer of information make the decision about the quality of that information, but here I am with Netflix – the consumer is likely to misuse and ignorantly, accidentally [more]
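A toy example shows why. Suppose two services have identical per-cohort retention but different growth profiles; the numbers below are entirely invented:

    # Assume every cohort loses 8% of its members in month 1 and 1% per
    # month thereafter -- identical underlying retention for both services.
    def cancels(cohort_sizes):
        # cohort_sizes[i] = current subscribers who joined i months ago
        total = 0.0
        for age, size in enumerate(cohort_sizes):
            total += size * (0.08 if age == 0 else 0.01)
        return total

    steady = [10_000] * 12  # flat acquisition for a year
    growing = [40_000, 30_000, 22_000, 16_000, 12_000, 9_000,
               7_000, 5_000, 4_000, 3_000, 2_000, 1_000]  # rapid recent growth

    for name, cohorts in [("steady", steady), ("growing", growing)]:
        print(f"{name}: churn = {cancels(cohorts) / sum(cohorts):.1%}")
    # steady: 1.6%, growing: 2.9% -- same retention, very different "churn"

The fast-growing service reports nearly double the churn despite identical retention, simply because its subscriber base is weighted toward young, high-attrition cohorts. That is the sense in which the metric, naively compared across companies, misleads.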