Jul 06 2011

I have an intern right now, which gives me plenty of opportunities to teach, something I love. To give her a real piece of work to chew on and master, I handed her the analysis of a new product my company has launched into the Android market. It’s a mobile coupon aggregation service called Lotza. The idea behind it is that with all the daily deal sites popping up everywhere, like Groupon and Tippr, it would be nice to have something that a) showed all the deals in one place, b) showed only the deals most likely to be of interest, and c) stored all purchased offers in a single wallet for simple (virtual) retrieval. There’s a lot more in the development pipeline, but at beta that is the core functionality. Using our analytics backend to power a direct-to-consumer product is a great opportunity for us to own the data we generate, and to experiment freely with layout, design, and analytic algorithms. Being the complete masters of our own system comes with all the accompanying responsibilities you might expect.

As per the norm, my Business Intelligence team drove the product instrumentation and logging requirements to capture the user experience to the fullest. As is often true in software development, when all was said and done not everything was perfect (though it was close), which makes deriving metrics less straightforward than simple aggregations across fields. Multiple tables with specialized information about certain aspects of a user’s profile, in-session experiences, and behaviors make for a great multidimensional ball of potential confusion.

My intern started working with me after these requirements had been written, after the schemas were defined, and after most of the logging was already implemented. To bring her up to speed, I had her create a logging guide document showing every page in the product and all the user actions possible on each page, along with the associated logging. While running actual tests on a phone and tailing the logs live, she found several small bugs, which proved to me that she was paying attention and understood the structure of the data (even though it is complex in parts). Once this document existed, a set of reports was defined for a presentation she is expected to give to the internal stakeholders next week. This presentation will be a general overview of the product and a discussion of usage so far, including important KPIs such as sessions per user, the conversion funnel, and page abandonment rates. None of these metrics is as straightforward as it could be in a perfect product world, which is one point of this post. The important stuff is often under the initial layer of data, and getting at it requires special filters along with an intimate understanding of both what the logging requirements and definitions were and how they have actually been implemented.
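To make the point about special filters concrete, here is a minimal Python sketch of a sessions-per-user calculation. Everything in it is an assumption for illustration: the event fields (`user_id`, `session_id`, `page`, `is_test_device`) and the two anomalies it filters out are hypothetical, not our actual schema or the bugs found in our logs.

```python
from collections import defaultdict

# Hypothetical raw log events; all field names are assumptions for illustration.
EVENTS = [
    {"user_id": "u1", "session_id": "s1", "page": "home", "is_test_device": False},
    {"user_id": "u1", "session_id": "s1", "page": "deal", "is_test_device": False},
    {"user_id": "u1", "session_id": "s2", "page": "home", "is_test_device": False},
    {"user_id": "u2", "session_id": "s3", "page": "home", "is_test_device": False},
    # Anomaly 1 (hypothetical): the session id was never logged.
    {"user_id": "u2", "session_id": None, "page": "home", "is_test_device": False},
    # Anomaly 2 (hypothetical): traffic from an internal test phone.
    {"user_id": "qa", "session_id": "s4", "page": "home", "is_test_device": True},
]


def naive_sessions_per_user(events):
    """Distinct session_id values (None included) divided by distinct users."""
    users = {e["user_id"] for e in events}
    sessions = {e["session_id"] for e in events}
    return len(sessions) / len(users)


def filtered_sessions_per_user(events):
    """Same metric, after dropping test devices and events with no session id."""
    sessions_by_user = defaultdict(set)
    for e in events:
        if e["is_test_device"] or e["session_id"] is None:
            continue
        sessions_by_user[e["user_id"]].add(e["session_id"])
    if not sessions_by_user:
        return 0.0
    total_sessions = sum(len(s) for s in sessions_by_user.values())
    return total_sessions / len(sessions_by_user)


naive = naive_sessions_per_user(EVENTS)
filtered = filtered_sessions_per_user(EVENTS)
print(f"naive: {naive:.2f}, filtered: {filtered:.2f}")
```

The two numbers disagree on the same six events, and only someone who knows why the missing session id and the test phone are in the logs can say which one belongs in the report.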

The second point is that the analyst has to know these nuanced methods for counting what would otherwise be a simple sum or count across unique values. The analyst who isn’t able to roll up her sleeves and dive into the (potential) logging morass is really trusting the insights derived to the gods of perfect logging. My intern has recognized anomalies and worked around them. When she gives her talk next week, she will be armed with the ability to answer almost any question thrown at her: in other words, the “why” to her “what”. Even though this data wading has eaten up a lot of her time (she worked over July 4th and the 5th, a company holiday), she thanked me for the opportunity. She is confident in her findings, has learned a lot, and knows more about the logging system for our product than I currently do (and I wrote the logging spec and defined the original schema!). Her approach of looking at logs, reading documents (that are sometimes out of date), and tailing actual logs to identify examples and verify accurate data capture (and her own understanding of what is happening) represents the many hats I believe an analyst should wear.

With so much data being generated every day, and the complexity of that data increasing, the analyst must be a data generalist. I’m not proposing that all analysts must be masters of SQL and statistics and technical writing and math and the art of visual presentation. I’m suggesting that the best analysts will use whatever tools they need (including engineers) to get the right data in the right format to the right people. It is my opinion that anyone so lopsided in their training as to know only one of the above-mentioned skills is likely to underutilize what they find: the data will be inaccurate, confusing, mysterious, overwhelming. In short, it will be a disservice to their organization, and because of their narrow focus they might not even realize the problem. Come on analysts, diversify!

In closing: Data is messy. Roll up your sleeves. Question your results. Triangulate to verify. Cross-reference values. Perform sniff tests. As an analyst you should be the most engaged and knowledgeable person with the data you own. Your consumers rely on you to play the role of a translator and to convey your confidence in the accuracy of your findings. If you project this confidence and integrity, then on those occasions when you find absolutely horrendous data, or unbelievable (in a bad way) information, you can either find ways to salvage something useful from it or proclaim with certainty that the data is, for the most part, so dirty that any analysis would be unreliable and therefore not worth the effort. Your consumers may not always like the answers you provide, but they will respect you for declaring as much, especially when you know the data so well that you can pinpoint the major issues that bring reliability into doubt. Of course, when you find amazing insights you can confidently present them (and show backup verification that you didn’t make some simple error, because you know the data that well).
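One way to triangulate is to derive the same number from several independent sources and flag any pair that disagrees by more than you are willing to explain away. The Python sketch below is only an illustration of that idea: the source names, the counts, and the 5% tolerance are all made up, and the right tolerance depends entirely on your own data.

```python
def relative_gap(a, b):
    """Relative difference between two counts, using the larger one as the base."""
    largest = max(a, b)
    return 0.0 if largest == 0 else abs(a - b) / largest


def sniff_test(counts, tolerance=0.05):
    """Compare every pair of independently derived counts of the same metric.

    Returns a list of (source_a, source_b, gap) for pairs whose gap exceeds
    the tolerance. The default tolerance is an arbitrary illustrative value.
    """
    names = sorted(counts)
    failures = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            gap = relative_gap(counts[a], counts[b])
            if gap > tolerance:
                failures.append((a, b, gap))
    return failures


# Hypothetical purchase counts for one day, derived three different ways.
purchase_counts = {
    "client_log": 412,
    "server_log": 405,
    "wallet_table": 310,
}

failures = sniff_test(purchase_counts)
for a, b, gap in failures:
    print(f"{a} vs {b}: {gap:.1%} apart, investigate before reporting")
```

In this made-up example the two logs agree with each other and both disagree with the wallet table, which tells you where to start digging before anyone sees a chart.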