Jul 06 2011

I have an intern right now, which gives me plenty of chances to teach, something I love. In order to give her a real piece of work to chew on and master, I handed her the analysis of a new product my company has launched into the Android market. It’s a mobile coupon aggregation service called Lotza, and the idea behind it is that with all the daily deal sites that are popping up everywhere, like Groupon and Tippr, it would be nice if there was something that a) showed all the deals in one place, b) showed only the deals most likely to be of interest, and c) stored all purchased offers in a single wallet for simple (virtual) retrieval. There’s a lot more to it in the development pipeline, but at beta, that is the core functionality. Using our analytics backend to power a direct-to-consumer product is a great opportunity for us to own the data we generate, and to experiment freely with layout, design, and analytic algorithms. Being the complete masters of our own system comes with all the accompanying responsibilities you might expect.

As per the norm, my Business Intelligence team drove product instrumentation and logging requirements to capture the user experience to the fullest. As is often true in software development, after all was said and done not everything was perfect (though it was close), making the derivation of metrics less straightforward than simple aggregations across fields. Multiple tables with specialized information about certain aspects of a user profile, in-session experiences, and behaviors make for a great multidimensional ball of potential confusion.

My intern began working with me after these requirements had been written, after the schemas were defined, and after most of the logging was already implemented. To bring her up to speed I had her create a logging guide document showing every page in the product and all user actions possible per page, along with the associated logging. In running actual tests on a phone and tailing the logs live, she found several small bugs, proving to me that she was paying attention and understood the structure of the data (even though it was complex in parts). Once this document existed, a set of reports was defined for a presentation she is expected to give to the internal stakeholders next week. This presentation will be a general overview of the product and a discussion of usage so far, including important KPIs such as sessions per user, the conversion funnel, and page abandonment rates. None of these metrics is as straightforward as it could be in a perfect product world, which is one point of this post. The important stuff is often under the initial layer of data, and requires special filters along with an intimate understanding of both what the logging requirements and definitions were and how they have actually been implemented.
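To make those three KPIs concrete, here is a minimal Python sketch of how they might be counted from raw event rows. Everything in it is an assumption for illustration: the row layout, the page names, and the function names are hypothetical and are not our actual logging schema. In real data each of these counts would need the special filters mentioned above.

```python
from collections import defaultdict

# Hypothetical event rows: (user_id, session_id, page, action), in time order.
# Page and action names are invented for illustration only.
EVENTS = [
    ("u1", "s1", "deal_list", "view"),
    ("u1", "s1", "deal_detail", "view"),
    ("u1", "s1", "checkout", "purchase"),
    ("u1", "s2", "deal_list", "view"),
    ("u2", "s3", "deal_list", "view"),
    ("u2", "s3", "deal_detail", "view"),
    ("u3", "s4", "deal_list", "view"),
]

# Assumed funnel order, from browsing to purchase.
FUNNEL = ["deal_list", "deal_detail", "checkout"]


def sessions_per_user(events):
    """Average number of distinct sessions per distinct user."""
    sessions = defaultdict(set)
    for user, session, _page, _action in events:
        sessions[user].add(session)
    return sum(len(s) for s in sessions.values()) / len(sessions)


def funnel_counts(events, steps):
    """Sessions reaching each step, counting only those that hit all earlier steps."""
    pages_by_session = defaultdict(set)
    for _user, session, page, _action in events:
        pages_by_session[session].add(page)
    counts = []
    for i in range(len(steps)):
        needed = set(steps[: i + 1])
        counts.append(sum(1 for pages in pages_by_session.values() if needed <= pages))
    return counts


def abandonment_rate(events, page):
    """Share of sessions that saw `page` and whose last logged event was on it."""
    last_page = {}
    seen = set()
    for _user, session, current, _action in events:  # relies on time ordering
        last_page[session] = current
        if current == page:
            seen.add(session)
    if not seen:
        return 0.0
    return sum(1 for s in seen if last_page[s] == page) / len(seen)


if __name__ == "__main__":
    print(sessions_per_user(EVENTS))
    print(funnel_counts(EVENTS, FUNNEL))
    print(abandonment_rate(EVENTS, "deal_list"))
```

Even in this toy version the definitions carry judgment calls: whether a funnel step counts if an earlier step was skipped, and whether "abandonment" means the last page of a session or something stricter. Those are exactly the questions an analyst needs to be able to answer.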

The second point is that the analyst has to know these nuanced methods for counting what would otherwise be a simple sum or count across unique values. The analyst who isn’t able to roll up her sleeves and dive into the (potential) logging morass is really trusting the insights derived to the gods of perfect logging. My intern has recognized anomalies and worked around them. When she gives her talk next week, she will be armed with the ability to answer almost any question thrown at her – in other words, the “why” to her “what”. Even though this data wading has eaten up a lot of her time (she worked over July 4th and the 5th, a company holiday) she thanked me for the opportunity. She is confident in her findings, has learned a lot, and knows more about the logging system for our product than I currently do (and I wrote the logging spec and defined the original schema!). Her approach of looking at logs, reading documents (that are sometimes out of date), and tailing actual logs to identify examples and verify accurate data capture (and her own understanding of what is happening) represents the many hats I believe an analyst should wear. With so much data being generated every day, and the complexity of that data increasing, the analyst must be a data generalist. I’m not proposing that all analysts must be masters of SQL and statistics and technical writing and math and the art of visual presentation – I’m suggesting that the best analysts will utilize whatever tools they need (including engineers) to get the right data in the right format to the right people. It is my opinion that anyone so lopsided in their training as to know only one of the above-mentioned skills is likely to underutilize what they find – the data will be inaccurate, confusing, mysterious, overwhelming. In short, it will be a disservice to their organization, and due to their focus, they might not even realize the problem. Come on analysts, diversify!

In closing: Data is messy. Roll up your sleeves. Question your results. Triangulate to verify. Cross-reference values. Perform sniff tests. As an analyst you should be the most engaged and knowledgeable person with the data you own – your consumers rely on you to play the role of a translator and to represent your confidence in the accuracy of your findings. If you project this confidence and integrity, then at those times when you find absolutely horrendous data, or unbelievable (in a bad way) information, you can either find ways to salvage something useful from it or proclaim with certainty that the data is, for the most part, so dirty that any analysis would be unreliable and therefore not worth the effort. Your consumers may not always like the answers you provide, but they will respect you for declaring as much, especially when you know the data so well you can pinpoint the major issues that bring reliability into doubt. Of course, when you find amazing insights you can confidently present them (and show backup verification that you didn’t make some simple error – you know the data that well).

Apr 20 2011

As a business analyst, I live and die by logging. This makes me vigilant about what products are being developed by my organization, and how they change from concept to wireframes to implementation. Rarely do these three stages look the same, and sometimes the end product is a far cry from the original beast due to time pressures, build vs. buy decisions, scope creep, and a number of other fun issues. Regardless of my vigilance, I find that logging and thoughts around instrumentation almost always come last. I am not alone in my observations, as other analyst friends have made the same comment. In fact, this was verified by a development lead at a large organization recently when he commented to me, “you know, we always wait until it’s too late to add logging, if we even consider it in the first place.”

Why is it that engineers have such an aversion to extended, non-performance instrumentation, and find it so onerous or unimportant? They write unit tests. They instrument for speed of throughput, heartbeat, and error messaging but tend to ignore the basics of user behavior on the products they have built.  It is seen as extraneous, performance impacting, nonsensical even. This is unfortunate.

When I was in graduate school my dissertation focused on how individuals’ beliefs about the degree to which their organization in general, and their supervisor specifically, cared about them impacted their work behaviors. In other words, if you think your supervisor cares about you as a person, does that make you work harder? What about your overall organization – does that matter? Are there special traits of supervisors that make you more or less likely to do your job well, to help others, to protect the organization from lawsuits or other problems, to decide to stay instead of quitting? It took me almost 2 years to collect enough data to answer this set of questions. Two years. Today, I can ask interesting, in-depth questions about the data I collect every 2 minutes. The only reason this is possible is that the damn products are instrumented like mad to tell me everything the user is doing, seeing, interacting with (and choosing to ignore). This information is powerful for understanding usability, discovery, and annoying product issues like confusing pages or buttons. Predictive analytic models can be built off of this behavior (user X likes this stuff, hates that stuff, buys this stuff, ignores that stuff, etc.) but only if it is logged. With both a strong BI opportunity and predictive analytics opportunities, why is logging so often ignored, perfunctory, or offloaded to companies like Google – almost as an afterthought?

My theory is that because the nuances of logging often make it fragile and complex, it isn’t easy to determine whether it is accurate while the product is in development. As the underlying systems change (schema shuffling or enumerated value redefinition or recycling, for example) and many hands touch the code that creates the product, it makes sense to wait until things settle down to begin adding the measurement devices. Unfortunately, special cases are often introduced – invisible to an end user, but obvious under the hood – that make straightforward logging difficult. The end result is often a pared-down version of logging that is seen as “good enough” but not ideal. The classic “we’ll do this right in vNext” is my most hated phrase to hear.

The workaround to this malady, when possible, is to introduce clear, concise, standardized logging requirements that engineers can leverage across products. Often a block of specific types of values (timestamp, screen size, operating system, IP, user-id, etc.) covers a majority of the values the analyst needs for pivoting, monitoring, etc. The remaining portion of a schema can then contain the pieces that are unique to the specific product (like “query string” if searching is a possible action in one product but not others).
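As a rough illustration of that split, here is a minimal Python sketch of a log record with a shared block of common fields plus a free-form section for product-specific values. The class names, field names, and JSON layout are all hypothetical; this is one possible shape under those assumptions, not a description of any schema we actually ship.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CommonFields:
    """Hypothetical block of values every product logs the same way."""
    timestamp: str
    user_id: str
    operating_system: str
    screen_size: str
    ip: str


@dataclass
class LogEvent:
    """One logged user action: shared block plus product-specific extras."""
    product: str
    action: str
    common: CommonFields
    extra: dict = field(default_factory=dict)  # e.g. {"query_string": "..."}

    def to_json(self):
        # Flatten the common block to top-level keys so every product's
        # records can be pivoted on the same column names.
        record = asdict(self.common)
        record.update(product=self.product, action=self.action, extra=self.extra)
        return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    event = LogEvent(
        product="search_app",  # invented product name
        action="search",
        common=CommonFields(
            timestamp="2011-04-20T12:00:00Z",
            user_id="u1",
            operating_system="Android",
            screen_size="480x800",
            ip="192.0.2.1",  # documentation-only address
        ),
        extra={"query_string": "pizza"},
    )
    print(event.to_json())
```

Keeping the common block identical across products means monitoring and pivot queries can be written once, while the `extra` section lets one product log a query string without forcing that column on the others.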

The analyst must be vigilant, aware, engaged, and on the lookout for implementations that introduce actions or behaviors that are currently unlogged or that break expectations so that he or she can engage engineers proactively, before it’s too late, to add functionality to logging and be sure that important and essential user behavioral data does not go down the tube of the dreaded “vNext”.

Apr 03 2011

I work in the mobile industry, and as a business analyst, I read a number of articles each week on mobile trends and predictions. These articles may discuss handset popularity, distribution of phones by country, data consumption, web usage, application purchases, or any other of a million broad or niche topics. One thing that I have not seen a lot of is independent collections of information. Often the stats I see are either teasers by analysts trying to get you to buy their reports, or marketing documents written to sell someone (not me) some product to help their mobile business. In doing a little research for a whitepaper I’m writing, I ran into this awesome collection of data by the guys at mobithinking on mobile trends that was released in February of 2011 (< 2 months ago)!

Not only is it a super long list describing several types of trends around consumption, change, applications, platforms, etc., but it also includes many links to the original sources of data and even multiple sources for the same factoids (sometimes the sources agree and sometimes they do not). Lastly, the mobithinking folks make a number of comments about the data they report, adding their expert opinions around the facts and figures they present.

I found this webpage to be an awesome collection of useful information for understanding where the mobile market is heading, what the predictions for the future are (revenues, OS wise, etc.), and I will be using this as a go-to when I need future factoids to round out certain business arguments or to set the stage for presentations.

The most interesting bit I saw (and I did not read everything... yet) was on the estimation of when US mobile carriers, if they don’t change course, will cease being profitable (original article available as the “executive summary” which, at 8 pages long, seems about 7 too many for most executives, here). Spoiler alert: Unprofitability is coming more quickly than you probably think it is.