Oct 07, 2013

When taking on an analytics project, or designing a reporting system (dashboard or otherwise), a core component of superior execution is to properly understand the question(s) the vehicle is expected to answer. This may seem obvious, but it is amazing how often metrics are chosen for convenience rather than impact. Additionally, dashboards and reports are often (at least initially) put together by individuals with little training in design and business reporting. A monkey can make a graph, but it takes a bit of thought and planning to make something impactful. I would argue that the state of business intelligence in general suffers from this issue – people undervalue the opportunity to use data to make great business decisions because they have learned that the data available to them is not useful for doing so. Instead of insisting that the metrics and reports be useful for business decisions, they write off the full potential of the data and go back to the inefficiencies of traditional gut-based decision making. What they fail to realize, or are not made aware of, is the wide variety of data available that is not being used. Empowering decision makers to utilize data is a core purpose of an analyst.

While assigning fault for the poor state of BI affairs isn’t particularly helpful, it’s worth noting that this is a systemic issue, born of a history of inferior metrics and reports delivered under tight decision-making timeframes, and compounded by general ignorance of what data is available. The analyst’s job is to right these wrongs by retraining the organization in how data is approached, assembled, and utilized for business decisions. That retraining begins with the most basic step in the analytics chain: defining the right questions.

Question definition is key because all further analytical planning stems from it. Defining the question, while the job of the analyst, requires input from the consumers of the information. Interviewing those who will use the analyst’s output is key to deciding what any given analytical product will contain, including how many metrics are needed, in what format, and how frequently they will need to be updated. The unsophisticated organization derives metrics by default, in a post-hoc manner, based on whatever data happens to have been collected. That approach is unlikely to have the impact of a carefully planned set of information tailored to the business’s needs.

Additionally, some decision makers will believe they know exactly what data they want and need, and it is important that the analyst probe to make sure this is correct. Finding out what a particular metric will be used for, or why a specific piece of information is useful, can uncover unintended interpretation mistakes (e.g. when a unique id is not indicative of a unique user due to cookie expiration). It is safe to say that while business owners often do not understand the data used to create particular metrics, they often have strong opinions about what the metrics mean. It is the job of the analyst to educate and redefine around these gaps. Furthermore, the analyst should be aware of the myriad data sources available for creating metrics, and use that awareness to guide the business owner through discovery. This is a major reason it is critical to get the BI consumer/business decision maker to talk about the questions they are trying to answer rather than to expect them to rattle off a full list of needed metrics. The analyst defines the metrics based on the needs of the business owner. It is crucial that the analyst take an active, participatory role in the design stage rather than a passive “fast food order” style of design. You are a data educator – act like one.
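
To make the cookie example concrete, a quick comparison like the following can show how far “unique ids” drift from unique users; the visits table and its columns are hypothetical, purely for illustration:

    -- Hypothetical visits table: one row per visit, with the browser cookie id
    -- and, for logged-in visitors, a stable account id.
    -- If unique_cookies is much larger than unique_accounts for the same people,
    -- cookie expiration (and multiple devices) is inflating the "unique user" count.
    SELECT
        COUNT(DISTINCT cookie_id)  AS unique_cookies,
        COUNT(DISTINCT account_id) AS unique_accounts
    FROM visits
    WHERE account_id IS NOT NULL;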

In closing, there are a number of additional steps to defining the mechanism necessary for answering the questions settled upon. Some questions require multiple metrics, new data collection, derivative metrics (e.g. time between purchases is derived from the last purchase timestamp minus the previous purchase timestamp), or a re-statement of the question due to limitations in the available data. This layer of the design lies between the initial question definition and the visual display of data, but again is key to the successful execution of an analytic output. The analyst is an artist, a teacher, a designer, a data counselor, and a storyteller, as well as being the one who knows the data. You can’t design an output mechanism if you don’t know what the input question is.
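
As a rough sketch of what a derivative metric like time between purchases can look like in practice (the purchases table and column names are hypothetical, and window function behavior varies a bit by database; in PostgreSQL, subtracting two timestamps yields an interval):

    -- Hypothetical purchases table: one row per purchase with user_id and purchased_at.
    -- LAG() pulls each user's previous purchase so the gap between the two can be derived.
    SELECT
        user_id,
        purchased_at,
        purchased_at - LAG(purchased_at) OVER (
            PARTITION BY user_id
            ORDER BY purchased_at
        ) AS time_since_previous_purchase
    FROM purchases;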

Question → translates to data → summarized by metrics → interpreted by analysis → converted to display → answers or informs question

May 13, 2013

There are a number of often unspoken jobs that the analyst must perform in order to be useful, helpful, and thereby successful. These include being a question definer, a translator, a data expert, a storyteller, and an anticipator of post-analysis questions. Without approaching a business question (which should inform a specific business action) carefully, the opportunity for error or underwhelming findings is greatly increased.

Definer of THE QUESTION – When I think about analysis I always start with “what is the question we are trying to answer?” The question to be answered is never as simple as whatever my boss has asked me; data is messy, the question is complex, and more often than not, the initial question is wrong. By wrong, I mean that it is too general, makes assumptions about the answer that may not hold, or just does not make sense from a business point of view. One way to test whether the question is a good one is to come up with a fake answer and see if it would change anything. “How many women in their 40s are tweeting about our product?” the boss asks you… the answer, by itself, is probably pretty useless: “135”. What the boss really wants to know is “what are the demographics of people tweeting about our products? I think it’s mostly demographic X.” Assuming you have access to the demographics, you can present a plethora of data – by product, by cohort, etc. That approach also makes your answer to the initial (not good) question make more sense: “Looks like 135, which is 32% of all the folks tweeting about us.” Engaging in a Socratic dialogue with the data consumer (or yourself, for that matter) can help you understand the kernel, the impetus for the question, and thereby redefine it in a way that extends the usefulness of the answer and guides the additional analyses necessary for a deeper understanding of the phenomenon you are investigating.

Rosetta Stone – The analyst must be able to translate fluidly between many entities. Business and marketing oriented groups will not speak the same language as data miners and data scientists, and the audience for consuming data is constantly changing. The analyst must anticipate these differences, speak the different languages, and be sensitive to the fact that her job is to increase understanding (rather than expecting her customers, the data consumers, to do that research after the data is presented). Again, if the analyst were not needed to act as a translator, she would eventually be replaced by a tool. Once the question to be answered has been defined, the analyst will most likely have to interact with other entities to extract the raw data. One man’s “how many” is another man’s “SELECT * FROM table_name”.
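
To make the translation concrete, here is a rough sketch of how the boss’s question from the “Definer of THE QUESTION” section above might look once it reaches a database; the tweets table and its demographic columns are invented for illustration:

    -- The literal question: "How many women in their 40s are tweeting about our product?"
    SELECT COUNT(DISTINCT author_id) AS women_in_their_40s
    FROM tweets
    WHERE product = 'our_product'
      AND author_gender = 'female'
      AND author_age BETWEEN 40 AND 49;

    -- The more useful framing: the demographics of everyone tweeting about us,
    -- which also puts the "135" into context (assumes author_age is an integer,
    -- so integer division buckets ages into decades).
    SELECT
        author_gender,
        (author_age / 10) * 10 AS age_bucket,
        COUNT(DISTINCT author_id) AS tweeters
    FROM tweets
    WHERE product = 'our_product'
    GROUP BY author_gender, (author_age / 10) * 10
    ORDER BY tweeters DESC;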

Master of Data – At the end of the day, the analyst must know the data they are transforming and interpreting. They must understand the structures, the nuances, and the quality. This means a part of an analyst’s job is to investigate, on their own, the data sources they interact with. Blindly extracting data is a great way to create false conclusions due to poor-quality data, null fields, strange conventions, and more. I recall running an analysis where I found the “gender” column of a master table contained the values “male”, “female”, and “child”. Imagine if I had tried to do some sort of deductive analysis in which I got a count of total uniques and total males, and then took a shortcut (uniques – males) to derive females. Oops. There is no universal taxonomy, there is no universal ontology. When it comes to data, you have to check and double-check the sources to be sure you understand the nature of the data you are extracting value from.
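
A cheap habit that would have caught the “child” value immediately: profile a column before leaning on it, and count the category you care about directly rather than deriving it by subtraction. A sketch, with hypothetical table and column names:

    -- What values actually live in the gender column, and how often?
    SELECT gender, COUNT(*) AS row_count
    FROM master_table
    GROUP BY gender
    ORDER BY row_count DESC;

    -- Count females explicitly instead of deriving them as (uniques - males).
    SELECT COUNT(DISTINCT user_id) AS female_uniques
    FROM master_table
    WHERE gender = 'female';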

Storyteller – The analyst needs to take data, in its raw form, and mold it into something that can be understood and easily retained by the audience. Answers to business questions are rarely straightforward (and if they were, the analyst would be replaced by a dashboard) and often require a contextual back story. The analyst must determine, to the best of his or her ability, the relevant context for the answers presented. This context often goes beyond the data and requires (gasp) talking to others or (shudder) using the product that generated the data. Without the appropriate data context and the ability to describe that context to the audience, the analyst is nothing more than an overpaid calculator moving numbers into tables and charts for someone else to interpret. A great interview on the storytelling aspect of data analysis was given by Cole Nussbaumer (whose blog I am adding to the blogroll on the front page) for the website klevr.org.

Anticipator of Questions – I believe that a successful presentation of data provides clear information about a specific set of business questions such that decisions can be made. I also believe that a successful analysis generates more questions. While that may seem counterintuitive (if you answered the original questions, why are people asking more?), in my experience the questions asked after a successful analysis are those that build upon the insights you presented (rather than being unrelated or confrontational/doubtful). If you get no questions, I fear you have bored or confused your audience. That said, anticipating the most likely follow-up questions and running the analyses on a few of them is generally low cost and high reward: it shows you have defined the question space well, translated it in the appropriate manner, retrieved the data to answer those questions above and beyond expectations, mastered the data, and are now weaving it into the epilogue of your story, delighting your audience and giving them valuable information. If nobody asks the questions you anticipated, you can always present them (quickly, as you have spent most of your time talking about the core question) as bonus material: “I was curious about…” This also leaves your audience with the assurance that you actually care about unearthing insights and, frankly, if you really don’t, you should find a new job.

May 06, 2013

In business, as in life, there are a number of unknowns that we constantly have to notice, interpret, and react to. A superior strategy includes prediction, or some sort of range of expectations, for key actions (e.g. product releases, announcements), the accuracy of which is the result of factors like good intuition, observation, and pattern recognition. As such, the strategically minded individual should be collecting information from multiple sources in order to make the transition from expectation to outcome as seamless as possible, as often as possible. There will be colossal misses, due to misinterpretation of the given data, failure to collect appropriate data, or a high degree of chance in the actual outcome. Example: the cult film Donnie Darko was released to major theaters around the time of the September 11 attacks. Part of its failure as a major release has been attributed, post hoc, to the particularly sensitive nature of one of the movie’s main events – an airplane engine falling from the sky and destroying Donnie’s house. Estimating the success of this movie when it was being prepared for its release would never have included a “terrorist airplane attack” factor. That said, reasonable ranges of expectations can be provided most of the time.

A huge advantage in narrowing the range of possibilities for a particular forecast or outcome, while also maintaining a decent accuracy rate, is to engage in lateral thinking and converge on answers through techniques like proxy variables. One place I have seen analysts and non-analysts falter when trying to predict a business outcome is their failure to engage in creative thinking around ways to estimate unknowns. Rarely is an estimating technique as simple as plugging a few values into a known formula – especially when tackling innovative solutions. Frankly, if it were this easy, then analysts and strategic thinkers would have a very short shelf-life: they would come in, set up the magic eight ball, and be done forever. An analyst’s job is to explore, research, and create (art + science) answers to the right questions. Note that I didn’t say all questions. An additional part of the analyst’s job is to act as a noise filter, taking the key pieces of business requests and squeezing them down to the basic elements of what needs to be known.

In the following posts I plan to tackle a number of specific topics revolving around approaches to making good predictions and providing superior answers, not from a cookbook point of view but from a higher level. Let’s get meta and call it a strategy around creating strategy. The approach I take is that of the analyst as a curator, a gardener, and a scientist. The analyst is proactive, inquisitive, provides unique insight, and, through knowledge of the data, surfaces questions that have never been asked.

Every business is different, but often the core remains the same: a good or service is being offered to a consumer. Working from that core, business questions will arise around directions to take an offering, how it will fare in the open market, how to improve it, minimum viable product requirements, and other more mundane day-to-day curiosities. The analyst should be able to tackle these as a matter of course, and recognize the larger questions implied by the smaller ones and vice versa. Rarely is there a single question, and rarely is there a single answer to any given question. In the following posts I plan to explore the world of the analyst through my own personal lens, providing an overarching description and then digging into specific topics. The following is my off-the-cuff laundry list of expected posts. Hope you enjoy them.

  • The role of the strategist and analyst
  • Answers 101: Defining the right questions
  • Using proxy variables to improve estimates and answers
  • Information sources and intelligent approaches to information
  • Core data needs: Quality, Breadth, and Volume
  • Skunkwork Analytics: your often undefined job

Jul 06, 2011

I have an intern right now, which affords me a lot of teaching opportunities, which I love. In order to give her a real piece of work to chew on and master, I handed her the analysis of a new product my company has launched into the Android market. It’s a mobile coupon aggregation service called Lotza, and the idea behind it is that with all the daily deal sites popping up everywhere, like Groupon and Tippr, it would be nice if there were something that a) showed all the deals in one place, b) only showed the deals most likely to be of interest, and c) stored all purchased offers in a single wallet for simple (virtual) retrieval. There’s a lot more to it in the development pipeline, but at beta, that is the core functionality. Using our analytics backend to power a direct-to-consumer product is a great opportunity for us to own the data we generate, and to experiment freely with layout, design, and analytic algorithms. Being the complete masters of our own system comes with all the accompanying responsibilities you might expect.

As per the norm, my Business Intelligence team drove the product instrumentation and logging requirements to capture the user experience to the fullest. As is often true in software development, after all was said and done everything was not perfect (but was close), making the derivation of metrics less straightforward than simple aggregations across fields. Multiple tables with specialized information about certain aspects of a user profile, in-session experiences, and behaviors make for a great multidimensional ball of potential confusion.

My intern began working with me after these requirements had been written, after the schemas were defined, and after most of the logging was already implemented. To bring her up to speed, I had her create a logging guide document showing every page in the product and all user actions possible per page, along with the associated logging. In running actual tests on a phone and tailing the logs live, she found several small bugs, proving to me that she was paying attention and understood the structure of the data (even though it was complex in parts). Once this document existed, a set of reports was defined for a presentation she is expected to give to the internal stakeholders next week. This presentation will be a general overview of the product and a discussion of usage so far, including important KPIs such as sessions per user, the conversion funnel, and page abandonment rates. None of these metrics are as straightforward as they would be in a perfect product world, which is one point of this post. The important stuff is often under the initial layer of data, and requires special filters along with an intimate understanding of both what the logging requirements and definitions were, and how they have actually been implemented.
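
As a hedged sketch of the kind of filtering a “simple” KPI ends up needing (none of this is the real Lotza schema; the table, columns, and specific exclusions are hypothetical), sessions per user might look something like this:

    -- "Sessions per user" once the logging warts are accounted for.
    -- The WHERE clause does the real work: dropping rows from test devices and
    -- rows written by a known bug that logged sessions with no user id.
    SELECT
        COUNT(DISTINCT session_id) * 1.0
            / NULLIF(COUNT(DISTINCT user_id), 0) AS sessions_per_user
    FROM session_log
    WHERE user_id IS NOT NULL
      AND is_test_device = FALSE;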

The second point is that the analyst has to know these nuanced methods for counting what would otherwise be a simple sum or count across unique values. The analyst who isn’t able to roll up her sleeves and dive into the (potential) logging morass is really entrusting the derived insights to the gods of perfect logging. My intern has recognized anomalies and worked around them. When she gives her talk next week, she will be armed with the ability to answer almost any question thrown at her – in other words, the “why” to her “what”. Even though this data wading has eaten up a lot of her time (she worked over July 4th and the 5th, a company holiday), she thanked me for the opportunity. She is confident in her findings, has learned a lot, and knows more about the logging system for our product than I currently do (and I wrote the logging spec and defined the original schema!). Her approach of looking at stored logs, reading documents (that are sometimes out of date), and tailing live logs to identify examples and verify accurate data capture (and her own understanding of what is happening) represents the many hats I believe an analyst should wear. With so much data being generated every day, and the complexity of that data increasing, the analyst must be a data generalist. I’m not proposing that all analysts must be masters of SQL and statistics and technical writing and math and the art of visual presentation – I’m suggesting that the best analysts will utilize whatever tools they need (including engineers) to get the right data in the right format to the right people. It is my opinion that anyone so lopsided in their training as to only know one of the above-mentioned skills is likely to underutilize what they find – the data will be inaccurate, confusing, mysterious, overwhelming. In short, it will be a disservice to their organization, and due to their narrow focus, they might not even realize the problem. Come on analysts, diversify!

In closing: data is messy. Roll up your sleeves. Question your results. Triangulate to verify. Cross-reference values. Perform sniff tests. As an analyst you should be the most engaged and knowledgeable person with the data you own – your consumers rely on you to play the role of translator and to represent your confidence in the accuracy of your findings. Projecting this confidence and integrity matters most in those times when you find absolutely horrendous data or unbelievable (in a bad way) information: you can either find ways to salvage some useful information from it, or proclaim with certainty that the data is, for the most part, so dirty that your analyses are unreliable and therefore not worth the effort. Your consumers may not always like the answers you provide, but they will respect you for declaring as much, especially when you know the data so well that you can pinpoint the major issues that bring its reliability into doubt. Of course, when you find amazing insights you can confidently present them (and show backup verification that you didn’t make some simple error – you know the data that well).