Dec 02, 2013

While the information gathered about how a product or service functions and how users interact with it can be priceless to multiple organizational parties for making key business decisions, data is often treated as a by-product (exhaust), or as secondary to basic product functionality. In my experience this occurs for a few reasons:

  • Data collection is not necessary for a product or service to function
  • Data collection has no positive effect on real-time product performance
  • Data collection often reflects a complex mixture of needs from multiple internal parties, muddying ownership and responsibility

 

As such, due to market pressure and a lack of data advocacy, products and services can launch and gain initial traction with no data collection mechanisms whatsoever. This creates a false hierarchy regarding the place of data-oriented development within the product. In an agile framework especially, this can be a severe impediment to getting a robust data collection system in place: data work is perpetually ranked below the line when hours are assigned, unless data collection (and its dependencies) is perceived as having equal footing with high-ranking product features.

 

At heart, I am a philosopher, and I encourage business owners to humor me with their own philosophies (or visions) around the product they are creating or the organization they are in charge of. This exploration allows me to understand their overarching positions and expectations, and gives me the opportunity to design data strategies (i.e. collection, analysis) that meet those stated expectations and, most importantly, the unstated future ones. It is this approach that lets me meaningfully rank-order system requirements, anticipate pivots, and produce analytic output before it is asked for.

 

System-level thinking around a company vision drives the expectations and needs of business intelligence and also gives solid reasoning around what is needed in each phase of a product lifecycle. A product is derived from expectations about how to solve the identified problem in the market. The role of data lies in providing information about the extent to which the problem exists, whether it is addressed, solved, or lessened by the product’s functions, and whether the nature of the problem changes over time. By defining data needs from the vision level, systems can be designed (greenfield, retrofit, refactor, or otherwise) with the flexibility to grow comfortably into the realization of the vision as the product matures. A focus on providing narrow intelligence for an initial product, or relying too heavily on a reporting template, is a sure path to mediocrity and sub-par output.

 

The vision, when properly evaluated, gives key insight into the minimum viable BI product, the importance of particular kinds of data over others, the importance of data quality for each data source, the extent to which a product must be instrumented, key milestones in the creation of a mature data system, and a number of other basic needs for a successful system of analytics. In other words, the enunciation of the vision allows an appropriate data strategy to be created and implemented alongside product features. This gives multiple data consumers the opportunity to engage with data throughout the product lifecycle, promotes data-centric thinking amongst owners, and helps engineers and other builders design systems that include robust data pipelines. It changes the conversation between a data team and engineering from “do you want me to collect that?” to “how do you want me to collect that?”

Oct 07, 2013

When taking on an analytics project, or designing a reporting system (dashboard or otherwise), a core component of superior execution is properly understanding the question(s) the vehicle is expected to answer. This may seem like an obvious statement, but it is amazing how often metrics are chosen for convenience rather than impact. Additionally, dashboards and reports are often (at least initially) put together by individuals with little training in design and business reporting. A monkey can make a graph, but it takes a bit of thought and planning to make something impactful. I would argue that the state of business intelligence in general suffers from this issue – people undervalue the opportunities for using data to make great business decisions because they have learned that the data available to them is not useful for doing so. Instead of insisting that the metrics and reports be useful for business decisions, they write off the full potential of the data and go back to the inefficiencies of traditional gut-based decision making. What they fail to realize, or are not made aware of, is the wide variety of data available that is not being used. Empowering decision makers to utilize data is a core purpose of an analyst.

While assigning fault for the poor state of BI affairs isn’t particularly helpful, it’s worth noting that it is a systemic issue rooted in the past delivery of inferior metrics and reports, coupled with limited decision-making timeframes. It is also compounded by general ignorance of what data is available. The analyst’s job is to right these wrongs by retraining the organization in how data is approached, assembled, and utilized for business decisions. That retraining begins with the most basic step in the analytics chain: defining the right questions.

Question definition is key because all further analytical planning stems from it. Question definition, while the job of the analyst, requires input from the consumers of the information. Interviewing those who will use the analyst’s output is key to deciding what any given analytical product will contain, including how many metrics are needed, in what format, and how frequently they will need to be updated. The unsophisticated organization derives metrics by default, in a post-hoc manner, based on whatever data happens to have been collected. This approach is unlikely to have the impact that a more carefully planned set of information tailored to the business needs will.

Additionally, some decision makers will believe they know exactly what data they want and need, and it is important that the analyst probe to make sure this is correct. Finding out what a particular metric will be used for, or why a specific piece of information is useful, can uncover unintended interpretation mistakes (e.g. when a unique id is not indicative of a unique user due to cookie expiration). It is safe to say that while business owners often do not understand the data used to create particular metrics, they often have strong opinions about what the metrics mean. It is the job of the analyst to educate and redefine around these shortcomings. Furthermore, the analyst should be aware of the myriad data sources available for creating metrics, helping to guide the business owner through discovery. This is a major reason it is critical to get the BI consumer/business decision maker to talk about the questions they are trying to answer rather than expecting them to rattle off a full list of needed metrics. The analyst defines the metrics based on the needs of the business owner. It is crucial that the analyst take an active, participatory role in the design stage rather than a passive “fast food order” style of design. You are a data educator – act like one.

In closing, there are a number of additional steps in defining the mechanism needed to answer the questions settled upon. Some questions require multiple metrics, new data collection, derived metrics (e.g. time between purchases is derived from the last purchase timestamp minus the previous purchase timestamp), or a re-statement of the question due to limitations in the available data. This layer of the design lies between the initial question definition and the visual display of data, but again is key to successful execution of an analytic output. The analyst is an artist, a teacher, a designer, a data counselor, and a storyteller, as well as being the one who knows the data. You can’t design an output mechanism if you don’t know what the input question is.
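
To make that derived metric concrete, here is a minimal Python sketch (the purchase rows and field layout are invented purely for illustration):

    from datetime import datetime
    from itertools import groupby

    # Hypothetical raw purchase events: (user_id, purchase_timestamp)
    purchases = [
        ("u1", "2013-09-01T10:15:00"),
        ("u1", "2013-09-14T18:30:00"),
        ("u2", "2013-09-03T09:00:00"),
        ("u1", "2013-10-02T12:00:00"),
    ]

    def days_between_purchases(events):
        """Derive 'time between purchases' per user from raw purchase timestamps."""
        parsed = sorted((user, datetime.fromisoformat(ts)) for user, ts in events)
        gaps = {}
        for user, rows in groupby(parsed, key=lambda row: row[0]):
            times = [t for _, t in rows]
            gaps[user] = [(later - earlier).days for earlier, later in zip(times, times[1:])]
        return gaps

    print(days_between_purchases(purchases))
    # {'u1': [13, 17], 'u2': []}

The same pattern – raw timestamps in, a business-meaningful gap measure out – covers most of the derived metrics a question will demand.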

Question –> translates to data –> summarized by metrics –> interpreted by analysis –> converted to display –> answers or informs question

May 13, 2013

There are a number of often unspoken jobs that the analyst must perform in order to be useful, helpful, and thereby successful. These include being a question definer, a translator, a data expert, a storyteller, and an anticipator of post-analysis questions. Without approaching a business question (which should inform a specific business action) carefully, the opportunity for error or underwhelming findings is greatly increased.

Definer of THE QUESTION – When I think about analysis I always start with “what is the question we are trying to answer?” The question to be answered is never as simple as whatever my boss has asked me; data is messy, the question is complex, and more often than not, the initial question is wrong. By wrong, I mean that it is too general, makes assumptions about the answer that may be wrong, or just does not make sense from a business point of view. One way to think about whether the question is a good one is to come up with a fake answer and see if it would change anything. “How many women in their 40s are tweeting about our product?” the boss asks you… the answer, by itself, is probably pretty useless – “135”. What the boss really wants to know is “what are the demographics of people tweeting about our products? I think it’s mostly demographic X.” Assuming you have access to the demographics, you can present a plethora of data – by product, by cohort, etc. That approach makes your answer to the initial (not good) question make more sense: “Looks like 135, which is 32% of all folks tweeting about us at all.” Engaging in a Socratic dialogue with the data consumer (or yourself, for that matter) can help you understand the kernel, the impetus for the question, and thereby redefine it in a way that extends the usefulness of the answer and guides the additional analyses necessary for understanding the phenomenon you are investigating more deeply.

Rosetta Stone – The analyst must be able to translate fluidly between many entities. Business and marketing oriented groups will not speak the same language as data miners and data scientists. The audience consuming the data is constantly changing. The analyst must anticipate these differences, speak the different languages, and be sensitive to the fact that her job is to increase understanding (rather than expecting her customers, the data consumers, to do that research after the data is presented). Again, if the analyst were not needed to act as a translator, she would eventually be easily replaced by a tool. Once the question to be answered has been defined, the analyst will most likely have to interact with other entities to extract the raw data. One man’s “how many” is another man’s “SELECT * FROM table_name”.
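
As a small, hedged illustration of that translation – the mentions table and its columns below are invented for this sketch, not a reference to any real system – the business question “how many distinct users mentioned the product last week?” becomes a few lines of SQL issued from Python:

    import sqlite3

    # Hypothetical table of product-mention events.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mentions (user_id TEXT, mentioned_at TEXT)")
    conn.executemany(
        "INSERT INTO mentions VALUES (?, ?)",
        [("u1", "2013-09-30"), ("u1", "2013-10-01"), ("u2", "2013-10-02")],
    )

    # The business owner's "how many", restated in the data's language.
    how_many = """
        SELECT COUNT(DISTINCT user_id)
        FROM mentions
        WHERE mentioned_at BETWEEN '2013-09-30' AND '2013-10-06'
    """
    print(conn.execute(how_many).fetchone()[0])  # 2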

Master of Data – At the end of the day, the analyst must know the data they are transforming and interpreting. They must understand the structures, the nuances, and the quality. This means a part of an analyst’s job is to investigate, on their own, the data sources they interact with. Blindly extracting data is a great way to create false conclusions due to poor quality data, null fields, strange conventions and more. I recall running an analysis whereby I found the “gender” column of a master table contained the values “male”, “female”, and “child”. Imagine if I had tried to do some sort of deductive analysis whereby I got a count of total uniques and total males, and then took a shortcut (uniques – males) to derive females. Oops. There is no universal taxonomy, there is no universal ontology. When it comes to data, you have to check and double check the sources to be sure you understand the nature of the data you are extracting value from.
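
A minimal sketch of that kind of sanity check, with made-up rows: profile the distinct values of a column before building any shortcut arithmetic on top of it.

    from collections import Counter

    # Hypothetical rows pulled from a "master" user table.
    rows = [
        {"user_id": "u1", "gender": "male"},
        {"user_id": "u2", "gender": "female"},
        {"user_id": "u3", "gender": "child"},   # the surprise category
        {"user_id": "u4", "gender": None},      # and the inevitable nulls
    ]

    # Profile the column before assuming it is binary.
    print(Counter(row["gender"] for row in rows))
    # Counter({'male': 1, 'female': 1, 'child': 1, None: 1})

    # The shortcut "females = uniques - males" silently counts 'child' and None as female.
    uniques = len({row["user_id"] for row in rows})
    males = sum(row["gender"] == "male" for row in rows)
    print(uniques - males)  # 3, even though only 1 row is actually 'female'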

Storyteller – The analyst needs to take data, in its raw form, and mold it into something that can be understood and easily retained by the audience. Answers to business questions are rarely straightforward (and if they were, the analyst would be replaced by a dashboard) and often require a contextual back story. The analyst must determine, to the best of his or her ability, the relevant context for the answers presented. This context often goes beyond the data and requires (gasp) talking to others or (shudder) using the product that generated the data. Without the appropriate data context and the ability to describe that context to the audience, the analyst is nothing more than an overpaid calculator moving numbers into tables and charts for someone else to interpret. A great interview on the storytelling aspect of data analysis was given by Cole Nussbaumer (whose blog I am adding to the blogroll on the front page) for the website klevr.org.

Anticipator of Questions – I believe that a successful presentation of data provides clear information about a specific set of business questions such that decisions can be made. I also believe that a successful analysis generates more questions. While that may seem counterintuitive (if you answered the original questions, why are people asking more?), in my experience the questions asked after a successful analysis are ones that build upon the insights you presented (rather than being unrelated or confrontational/doubtful). If you get no questions, I fear you have bored or confused your audience. That said, anticipating the most likely follow-up questions and running the analyses on a few is generally low cost with high reward, as it shows you have defined the question space well, translated it in the appropriate manner, retrieved the data to answer those questions above and beyond expectations, mastered the data, and are now weaving it into the epilogue of your story, delighting your audience and giving them valuable information. If nobody asks the questions you anticipated, you can always present them (quickly, as you have spent most of your time talking about the core question) as bonus material: “I was curious about…” This also leaves your audience with the assurance that you actually care about unearthing insights – and frankly, if you really don’t, you should find a new job.

May 06, 2013

In business, as in life, there are a number of unknowns that we constantly have to notice, interpret, and react to. A superior strategy includes prediction, or some sort of range of expectation(s), for key actions (e.g. product releases, announcements), the accuracy of which is the result of factors like good intuition, observation, and pattern recognition. As such, the strategically minded individual should be collecting information from multiple sources in order to make the transition from expectation to outcome as seamless as possible, as often as possible. There will be colossal misses, due to misinterpretation of the given data, failure to collect appropriate data, or a high degree of chance in the actual outcome. Example: the cult film Donnie Darko was released to major theaters around the time of the September 11 attacks. Part of its failure as a major release has been attributed, post hoc, to the particularly sensitive nature of one of the movie’s main events – an airplane engine falling from the sky and destroying Donnie’s house. Estimating the success of this movie when it was being prepared for its release would never have included a “terrorist airplane attack” factor. That said, reasonable ranges of expectations can be provided most of the time.

A huge advantage in narrowing the range of possibilities for a particular forecast or outcome, while maintaining a decent accuracy rate, comes from engaging in lateral thinking and converging on answers through the use of techniques like proxy variables. One place I have seen analysts and non-analysts falter when trying to predict a business outcome is their failure to engage in creative thinking about ways to estimate unknowns. Rarely is an estimating technique as simple as plugging a few values into a known formula – especially when tackling innovative solutions. Frankly, if it were this easy, then analysts and strategic thinkers would have a very short shelf-life, as they would come in, set up the magic eight ball, and be done forever. An analyst’s job is to explore, research, and create (art + science) answers to the right questions. Note that I didn’t say all questions. An additional part of the analyst’s job is to act as a noise filter, taking the key pieces of business requests and squeezing them down to the basic elements of what needs to be known.

In the following posts I plan to tackle a number of specific topics revolving around approaches to making good predictions and providing superior answers, not from a cookbook-style point of view but from a higher level. Let’s get meta and call it a strategy around creating strategy. The approach I take is that of the analyst as a curator, a gardener, and a scientist. The analyst is proactive, inquisitive, provides unique insight and, through knowledge of the data, surfaces questions that have never been asked.

Every business is different, but often the core remains the same: a good or service is being offered to a consumer. Working from that core, business questions around directions to take an offering, how it will fare in the open market, how to improve it, minimum viable product requirements, and other more mundane day-to-day curiosities will arise. The analyst should be able to tackle these as a matter of course, and recognize the larger questions implied by the smaller ones and vice versa. Rarely is there a single question, and rarely is there a single answer to any given question. In the following posts I plan to explore the world of the analyst from my own personal lens, providing an overarching description and then digging into specific topics. The following is my off-the-cuff laundry list of expected posts. Hope you enjoy them.

  • The role of the strategist and analyst
  • Answers 101: Defining the right questions
  • Using proxy variables to improve estimates and answers
  • Information sources and intelligent approaches to information
  • Core data needs: Quality, Breadth, and Volume
  • Skunkwork Analytics: your often undefined job
Jan 02, 2013

The data buzz-phrase of the current century, “Big Data”, is often approached as a magical construct that one might lash oneself to and, like Odin on Yggdrasil, walk away from with great knowledge after a time – maybe just by being near it. The idea is that using this toolset is THE way to extract value from your data. I’m not the first to say it, but this is similar to how relational databases have been sold for years, only now the promise extends to unstructured and semi-structured data. Pro tip – you still have to manipulate the data to get anything worthwhile from it, and that assumes you collected the right stuff to begin with.

 

It’s unfortunate that a lot of people in the organizational position to make investments in data infrastructures, technologies, and tools get stuck playing a game of mad libs instead of figuring out what each tool can do and, more importantly, what they need each tool to do to be useful. By that I mean they have a sentence that goes something like “If only I had _____ technology all my _____ problems would be solved”. On the flip side, companies trying to sell Big Data services love these kinds of decision makers, promising them that “cloud based, big data solutions” solve all data problems. I mean, take any kind of data (structured, unstructured, semi-structured), upload it to the cloud, throw it into HBase, run a map/reduce job against it in Hadoop and BAM! Cool… then what? Cloud storage is infinitely sized, safe, and, depending on how much you rent it for, geo-redundant. Problems solved, right? Or are they?

 

Let’s back up and start… at the beginning. If you have a business that can potentially generate a lot of data (transactional, operational, etc.), you fall into one of two camps: you currently have a ton of data you are warehousing/archiving, or you do not have a ton of data (for one or more of several reasons) but could soon, once you instrument your systems to spit out proper logs.

 

Let’s assume you are in the first camp and have a ton of data. What kind of data have you gathered, and in what format? How much data do you generate every day? Lastly – could you vastly shrink the amount of useful data you gather by applying simple ETL jobs? I’d argue that most organizations (not all) that are looking into big data solutions are doing so very prematurely. Being able to suddenly collect and infinitely store every piece of data your servers generate, the output from your web logs, and all public mentions of your organization on Twitter and Facebook is probably more a curse than a blessing – the concept of infinite storage for cheap promotes an unthinking “dump it in here and we will sort it out later” approach to data collection. It’s true, storage is cheap, but paying developers to pick through the garbage later (often over and over again) is mind-numbingly expensive. A better solution is to structure your data collection intelligently, write ETL jobs that make your data compact and accessible, and let your developers spend their time using the data to improve your business instead of picking through the garbage (potentially over and over again).
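
As a rough sketch of what “compact and accessible” can look like – the log format and field names here are assumptions for illustration – a tiny ETL pass might keep only the handful of fields reports are actually built on and drop the rest before anything lands in long-term storage:

    import csv
    import json

    # Hypothetical raw web-log lines: one JSON blob per request, most of it never queried.
    raw_lines = [
        '{"ts": "2013-01-02T10:00:00", "user_id": "u1", "path": "/buy", "headers": {"ua": "..."}, "debug": "..."}',
        '{"ts": "2013-01-02T10:00:05", "user_id": "u2", "path": "/home", "headers": {"ua": "..."}, "debug": "..."}',
    ]

    KEEP = ["ts", "user_id", "path"]  # the few fields the reports are built on

    def etl(lines, out_path="compact_log.csv"):
        """Extract the useful fields, transform to flat rows, load into a compact CSV."""
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(KEEP)
            for line in lines:
                record = json.loads(line)
                writer.writerow(record.get(key, "") for key in KEEP)

    etl(raw_lines)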

 

Now switch to the second camp – no data now, but lots ASAP. What kinds of data can you and should you gather? How should it be structured? What will you do with it? The nature of these questions suggests that trying to choose the tools you will use, without an initial grounding in what you can have and what you will use the tools for, makes the choices premature at best. But the experts you talk to may suggest you just start collecting as much as fast as you can, since storage is cheap and…

 

This “I have a hammer and everything looks like a nail” approach to capturing and deriving value from data (or data exhaust, as it is sometimes called) with a particular tool alone is really shortsighted, and a recipe for expensive failure as you hire expensive experts to trawl through your piles of garbage looking for gold rather than setting up your organization for successful insights ahead of time. Use the current fixation on big data to promote your data strategy, to get developers instrumenting your products and services deeply, in the hopes that you will soon have a high quality data asset that screams out for some tool to tame it. This may be a big data tool like Hadoop, or it may be a set of Perl scripts, or (gasp) an Excel spreadsheet. The point is that Hadoop and the rest are tools to be pointed and fired at specific issues in specific situations. You are not Google, and you probably don’t need the tools Google uses. You do need to be smart about data, which is something the big data buzz has highlighted. The beauty of the current landscape is that if you actually need massive-scale processing that fits the map/reduce paradigm, you can have it. In other words, you are no longer limited (or forced to sample) when you have a large set of data. All the other issues with data quality that have plagued us forever are still present, important (maybe more so), and in need of attention. Don’t be lulled into a false sense of security just because you have a larger bucket for use in panning – you still have to sift through it all to find the gold, IF you captured enough of the right types of data to begin with.

Nov 25, 2012

So Nate Silver is the stats nerd of the year for his great (or lucky, if you hate science) methodology of poll aggregation and the poll-weighting algorithm he employed to predict the outcome of the recent national elections. Congratulations Nate – if I didn’t live in a country with Byzantine banking laws, I would have made a tidy sum using your legwork (among others’ – I firmly believe in leveraging the wisdom of crowds of experts) to invest in “Obama to win” via the event-based market InTrade. I haven’t been able to find any apologies from the demonizers who suggested Nate was just another political hack (like them?) who was rooting for the wrong team and trying to hide it behind some sort of magical thinking in the guise of science, but I can’t say I looked too hard.

While the disappointing part of the whole Nate-Silver-predicting-the-elections bit lies in the constant misinterpretation of what Nate actually did to come by his numbers, due to the general public’s pseudo-understanding of statistics, the beauty of the press he received both before and after the election is that it has elevated the role of data in decision making – even messy social data like poll results (essentially surveys, with all their inherent issues). The age-old “gut feeling” as the sole driver of decision making (i.e. guessing) is coming under needed scrutiny in an age where having current and historical information is finally possible. Those who fail to incorporate data, especially data that is readily available or easily gathered, will be left behind or, when successful in their guesses (expertise does have its place), will be less efficient.

It is my firm opinion that gut feeling is a garnish best placed on top of data-driven analysis, where the depth of gut needed is (roughly) inversely proportional to the data available. Nate doesn’t use gut feelings, he uses data, which can then be handed to those responsible for making decisions.

So how does Nate Silver make my job easier? When Jon Stewart asked him on the Daily Show what it would have meant if his model had been wrong, Silver responded, “It would have been bad, I think, because for some reason 538 became invested with this symbolic power, and you know symbolic power isn’t particularly rational, right, but it became this symbol for people who were believing in hey, let’s look at the polls. Let’s do some empirical research, right.” Empirical research was shown to best gut instinct. This research ran contrary to a huge contingent of, not surprisingly, biased observers, but was shown to be superior to all other estimations, guesses, scrying-stone proclamations, etc., even those made by individuals with a vested interest in Obama winning. His. Model. Won. Data won. As data won such a highly scrutinized, over-thought, expensive contest over individual “expert” opinion, my job got easier. The hugely symbolic power of that specific use of data serves as a vivid example of what data can do. When talking to organizations about the value of data, the value of quality data, and the usefulness of measurement to drive business decisions, I now have an example that everyone knows and, in some small way, understands. Am I comparing myself to Nate Silver? Not particularly – we come from very different backgrounds, education, approaches, etc. But one thing is certain – he has just made the human-interaction part of my job a lot easier – the part where I am convincing a client to invest in data resources, to care about data quality, data completeness, and data-driven decision making. Thanks Nate.

Jul 30, 2012

A friend sent me a great blog post (see #1 in the list at the end of this post) around testing that has been buzzing around (and should be read and debated if you care about such things even a little bit). The post introduces (as in “brings to our attention”, not as in “invents”) a method of easily coding an epsilon greedy strategy for testing web (or whatever) optimization and claims that it is superior to the well-established standby of A/B testing (oooh, thems fightin’ words!). The post has inspired a number of responses from folks who run A/B tests, folks who optimize and test websites, and computer nerds interested in arguing about this type of stuff in general.

The normal array of folks weigh in – the engineers who almost understand* the experimental/statistical approach to A/B testing, statistician purists who sing about the superiority of A/B testing, and the practical coders who just ask for something that is simple, intuitive, and works towards the end goal of making money on their site. It’s the interplay between these folks that I found interesting (and entertaining). Full disclosure – I’m not a master statistician but have plenty of experience in experimental design, A/B testing, and general statistics. I am not by any stretch a website optimizer or an engineer.

At the end of the day, the majority of those who are interested in testing – and I’m not talking about Google or Bing but rather the rest of the world – want something that works very well and that converges on an optimal or close-to-optimal solution quickly. Maybe we should even get fuzzy with the term “optimal” by saying it means extracting maximum price/conversion/experience AND painlessness/stability/implementation ease. The main arguments against the A/B testing framework are that, while it is a well-established, experimentally sound methodology, it takes time to execute and collect the data, requires knowledge of statistics to accurately interpret (not to mention to know how long to run the test, how to set up the test, and to understand why you don’t just “watch the numbers until it gets significant”), and needs to finish before you can tweak and deploy your “winning” solution. The epsilon greedy algorithm is relatively self-tuning based on the rules given to it, letting it get close to an optimization (assuming a static universe) relatively quickly**. One big argument against the epsilon greedy strategy is that it can mistakenly optimize based on a temporal correlate or something similar (e.g. your site shows breakfast and dinner ads at 8 a.m. – guess which one gets more clicks? That ad is then considered optimal and keeps getting shown past noon and into the evening until, finally, the dinner ads get enough clicks to flip it – but way later than is optimal). Some reasonable strategies are to reset/retest every X hours, or to decay older clicks against newer ones for a faster “flip” when such a pattern emerges.
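
For readers who have not clicked through to the links below, here is a minimal epsilon greedy sketch in Python – not the referenced post’s code – including the kind of decay speculated about above so that stale clicks lose influence; the epsilon and decay values are arbitrary assumptions.

    import random

    class EpsilonGreedy:
        """Minimal epsilon greedy bandit with optional decay of old observations."""

        def __init__(self, variants, epsilon=0.1, decay=0.999):
            self.epsilon = epsilon
            self.decay = decay                      # < 1.0 lets recent clicks dominate
            self.trials = {v: 0.0 for v in variants}
            self.rewards = {v: 0.0 for v in variants}

        def choose(self):
            if random.random() < self.epsilon:      # explore a random variant
                return random.choice(list(self.trials))
            # exploit: the variant with the highest observed conversion rate so far
            return max(self.trials, key=lambda v: self.rewards[v] / self.trials[v]
                       if self.trials[v] else 0.0)

        def update(self, variant, converted):
            # Decay everything a little so the estimate can "flip" when the world changes.
            for v in self.trials:
                self.trials[v] *= self.decay
                self.rewards[v] *= self.decay
            self.trials[variant] += 1
            self.rewards[variant] += 1.0 if converted else 0.0

    bandit = EpsilonGreedy(["breakfast_ad", "dinner_ad"])
    arm = bandit.choose()
    bandit.update(arm, converted=random.random() < 0.05)  # stand-in for a real click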

My take is that if you don’t care about the statistical rigor or experimental soundness angles to optimization – and if you aren’t particularly keen on analyzing every tweak, covariate, and adjustment you make to your website (again, excluding Google and the big boys here, who definitely do care) – then the epsilon greedy algo is worth implementing. That is not a dig by any means – sometimes you care, and sometimes you just want the thing to get rolling as fast as possible, never to be revisited, etc. If you are trying to sell a service to scientists or stats nerds, need the experimental rigor, need to optimize for the long term, expect to run tests at all times, and want to use as much data as possible to craft the ultimate presentation models (or whatever it is you are testing for), then you should probably be using the slower-but-steady A/B testing approach or something similar. As with most things – use the tool that meets your goals, but figure out your goals ahead of time for optimal tool selection.

In the end, I feel like the debate around which method to use consists of folks who are discussing and exploring some of the pros and cons of each approach without enumerating the actual uses. They mistakenly assume that both are used for exactly the same reason. While this is true at the broadest level (optimizing between choices), the actual reasons behind the use of one over the other are vastly different. Hatchet versus lathe – both cut wood, but which one does it better?

* I say “almost” because in the discussions, many of them fail to point out simple errors others are making in assumptions. If they were statistically savvy engineers they would say things like “you should reset your test metrics every time you make a change, and this is true whether you use the epsilon greedy strategy or A/B testing”.

** I’m ignoring the cases where the groups have disgustingly similar conversion rates.

Here are some articles and reference pieces for your perusal:

  1. Original post: “20 lines of code that will beat A/B testing every time”
  2. Wikipedia article on the Multi-Armed Bandit problem and the concept of the epsilon greedy algorithm
  3. Blog on why Bandit algos are better than A/B testing
  4. Discussion and debate around the original article on y-combinator
  5. Blog on some hard-knocks and learning around A/B testing in general
  6. Blog summarizing a cool article by Microsoft Bing folks on weird outcomes experienced with online testing
  7. Actual MSFT article on weird A/B outcomes explained (PDF)
Dec 18, 2011

When I was in graduate school, hustling to get my dissertation data in order, it took me over a year to collect everything I needed. Granted, the research took place via both an alumni survey and in a social-services organization, with each posing its own set of challenges. Once I had the data, it was entered into spreadsheets and loaded into SPSS and AMOS, where I ran my analyses after a bit of cleaning. Today in my work we have an A/B testing platform that allows certain experiments to be run by changing configuration files in real time on the live system. For those of you out of the loop, A/B testing is the term used for running split-test experiments (traditionally in marketing). For me it’s a tool to manipulate a very small percentage of user experiences in order to measure the impact of potential system-wide changes (visual or algorithmic). If the test works, the change gets rolled out to everyone. If the test fails, we try something else.
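
To illustrate the “very small percentage of user experiences” idea, here is a hedged sketch of how such an assignment could work: a config value sets the variant percentage, and a stable hash of the user id decides membership, so the same user sees the same experience on every request. The config format and hashing scheme are assumptions for illustration, not a description of the actual platform.

    import hashlib
    import json

    # Hypothetical experiment config, the kind of file edited on the live system.
    CONFIG = json.loads('{"experiment": "blue_button", "variant_percent": 1.0}')

    def bucket(user_id, experiment):
        """Map a user deterministically onto the range 0-99.99 using a stable hash."""
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        return int(digest[:8], 16) % 10000 / 100.0

    def in_variant(user_id, config=CONFIG):
        return bucket(user_id, config["experiment"]) < config["variant_percent"]

    print(in_variant("user-12345"))  # same answer on every call and every server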

When I was collecting data in grad school, one of my primary concerns was making sure I had a large enough sample so that, after throwing out mistakes and goofballs (always have a few special questions to detect those who blindly fill in the bubbles), I would have enough data points to detect differences in the populations. Additionally, one problem with survey techniques (and most experimental designs in the field) is that you never know whether the people who respond are actually representative of the population you are trying to measure. To use a lame example, you may want to measure people of all dispositions, but it may be that only really happy people answer your survey, and that skews the results in some particular way. The beauty of the online experiments I am running is that the user often doesn’t know they are in a test or that tests even exist. This cuts down on the goofball and selective-respondent issues. Also, in the online world, getting the needed sample size is the least of my worries. In fact, I can gather enough data in 5 minutes to fuel 10 dissertations. My biggest concern falls in the representativeness category – is my sample representative, and how can I help this?

First, by gathering a lot of data I am more likely to approximate the total population of users. A question I am often asked is how many people need to be in the study for it to be valid, or for us to have collected enough data. The answer is not straightforward (it depends on the effect size), and I would argue that it doesn’t actually matter, because we get responses from thousands of users every hour. In other words, I could take the time to figure out the minimum number of unique users needed for a high degree of confidence, but in the time it takes me to do the calculation we will have gathered more than that many users’ responses to the test in our logs. No, my biggest concern is not the number of users needed. My concern is actually around temporal effects.
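
For the curious, the calculation being waved away is roughly the standard two-proportion approximation for per-group sample size; here is a sketch with made-up baseline and lift numbers:

    from statistics import NormalDist

    def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
        """Approximate n per group to detect a shift from rate p1 to rate p2."""
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
        z_beta = NormalDist().inv_cdf(power)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

    # Made-up numbers: detect a lift from a 2.0% to a 2.3% conversion rate.
    print(round(sample_size_per_group(0.020, 0.023)))  # roughly 36,700 per group

At thousands of responses an hour, even a requirement in the tens of thousands per group is satisfied well before the 28-day window discussed below ends.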

Seasonal behaviors (retail sales soar in late November, etc.) are well known to the public. What is less well known is that mobile phone application usage shows regular temporal fluctuations by hour and day of week. The busiest time is in the evening, and weekends have higher traffic volume than weekdays. I don’t know if the makeup of weekend users matches weekday users, or if 7am users are similar to 9pm users. Because of this I want to get a good swath of time when running my tests, so that the effects of day of week and time of day inform the overall results evenly. Think of it like this: if I run an experiment for five minutes because it gets me 10x the data I need, how can I be sure that it was a “good” or “representative” five minutes? Is 10:00 – 10:05 on Sunday night the same as 3:15 – 3:20 on Friday afternoon? The only way to know is to test those time periods after collecting the data, but that is beside the point. Unfortunately, until I know better, my belief is that a proper A/B test should run against a small percentage of randomly selected users for 28 days. This gives several daily cycles and four examples of each day of the week. The good news is you can run several tests on incredibly small populations over the month and still get a heap of data for each. The bad news is you have to wait a while to answer your questions. As I run more tests I may find that 28 days never gives better data than 14 or 7 or any given 10-minute period. Until then, I will stick with the slow and steady methodology. I’d love to hear others’ experiences regarding appropriate time periods for running online experiments.

Apr 20, 2011

As a business analyst, I live and die by logging. This makes me vigilant about what products are being developed by my organization, and how they change from concept to wireframes to implementation. Rarely do these three stages look the same, and sometimes the end product is a far cry from the original beast due to time pressures, build-vs-buy decisions, scope creep, and a number of other fun issues. Regardless of my vigilance, I find that logging, and thoughts around instrumentation, almost always come last. I am not alone in this observation, as other analyst friends have made the same comment. In fact, it was confirmed recently by a development lead at a large organization when he commented to me, “you know, we always wait until it’s too late to add logging, if we even consider it in the first place.”

Why is it that engineers have such an aversion to extended, non-performance instrumentation, and find it so onerous or unimportant? They write unit tests. They instrument for speed of throughput, heartbeat, and error messaging, but tend to ignore the basics of user behavior on the products they have built. It is seen as extraneous, performance-impacting, nonsensical even. This is unfortunate.

When I was in graduate school, my dissertation focused on how individuals’ beliefs about the degree to which their organization in general, and their supervisor specifically, cared about them impacted their work behaviors. In other words, if you think your supervisor cares about you as a person, does that make you work harder? What about your overall organization – does that matter? Are there special traits of supervisors that make you more or less likely to do your job well, to help others, to protect the organization from lawsuits or other problems, to decide to stay instead of quitting? It took me almost 2 years to collect enough data to answer this set of questions. Two years. Today, I can ask interesting, in-depth questions about the data I collect every 2 minutes. The only reason this is possible is because the damn products are instrumented like mad to tell me everything the user is doing, seeing, interacting with (and choosing to ignore). This information is powerful for understanding usability, discovery, and annoying product issues like confusing pages or buttons. Predictive analytic models can be built off of this behavior (user X likes this stuff, hates that stuff, buys this stuff, ignores that stuff, etc.) – but only if it is logged. With both a strong BI opportunity and predictive analytics opportunities, why is logging so often ignored, perfunctory, or offloaded to companies like Google – almost as an afterthought?

My theory is that because the nuances of logging often make it fragile and complex, it isn’t easy to determine whether it is accurate while still in development. As the underlying systems change – schema shuffling or enumerated-value redefinition (or recycling), for example – and many hands touch the code that creates the product, it makes sense to wait until things settle down before adding the measurement devices. Unfortunately, there are often special cases introduced – invisible to an end user, but obvious under the hood – that make straightforward logging difficult. The end result is often a pared-down version of logging that is seen as “good enough” but not ideal. The classic “we’ll do this right in vNext” is my most hated phrase to hear.

The workaround to this malady, when possible, is to introduce clear, concise, standardized logging requirements that engineers can leverage across products. Often a block of specific types of values (timestamp, screen size, operating system, IP, user-id, etc.) describes a majority of the values the analyst needs for pivoting, monitoring, and so on. The remaining portion of a schema can then contain the pieces that are unique to the specific product (like “query string” if searching is a possible action in one product but not others).
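
A minimal sketch of that standardized block plus a product-specific remainder – the field names follow the examples above, and everything else is an assumption:

    import json
    from datetime import datetime, timezone

    def log_event(product, action, user_id, screen_size, operating_system, ip, extra=None):
        """Common envelope every product logs identically, plus a free-form product block."""
        event = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "product": product,
            "action": action,
            "user_id": user_id,
            "screen_size": screen_size,
            "operating_system": operating_system,
            "ip": ip,
            # Only this block varies between products.
            "extra": extra or {},
        }
        print(json.dumps(event))  # stand-in for the real log sink

    # A search-capable product adds its unique piece; other products simply omit it.
    log_event("catalog_app", "search", "u42", "1280x800", "iOS 6.1", "10.0.0.7",
              extra={"query_string": "red shoes"})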

The analyst must be vigilant, aware, engaged, and on the lookout for implementations that introduce actions or behaviors that are currently unlogged or that break expectations, so that he or she can engage engineers proactively – before it’s too late – to extend the logging and be sure that important and essential user behavioral data does not go down the tube of the dreaded “vNext”.