Canadian Government Executive - Volume 23 - Issue 07

18 / Canadian Government Executive // October 2017

When Big Data Gets Small Results: Decision Making Using Flawed Assumptions

Big Data

by Sidney Shapiro and Vivian Oystrick

The recent election results in the United States left many surprised. The pundits and analysts had predicted a victory for one candidate based on polls, social media, and seemingly objective data, but they failed to consider the many individuals who were not accounted for. This is an example of what happens when social media is used to filter content to what a user wants to see: the data is significantly narrowed in focus, and its utility for analysis becomes limited. There is often a large gap between the big, macro picture and the many data points that feed into it at the micro scale. Although analyzing patterns in "big data" has become an established analytical method, the underlying inputs need to be chosen with care.

There are many other examples of how flawed assumptions based on incorrect data lead to problematic decision-making. Inaccurate estimates of crop yields in Canada and the United States, for example, led to policy shifts in agriculture that were later reversed. In another case, the Canadian Real Estate Association's statistics were questioned because of the way data was collected and reported, casting uncertainty on the numbers that were presented. Systematic or random flaws in data used in decision-making may initially seem harmless, but weaknesses can be uncovered only after careful review. With the push towards data-driven decision-making and evidence-based policy, data accuracy and quality have become more important than ever before.

The solution is to explore and map out available data sources and understand the story the individual elements contribute to the big picture. Most data sets come with notes explaining the weighting of the data and what methods were used to collect and calculate the results.
While there are tools, such as program logic models, that can provide a big picture of the various program inputs and outputs and how they link to short- and long-term results, a problem arises when there is an underlying flaw in the data being collected. When this happens, the results can be both skewed and misleading. This is true of limited-scope analyses as well as large-scale, big data analyses that use millions of data points. Decision-making based on incomplete or missing data can be disastrous. As the programming adage goes: garbage in, garbage out. In other words, the inputs you base your assumptions on can lead to unpredictable results if quality controls are not used. Understanding the quality of the raw data that your analysis and assumptions are based on is critical to determining the quality and validity of your analytical outputs.

Statistics Canada defines data quality in terms of six elements or dimensions: relevance, accuracy, timeliness, accessibility, interpretability, and coherence. While no one factor takes primacy, a combination of these elements can be used to determine data quality. These factors are important to ensuring that decisions are based on quality data that makes sense in the context of the question. Each of these elements plays a role in determining data quality:

• The relevance of data relates to how well the data can explain the question under study. Data that is not related may seem to make a case, but it can be fundamentally flawed. Ensuring that the data being used actually refers to the problem being studied is an important first step to gaining insight into the issue.
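To make the "garbage in, garbage out" point concrete, the kind of quality control described above can be automated before any analysis begins. The sketch below is a minimal, hypothetical illustration of screening records against a few of the dimensions discussed (completeness as a proxy for accuracy, and timeliness); the field names, thresholds, and sample values are invented for the example and are not drawn from any real data set.

```python
from datetime import date

def quality_report(records, required_fields, max_age_days, today):
    """Flag records with missing required fields or stale collection dates.

    A simple screen inspired by data-quality dimensions such as
    accuracy (completeness) and timeliness. All names here are
    illustrative, not an established standard.
    """
    report = {"total": len(records), "missing_fields": 0, "stale": 0}
    for rec in records:
        # Completeness check: any required field left empty?
        if any(rec.get(f) is None for f in required_fields):
            report["missing_fields"] += 1
        # Timeliness check: collected too long ago, or date unknown?
        collected = rec.get("collected_on")
        if collected is None or (today - collected).days > max_age_days:
            report["stale"] += 1
    return report

# Invented sample records (e.g., hypothetical crop-yield observations).
records = [
    {"yield_tonnes": 120.5, "collected_on": date(2017, 9, 1)},
    {"yield_tonnes": None,  "collected_on": date(2017, 9, 2)},  # missing value
    {"yield_tonnes": 98.0,  "collected_on": date(2015, 1, 1)},  # stale
]
print(quality_report(records, ["yield_tonnes"], 365, date(2017, 10, 1)))
# → {'total': 3, 'missing_fields': 1, 'stale': 1}
```

A screen like this does not prove the data is good; it only surfaces the obvious gaps so that the weighting notes and collection methods accompanying the data set can be reviewed before conclusions are drawn.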
