Tag Archive for 'Data Management'

Census Changes in Canada Will Jeopardize Data Quality

Tom Exter, Ph.D., Chief Demographer, Pitney Bowes Business Insight

Recently, the Canadian government announced its decision to replace the traditional long-form Census questionnaire with a voluntary National Household Survey (NHS) in conjunction with Census 2011. While some in the government support the change, the news has drawn a backlash from demographers, geographers, statisticians and much of the population, including Canada’s Chief Statistician, who has resigned his post over the Minister’s decision.
 

As a professional demographer with Pitney Bowes Business Insight in Toronto, I have used Canadian census results for the past 12 years, especially those generated by the long-form census questionnaire. Without the long-form census sample, valuable information used in both the public and private sectors will be lost. In addition to the arguments for reinstating the long-form census presented by many Canadian organizations and professional societies, including the Canadian Population Society, I would like to contribute the following considerations:

  • A voluntary survey, such as the proposed National Household Survey, would not be a sufficient alternative to the mandatory census sample survey. The traditional one-in-five household sample provides good information for every neighbourhood in Canada. In contrast, information from a voluntary sample survey would be biased, even at the provincial and national level.
  • A voluntary sample survey would have a much lower response rate, relative to the mandatory long-form census, and those who do respond would be, by definition, self-selected. Using information from a self-selected sample of unknown and really unknowable bias in health care planning, for example, would have adverse impacts on health care delivery in Canada.
  •  Filling out the long form may be onerous, but it is not an “invasion of privacy.” The rigorous confidentiality standards of Statistics Canada actually protect the privacy of Canadians because the individual responses are highly protected and only used in privacy-friendly ways (aggregated to relatively large geographic boundaries, for example) to generate information for businesses and government agencies.
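
To make the self-selection argument concrete, here is a minimal simulation sketch in Python. Everything in it is hypothetical: the income distribution, the response rates, and the assumption that higher-income households respond more often are invented for illustration and are not drawn from Statistics Canada data. The point is only that a random one-in-five sample tends to recover the population average, while a self-selected sample can miss it no matter how many households respond.

```python
import random
import statistics


def simulate_surveys(n_households=50_000, seed=0):
    """Compare a mandatory random sample with a voluntary, self-selected one.

    All numbers are illustrative assumptions, not real census figures.
    """
    rng = random.Random(seed)

    # Hypothetical population of household incomes (log-normal shape).
    incomes = [rng.lognormvariate(11, 0.6) for _ in range(n_households)]
    true_mean = statistics.fmean(incomes)

    # Mandatory long form: a random one-in-five sample, everyone selected answers.
    mandatory = rng.sample(incomes, n_households // 5)

    # Voluntary survey: ASSUME households above the median income respond
    # 70% of the time and households below it only 30% of the time.
    median_income = statistics.median(incomes)
    voluntary = [
        income
        for income in incomes
        if rng.random() < (0.7 if income > median_income else 0.3)
    ]

    return true_mean, statistics.fmean(mandatory), statistics.fmean(voluntary)


if __name__ == "__main__":
    true_mean, mandatory_mean, voluntary_mean = simulate_surveys()
    print(f"population mean income: {true_mean:,.0f}")
    print(f"mandatory sample mean:  {mandatory_mean:,.0f}")
    print(f"voluntary sample mean:  {voluntary_mean:,.0f}")
```

In this sketch the bias can be measured only because the response rule was written into the code. With a real voluntary survey, the direction and size of the response gap are unknown, which is what makes the bias so hard to correct.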

Overall, the long-form census data are a significant contributor to the Canadian economy in both the private and public sectors. Businesses rely on census information to grow and to serve their customer base. Government agencies rely on it to plan the delivery of services and the allocation of funds to government programs. The quality and utility of the long-form census data are also a testament to the highly professional staff at Statistics Canada who collect, compile, analyze, and disseminate the data to businesses and communities alike.

The significance of this decision for all users of Canadian demographic data cannot be overstated. Readers are encouraged to voice their concerns directly by writing to:
The Honourable Tony Clement
Minister of Industry
House of Commons
Ottawa, ON
K1A 0A6

PBBI Canada is interested in your perspectives and questions as well. Please address them to tom.exter@pb.com.

Keeping Predictive Models Current: Dealing with Continuous Change…Continuously

by Nat Evans, Pitney Bowes Business Insight

Most contemporary predictive models, which forecast performance such as sales, customer visits, membership levels, etc., are based on historical data that create “snapshots in time,” using whatever relevant sources were current at the time of analysis. Examples include POS distributions, store and competitive locations, store sales performance and demographic data. But we know operations and the environment change as soon as a model is completed and put into use. As a result, model accuracy erodes with each passing day as the data inputs into the model, or the benchmarks upon which expected performance is based, become stale. To be sure, most site selection professionals and researchers try to keep models as fresh as possible, updating these data elements on a regular and recurring basis. During recent engagements with several long-time clients, we have been asked whether there is a way to take dynamic time series data elements into consideration to help with forecasting and to minimize risks.

What do we mean by dynamic data?

Many factors may play pivotal roles in retail forecasting and market prioritization. Depending on the level of aggregation and the application, a researcher may be able to use them to effect a change in estimates of market conditions or individual sales. Indeed, these factors can sway analyses enough to change even the simplest of decisions, either minimizing risks (if used appropriately) or increasing a company’s vulnerabilities, especially given the current macro-economic climate.
A couple of sources of dynamic data within the context of a static model may include:

• Macro-economic data such as housing starts, CPI (consumer price index), funds rates, and unemployment rates, either nationally or at varying levels of macro geography: state, county, or CBSA. Such measures provide a look into the health of consumers’ collective behavior and, depending on how the analysis is structured, can serve as leading or lagging indicators of retail growth and consumer spending (PBBI has created an approach, MarketPulse, that incorporates these factors into predictive models).

• Gas prices. Gas price fluctuations at a regional or even local level can affect models in much the same way as macro-economic variables. The higher gas prices rise, the less disposable income consumers have to purchase goods and services, potentially depressing actual local store performance. Distance may become a stronger deterrent to patronage as a result.

If a retailer’s or restaurant’s sales forecast model was created in better times, it may produce a “false positive,” pushing a go/no-go decision the wrong way and costing the company valuable resources and capital that other, more profitable locations could have used. Just as importantly, if a company judges a general or district manager on the sales performance of existing locations against a projection created earlier in the fiscal year, it may be basing that leader’s performance rating on factors outside of his or her control.
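
The “false positive” risk can be illustrated with a minimal Python sketch. It is not PBBI’s MarketPulse method or any production model: the function names, the single elasticity parameter, the gas prices, and the sales hurdle are all hypothetical, chosen only to show how refreshing one dynamic input can flip a decision that a stale snapshot would have approved.

```python
def adjust_forecast(baseline_sales, index_at_build, index_now, elasticity):
    """Rescale a snapshot forecast by the relative change in a dynamic indicator.

    `elasticity` is an ASSUMED sensitivity: the fractional change in sales
    per fractional change in the indicator (negative for gas prices).
    """
    pct_change = (index_now - index_at_build) / index_at_build
    return baseline_sales * (1 + elasticity * pct_change)


def go_decision(forecast_sales, hurdle_sales):
    """Return True ("go") when the forecast meets the required sales hurdle."""
    return forecast_sales >= hurdle_sales


if __name__ == "__main__":
    # Hypothetical site: forecast built when gas cost 2.50, re-checked at 3.50.
    baseline = 1_200_000  # annual sales from the static model
    hurdle = 1_100_000    # minimum sales needed to approve the site

    adjusted = adjust_forecast(
        baseline, index_at_build=2.50, index_now=3.50, elasticity=-0.25
    )

    print("static model says go:  ", go_decision(baseline, hurdle))
    print("adjusted model says go:", go_decision(adjusted, hurdle))
```

A linear adjustment with one elasticity is the simplest possible choice. A real model would estimate the sensitivity from historical data and would likely let it vary by market.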

How can we create more flexible models using dynamic data?

There are myriad ways to leverage dynamic data in forecasting or, more generally, in any analytical process. The important point with any data source is to leverage any and all relationships that may prove fruitful in the forecasting process. But the data must be relevant to your research design, have a purpose, and be significant enough to warrant use in modeling and analytical review.
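
One simple way to test whether a dynamic series is relevant, and whether it leads or lags sales, is to correlate it with sales at several time offsets. The sketch below is a hedged illustration only: the helper names are hypothetical, and plain Pearson correlation stands in for whatever significance testing a full research design would require.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(indicator, sales, lag):
    """Correlate the indicator in period t with sales in period t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(indicator, sales)
    return pearson(indicator[:-lag], sales[lag:])


def best_lag(indicator, sales, max_lag):
    """Return the lag (0..max_lag) with the strongest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: abs(lagged_correlation(indicator, sales, lag)),
    )
```

A strongest correlation at a lag greater than zero suggests the series behaves as a leading indicator for that store or market. A weak correlation at every lag is a sign that the series may not warrant a place in the model.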

In the future, the ability to collect and cleanse data continuously, not only from existing, well-documented sources but also from new sources such as e-commerce and online social/behavioral data, will become more available and increasingly important across any organization. Additionally, whether on-premise or in the “Cloud”, the technology that facilitates a seamless data flow into predictive applications should enable decision-making with the most up-to-date analysis possible.

Unifying the Practices of Data Profiling, Integration, and Quality (dPIQ)

Pitney Bowes Business Insight and TDWI have joined forces to bring you this webinar by Phil Russom on unifying the practices of data profiling, integration, and quality (dPIQ):

Presenter:  Phil Russom
Date: October 13, 2009
Time: 12:00pm EDT
Location: Online

Data profiling, data integration, and data quality go together like bread, peanut butter, and jam, because all three address related issues in data assessment, acquisition, and improvement. Because they overlap and complement each other, the three are increasingly practiced in tandem, often by the same team within the same data-driven initiative. Hence, there are good reasons and ample precedent for bringing the three related practices together. The result is an integrated practice for data profiling, integration, and quality (dPIQ).

You will learn:

  • Relationships among the related dPIQ practices of data profiling, integration, and quality.
  • The iterations and cycles of dPIQ practices, and possible ways to align these.
  • Why you should tightly coordinate projects that involve the related dPIQ practices of data profiling, integration, and quality.
  • Ramifications for staffing, project management, and release cycles.

Register now!