Data Warehousing and Data Mining
Analytics, Business Intelligence (BI) and the marked improvement in the accuracy and quality of insight and decision making in many enterprises today can be directly attributed to the successful implementation of Enterprise Data Warehouse (EDW) and data mining systems. Continental Airlines (Watson, Wixom, Hoffer, 2006) and Toyota (Dyer, Nobeoka, 2000), both of which continue to use advanced EDW and data mining systems and processes to streamline their business models, are cases in point. The greater the economic uncertainty and the perceived and actual risk in a given strategy or endeavor, the greater the reliance on EDW, data mining and advanced forms of predictive modeling, including analytics (Sen, Ramamurthy, Sinha, 2012). From this standpoint, the emerging high-growth areas of the global economy are attracting a high level of investment in EDW, data mining, predictive modeling and analytics. The latest figures illustrate how highly enterprises value EDW and data mining today. According to industry research and advisory firm Gartner, the EDW and data mining market began 2011 with a global value of $23.2 billion and a projected growth rate of 7% per year through 2015, making it one of the largest and most consistently growing enterprise software markets (Sen, Ramamurthy, Sinha, 2012). Gartner defines the EDW and data mining architecture as comprising the architectural design, the repository and the execution platform. These three core components are how Gartner analyzes the market from a software component standpoint, looking at the relative adoption of each EDW and data mining component (Sen, Ramamurthy, Sinha, 2012). The intent of this analysis is to evaluate the benefits and current trends in EDW and data mining, including Continental’s and Toyota’s best practices and the results they have achieved.
Additional objectives include an assessment of EDW and data mining optimization techniques, recommendations for storage solutions and an analysis of a potential EDW process workflow predicated on a Customer Relationship Management (CRM) system.
Benefits and Current Trends in Data Warehousing and Data Mining
The greatest benefit accruing from EDW and data mining is the ability to gain deeper insight into customer, distribution channel, supplier and stakeholder processes, systems and metrics over time. A highly effective EDW and data mining strategy brings together a myriad of systems that had been disconnected, even siloed, throughout an enterprise. The Master Data Management (MDM) applications that are part of an EDW architecture or platform deliver what many enterprises previously lacked: a single system of record for all transactions and enterprise-wide activity (Fay, Zahay, 2003). Having a single system of record leads to more synchronized, unified strategies across an entire enterprise and often galvanizes corporate cultures around specific goals and objectives, while also allowing them to measure their relative performance with much greater accuracy and insight (Watson, Wixom, Hoffer, 2006). Building on the solid foundation of an MDM architecture and the ability to mine data across the enterprise, enterprises often accelerate their use of many different forms of analytics, from predictive modeling to data visualization, including modeling customer interactions to see how pricing and product promotions will affect sales and profitability (Fay, Zahay, 2003). This leads to one of the most important advantages of EDW and data mining applications: the ability to predict overall corporate performance from demand-driven, customer-driven and supply chain-based determinants of performance (Watson, Wixom, Hoffer, 2006).
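The idea of a single system of record can be illustrated with a minimal sketch. It assumes three hypothetical source systems (CRM, billing, support) that each hold a partial record for the same customer, and a simple survivorship rule in which the most recently updated non-empty value wins for each field. The field names, sample values and the rule itself are assumptions made for illustration; they are not drawn from the cited sources or from any particular MDM product.

```python
from datetime import date

def build_golden_record(records):
    """Merge per-system records for one customer into a single record.

    Records are applied from oldest to newest, so for each field the
    most recently updated non-empty value is the one that survives.
    """
    golden = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field in ("source", "updated"):
                continue  # bookkeeping fields are not part of the golden record
            if value not in (None, ""):
                golden[field] = value
    return golden

# Hypothetical, siloed views of the same customer.
crm = {"source": "crm", "updated": date(2012, 3, 1),
       "customer_id": "C-100", "email": "pat@example.com", "phone": ""}
billing = {"source": "billing", "updated": date(2012, 5, 9),
           "customer_id": "C-100", "email": "", "phone": "555-0100"}
support = {"source": "support", "updated": date(2011, 11, 20),
           "customer_id": "C-100", "email": "old@example.com", "phone": "555-0199"}

golden = build_golden_record([crm, billing, support])
print(golden)
```

In this sketch the e-mail address comes from the CRM record and the phone number from the billing record, because each is the newest non-empty value for its field. Production MDM platforms typically apply far richer matching and survivorship rules than this.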
Customer, distributor and supplier EDW and data mining applications, once integrated with financial systems, have been delivering insights that continue to revolutionize the financial management of firms (ABA Journal, 1999). The trend, begun over fifteen years ago, of integrating customer, distribution and supply chain data into a single system of record to gain greater insight into an enterprise, and therefore greater clarity in decision making, is now a pervasive best practice across many enterprises (Brachman, Khabaza, Kloesgen, Piatetsky-Shapiro, Simoudis, 1996). The trend has accelerated in the last three years to include Big Data and the use of Hadoop and MapReduce to analyze and interpret large-scale data sets that exceed the computational and analytical capacity of existing mainstream EDW systems and platforms (Sen, Ramamurthy, Sinha, 2012). Big Data is already beginning to reshape how decisions are made in enterprises, even though this technology, considered among the fastest-growing areas of EDW and data mining, is nascent and in the first phase of its industry lifecycle (Sen, Ramamurthy, Sinha, 2012). One of the most promising aspects of this trend is the development of more comprehensive and strategic Business Intelligence (BI) platforms, as shown in Figure 1, The Impact of EDW and Data Mining Systems on CRM Analytics. The integration of these factors will continue to have an additive effect on the level of insight and intelligence enterprises are able to use over time.
Figure 1: The Impact of EDW & Data Mining Systems on CRM Analytics
Based on analysis of the following sources: (Brachman, Khabaza, Kloesgen, Piatetsky-Shapiro, Simoudis, 1996) (Fay, Zahay, 2003) (Fong, Wong, 2002) (Marks, Frolick, 2001) (Sutherland, 2003)
Of the many companies successfully using EDW and data mining today, two of the more noteworthy are Continental Airlines (Watson, Wixom, Hoffer, 2006) and Toyota, with its world-famous supply chain management system, the Toyota Production System (TPS) (Dyer, Nobeoka, 2000). Both companies’ best practices are briefly described in this analysis, starting with Continental’s impressive use of EDW, data mining and predictive modeling. The airline had recently emerged from bankruptcy, and one of the more valuable lessons learned from that experience was how disconnected the entire enterprise was from its customers. In 1988 Continental launched its Go Forward Initiative, which sought to unify the diverse and disparate systems across the company and laid the foundation of its MDM and EDW architectures. Having spent $30M on the entire series of software, hardware and systems and generated a combined $500M in cost savings and revenue growth, Continental hailed this IT strategy as the most successful in the company’s history (Watson, Wixom, Hoffer, 2006). Using this system, the company was able for the first time to integrate its many airline and maintenance operations processes with CRM, accounting, finance and security systems, providing real-time, 360-degree views of the customer and the entire operations process. This ensures that flight staffing, planning and gross margin analysis are all synchronized to a common set of metrics and key performance indicators (Watson, Wixom, Hoffer, 2006). Continental went on to optimize its entire operations using the lessons about analytics and metrics learned from its investments in EDW and data mining, trimming entire routes and increasing customer satisfaction and profitability in the process (Watson, Wixom, Hoffer, 2006).
Another company that has successfully moved away from being managed by metrics that lagged the performance of its supply chain and services, with a corresponding impact on its new product launch strategies, is Toyota, through the TPS it uses for global procurement and supply chain management (Dyer, Nobeoka, 2000). Toyota has taken the approach of quantifying overall supplier performance, rank-ordering it on a series of dashboards and scorecards and, using Six Sigma, evaluating overall performance across all product lines. The intent is to ascertain how best to augment the knowledge across the organization and transform it into a competitive asset (Dyer, Nobeoka, 2000). It is a best practice to use an EDW and data mining platform to create a knowledge management system that couples quantified metrics with those more aligned to process-based performance (Sen, Ramamurthy, Sinha, 2012). The TPS can manage a new product introduction in as little as 26 weeks when necessary, by far the quickest in the auto industry (Dyer, Nobeoka, 2000). It can also trace quality problems to the production lot in seconds. When lapses in product quality occurred in the last few years on core product lines, Toyota was able to use the analytics of the TPS to troubleshoot down to the factory level, finding that plant managers had neglected to use Six Sigma incoming inspection practices to track overall quality (Dyer, Nobeoka, 2000). This example is admittedly unique in that it blends the ability to manage knowledge with a quality management system predicated on knowledge creation rather than pure compliance reporting.
Data Warehousing and Data Mining Optimization Techniques
EDW and data mining optimization techniques are advancing rapidly due to the increasing use of constraint-based modeling platforms (Sen, Ramamurthy, Sinha, 2012). Constraint-based modeling is predicated on a series of parameter sets that continually optimize the selections made throughout an EDW, including the MDM repositories described previously in this paper. Constraint-based modeling is making it possible for Web-based applications to populate and propagate analytics, predictive models and performance analyses across enterprises regardless of device. An effective optimization would be able to create an entire series of options for a specific strategic decision in real time and display the results across the enterprise within seconds (Sen, Ramamurthy, Sinha, 2012). The progression from rules-based to constraint-driven EDW and data mining modeling is also predicated on an entirely new class of presentation and Application Programming Interface (API) technologies, all aimed at streamlining this aspect of EDW use in decision making and support (Sen, Ramamurthy, Sinha, 2012).
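To make the general idea of constraint-driven option generation concrete, the following minimal sketch enumerates every combination of two parameter sets (list price and promotional discount) and keeps only the combinations that satisfy all stated constraints. The parameters, the unit cost and the two constraints are hypothetical and chosen purely for illustration; this is not how any specific EDW or constraint-based modeling platform is implemented.

```python
from itertools import product

def feasible_options(prices, discounts, constraints):
    """Return every (price, discount) pair that satisfies all constraints."""
    return [
        (price, discount)
        for price, discount in product(prices, discounts)
        if all(check(price, discount) for check in constraints)
    ]

# Hypothetical business parameters.
UNIT_COST = 60.0

constraints = [
    # Net price after discount must be at least 10% above unit cost.
    lambda price, discount: price * (1 - discount) >= UNIT_COST * 1.10,
    # Discounts deeper than 20% are not allowed.
    lambda price, discount: discount <= 0.20,
]

options = feasible_options(
    prices=[70.0, 80.0, 90.0],
    discounts=[0.0, 0.10, 0.25],
    constraints=constraints,
)

for price, discount in options:
    print(f"price={price:.2f} discount={discount:.0%} net={price * (1 - discount):.2f}")
```

Tightening or relaxing a constraint changes the set of options that is returned without any change to the enumeration logic, which is the main practical difference from hard-coded, rules-based selection.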
Recommendations for Storage Solutions
Faced with a rapidly growing data set of 20TB and projected growth of 20% per year, the best possible solution is to choose a hardware platform purpose-built for EDW and data mining. The Oracle Exadata Database Machine would be the best option, as it supports 160 CPU cores and 4TB of memory for database processing, can be configured as two logical database servers and can support up to 224TB per server rack in a multi-rack configuration. It also includes a series of bundled Oracle EDW, data mining and predictive analytics applications. It can also be configured as a private cloud server, offering MDM functionality and support across an entire enterprise (Stonebraker, 2011).
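The sizing argument can be checked with a short compound-growth calculation. The sketch below takes the figures stated above (20TB today, 20% growth per year, 224TB per rack) and assumes that growth compounds annually and that the full 224TB is usable for data. The second assumption is a simplification, since usable capacity after redundancy, indexes and compression will differ from the raw figure.

```python
def projected_size_tb(initial_tb, annual_growth, years):
    """Data set size after `years` of annually compounded growth."""
    return initial_tb * (1 + annual_growth) ** years

def years_until_full(initial_tb, annual_growth, capacity_tb):
    """First whole year in which the projected size exceeds capacity."""
    years = 0
    size = initial_tb
    while size <= capacity_tb:
        size *= 1 + annual_growth
        years += 1
    return years

INITIAL_TB = 20.0      # current data set size, from the text
GROWTH = 0.20          # projected annual growth, from the text
RACK_CAPACITY = 224.0  # per-rack figure from the text, treated as fully usable

for year in range(0, 6):
    print(f"year {year}: {projected_size_tb(INITIAL_TB, GROWTH, year):.1f} TB")

print("years until one rack is exceeded:",
      years_until_full(INITIAL_TB, GROWTH, RACK_CAPACITY))
```

Under these assumptions the data set reaches roughly 50TB after five years and first exceeds a single rack's 224TB in year 14, which suggests that one rack leaves considerable headroom if the 20% growth rate holds.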
Analysis of Data Warehouse Workflows
Based on the study of EDW and data mining, the following schematic has been derived from an analysis of the Continental Airlines Go Forward Initiative (Watson, Wixom, Hoffer, 2006). Its structure replicates how Continental created a unique MDM platform to support enterprise-wide visibility of its operations.
References
Applications to “visualize” the data stream. (1999). American Bankers Association. ABA Banking Journal, 91(3), 56.
Brachman, R.J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., & Simoudis, E. (1996). Mining business databases. Association for Computing Machinery. Communications of the ACM, 39(11), 42-48.
Dyer, J.H., & Nobeoka, K. (2000). Creating and managing a high-performance knowledge-sharing network: The Toyota case. Strategic Management Journal, Special Issue: Strategic Networks, 21(3), 345-367.
Fay, C.P., & Zahay, D. (2003). Understanding why marketing does not use the corporate data warehouse for CRM applications. Journal of Database Marketing & Customer Strategy Management, 10(4), 315-326.
Fong, J., & Wong, H.K. (2002). Online analytical mining of path traversal patterns for web measurement. Journal of Database Management, 13(4), 39-61.
Forcht, K.A., & Cochran, K. (1999). Using data mining and datawarehousing techniques. Industrial Management + Data Systems, 99(5), 189-196.
Lei-da, C., & Frolick, M.N. (2000). Web-based data warehousing. Information Systems Management, 17(2), 80-86.
Marks, W.T., & Frolick, M.N. (2001). Building customer data warehouses for a marketing and service environment: A case study. Information Systems Management, 18(3), 51-56.
Paas, L.J. (2009). Database marketing practices and opportunities in a newly emerging African market. Journal of Database Marketing & Customer Strategy Management, 16(2), 92-100.
Sen, A., Ramamurthy, K., & Sinha, A.P. (2012). A model of data warehousing process maturity. IEEE Transactions on Software Engineering, 38(2), 336-353.
Stonebraker, M. (2011). Stonebraker on data warehouses. Association for Computing Machinery. Communications of the ACM, 54(5), 10.
Sutherland, K. (2003). How to use your database or data warehouse of profitability information to make better marketing decisions. Journal of Performance Management, 16(1), 25-30.
Watson, H., Ariyachandra, T., & Matyska, R.J., Jr. (2001). Data warehousing stages of growth. Information Systems Management, 18(3), 42-50.
Watson, H.J., Wixom, B.H., Hoffer, J.A., Anderson-Lehman, R., & Reynolds, A.M. (2006). Real-time business intelligence: Best practices at Continental Airlines. Information Systems Management, 23(1), 7-18.