By Hari Mailvaganam
Closed-loop data mining has gained popularity with the arrival of OLE-DB for data mining. With OLE-DB, software developers can embed data mining functionality directly in the application source code.
Data mining is used for predictive and descriptive analysis. Until recently, data mining was a time-consuming effort conducted separately from enterprise applications. Typically, data mining is an iterative process conducted after the event has occurred: the data is analyzed to discover events of interest and the reasons behind them.
Figure 1. Iterative Data Mining Process
Iterative data mining starts with identifying the business metrics to analyze. In a fraud detection example, this can be a consultation process between the finance department and the data mining consultant. The exact metrics for identifying fraud are defined before data mining begins. The biggest challenge for most data mining processes is then to extract the relevant data and compile it in the data mining database for analysis. The data may also have to be cleaned for clarity and context.
Once the data has been corralled, the process of running the data through data mining algorithms begins. The results obtained from the initial pass through the algorithms are called the training set. The training set is used as a basis for optimizing future data mining result sets. With a number of result sets from the data mining process in hand, the consultant manipulates constraints to determine the margin of error and minimize false results. This process is typically conducted with the business sponsor's input.
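The tuning step described above can be illustrated with a minimal sketch. It assumes the mining algorithm assigns each transaction a fraud score between 0 and 1 and that the constraint being manipulated is a single cut-off threshold. The function names, the scores, and the confirmed-fraud labels are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    threshold: float
    false_positives: int  # flagged as fraud, but legitimate
    false_negatives: int  # not flagged, but confirmed fraud


def sweep_thresholds(scores, labels, thresholds):
    """Count false results for each candidate threshold (hypothetical helper)."""
    results = []
    for t in thresholds:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        results.append(Counts(t, fp, fn))
    return results


def pick_threshold(results):
    """Choose the threshold with the fewest false results overall."""
    return min(results, key=lambda c: c.false_positives + c.false_negatives)


# Hypothetical scores and confirmed outcomes (True means confirmed fraud).
scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.95]
labels = [False, False, True, False, True, True]

sweep = sweep_thresholds(scores, labels, [0.3, 0.5, 0.7])
best = pick_threshold(sweep)
print(best)
```

In practice the business sponsor may weigh a missed fraud case more heavily than a false alarm, in which case the simple sum in `pick_threshold` would be replaced by a weighted cost.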
After arriving at an optimal tolerance point, the fraud analysis results are produced for the finance department.
The data mining software used in such a scenario is normally a stand-alone application whose data repository is separate from that of the business application. Business applications are designed for transaction speed, not decision support analysis. Data extraction and mapping between the two data repositories is tedious and inflexible.
With the arrival of enterprise applications such as ERP, CRM, and SCM, business processes in organizations are being better defined. The next generation of enterprise applications has a stronger focus on analysis. With vendors embedding data mining and OLAP modules directly in the database, the enterprise application can run analysis directly from the source code.
In a typical enterprise application, the business logic is tightly integrated with the analytical layer. Data stored in the OLTP system is clean and contextual. Older data is migrated to the data warehouse, which maintains the business context and referential integrity of the data.
In such a scenario, end-users can be presented with business logic that offers closed-loop data mining. Users or automated business constraints can trigger the data mining process.
The data warehouse receives feeds from the OLTP system. At intervals, the appropriate data in the data warehouse is balanced and audited against OLTP data, using new code functions and scheduled on-request jobs. The data is then aggregated and stored in other data warehouse tables designed for analysis, in accordance with specifications provided by the business process. The data mining results are returned directly to the OLTP system. If the data mining analysis detects a result of interest, an alert is prepared and sent to the end-user.
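One cycle of this loop (balance and audit, aggregate, mine, alert) can be sketched as follows. This is only an illustration under stated assumptions: the OLTP and warehouse tables are stood in for by in-memory lists of rows, the "mining" step is a simple limit check in place of a real model, and every function and field name is hypothetical.

```python
from collections import defaultdict


def balance_check(oltp_rows, warehouse_rows):
    """Audit step: row counts and amount totals must agree on both sides."""
    same_count = len(oltp_rows) == len(warehouse_rows)
    same_total = (sum(r["amount"] for r in oltp_rows)
                  == sum(r["amount"] for r in warehouse_rows))
    return same_count and same_total


def aggregate_by_account(warehouse_rows):
    """Aggregation step: total amount per account, as an analysis table."""
    totals = defaultdict(int)
    for r in warehouse_rows:
        totals[r["account"]] += r["amount"]
    return dict(totals)


def mine(totals, limit):
    """Stand-in for the mining model: flag accounts whose total exceeds limit."""
    return sorted(a for a, t in totals.items() if t > limit)


def run_cycle(oltp_rows, warehouse_rows, limit):
    """Run one closed-loop cycle and return the alerts for end-users."""
    if not balance_check(oltp_rows, warehouse_rows):
        raise ValueError("warehouse is out of balance with OLTP")
    flagged = mine(aggregate_by_account(warehouse_rows), limit)
    return [f"ALERT: review account {a}" for a in flagged]


# Hypothetical feed: the warehouse holds a faithful copy of the OLTP rows.
oltp = [
    {"account": "A", "amount": 120},
    {"account": "B", "amount": 40},
    {"account": "A", "amount": 900},
]
warehouse = list(oltp)

alerts = run_cycle(oltp, warehouse, limit=1000)
print(alerts)
```

Refusing to mine when the balance check fails reflects the point made above: the analysis is only trustworthy if the warehouse has been audited against the OLTP data first.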
Exceptionally good enterprise software design with tight business process management can make closed-loop decision support a powerful competitive tool. The team implementing the enterprise application must include good OLTP programmers who know the OLTP transaction system inside out, including its limitations and its extensibility for automating these types of requirements. Ideally, there should be a pre-implementation period of two months to run manual and automated analysis in parallel and compare the results.
One example of closed-loop data mining in practice is a major wireless telephone company based in the United States. Its data warehouse has feeds from numerous data sources, including the call center. Churn analysis is performed against targeted data, and the results identify clients labeled as "valuable and at risk to defect". These results are directed to the call center's integrated telephone system, and sales representatives are directed to call the identified clients with promotional offers.
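The labeling step in this example might look something like the sketch below. It assumes the churn analysis has already produced a churn score per client and that "valuable" is defined by a revenue floor. The field names, thresholds, and client records are hypothetical, and the real system's criteria are not described in the example.

```python
def build_call_queue(clients, value_floor, risk_floor):
    """Return IDs of clients who are both valuable and at risk to defect,
    ordered so the highest-risk clients are called first."""
    at_risk = [
        c for c in clients
        if c["monthly_revenue"] >= value_floor and c["churn_score"] >= risk_floor
    ]
    at_risk.sort(key=lambda c: c["churn_score"], reverse=True)
    return [c["client_id"] for c in at_risk]


# Hypothetical churn analysis output.
clients = [
    {"client_id": 101, "monthly_revenue": 150, "churn_score": 0.82},
    {"client_id": 102, "monthly_revenue": 20, "churn_score": 0.91},   # at risk, low value
    {"client_id": 103, "monthly_revenue": 200, "churn_score": 0.15},  # valuable, not at risk
    {"client_id": 104, "monthly_revenue": 95, "churn_score": 0.95},
]

call_queue = build_call_queue(clients, value_floor=90, risk_floor=0.8)
print(call_queue)
```

The resulting queue is what would be handed to the call center's telephone system, which closes the loop between the analysis and the sales representatives.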