By Hari Mailvaganam
Data mining is conducted against data accumulated in OLTP repositories, data warehouses, data marts and archived data. The steps for data mining follow this pattern:
applying data mining algorithm
Data extraction and data cleansing can be eased with good data lifecycle management policies. Very often a data warehousing project will ensure that data extraction and meta-data standards are pre-defined in an organization.
Data models for operational and archived data differ from data mining models. Operational systems store data referentially, in related tables, and are designed for transactional speed.
Figure 1. Data Extraction for Data Mining
In data mining, a unified table view is created in which the data of interest is stored. Most data mining vendors offer the ability to extract data from repositories and transfer it to the data mining database.
The table view below shows an example of a retailer's data mining database.
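The idea of a unified table view can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the customers and orders tables, their columns, the sample rows and the mining_view name are all hypothetical and do not come from any particular vendor's tool.

```python
import sqlite3

def build_mining_view(conn):
    """Flatten two hypothetical referential tables into one row per customer."""
    conn.executescript("""
        -- Hypothetical operational tables, stored referentially.
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name TEXT,
            phone TEXT
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount REAL
        );
        INSERT INTO customers VALUES
            (1, 'A. Smith', '604-555-0101'),
            (2, 'B. Jones', '778-555-0102');
        INSERT INTO orders VALUES
            (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);

        -- Unified view: one flat row per customer with aggregated order data.
        CREATE VIEW mining_view AS
        SELECT c.customer_id, c.name, c.phone,
               COUNT(o.order_id) AS order_count,
               COALESCE(SUM(o.amount), 0) AS total_spent
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name, c.phone;
    """)
    return conn.execute(
        "SELECT * FROM mining_view ORDER BY customer_id"
    ).fetchall()

rows = build_mining_view(sqlite3.connect(":memory:"))
for row in rows:
    print(row)
```

The LEFT JOIN keeps customers who have no orders, so the flat view does not silently drop records that the mining algorithm might still need.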
Table 1. Table View of Data Mining Database
Not all of the data found in the data mining table view will have relevance. An example is the first column in Table 1, which has identifier values.
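Excluding columns with no relevance, such as identifiers, can be sketched as a small filtering step before the data is handed to a mining algorithm. The column names (record_id, city, total_spent) and the sample rows below are hypothetical; Table 1's actual columns may differ.

```python
def drop_columns(records, exclude):
    """Return copies of the records without the excluded columns."""
    excluded = set(exclude)
    return [
        {name: value for name, value in record.items() if name not in excluded}
        for record in records
    ]

# Hypothetical rows from a table view; column names are illustrative only.
records = [
    {"record_id": 1001, "city": "Vancouver", "total_spent": 100.0},
    {"record_id": 1002, "city": "Burnaby", "total_spent": 40.0},
]

# The identifier distinguishes rows but is assumed to carry no pattern of interest.
features = drop_columns(records, exclude=["record_id"])
print(features)
```

The exclusion list is given explicitly rather than inferred from the data, since a column that looks unique per row (a telephone number, for instance) may still turn out to be useful.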
Other data may hold hidden patterns that can be discovered only once its relevance is captured, often with the help of external data sources. An example from Table 1 is the telephone number column. At first glance this column would seem insignificant to the data mining process. However, useful information can be obtained from telephone numbers, such as telephone exchange location or cell phone usage. This information can be obtained from external data sources that grade and score the telephone numbers in Table 1.
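Enriching the telephone number column from an external source might look like the following sketch. The PREFIX_REFERENCE dictionary is a hypothetical stand-in for such a source, and its region labels and mobile flags are placeholder values, not real exchange data.

```python
# Hypothetical stand-in for an external reference source keyed by the
# first three digits of a telephone number. All values are placeholders.
PREFIX_REFERENCE = {
    "604": {"exchange_region": "Region A", "likely_mobile": False},
    "778": {"exchange_region": "Region B", "likely_mobile": True},
}

UNKNOWN = {"exchange_region": None, "likely_mobile": None}

def enrich_phone(record, reference=PREFIX_REFERENCE):
    """Return a copy of the record with attributes derived from its phone prefix."""
    digits = "".join(ch for ch in record["phone"] if ch.isdigit())
    enriched = dict(record)
    # Prefixes missing from the reference source get explicit None values.
    enriched.update(reference.get(digits[:3], UNKNOWN))
    return enriched

customer = {"name": "A. Smith", "phone": "604-555-0101"}
print(enrich_phone(customer))
```

The derived attributes become ordinary columns in the table view, so a mining algorithm can use them even though the raw telephone number itself carries no obvious pattern.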
Figure 2. The Steps taken in Data Mining