Data Modeling and Mining

Modeling for data mining

By Hari Mailvaganam

Data mining is conducted against data accumulated in OLTP repositories, data warehouses, data marts and archived data. It generally proceeds through the following steps:

  • data extraction
  • data cleansing
  • modeling data
  • applying a data mining algorithm
  • pattern discovery
  • data visualization
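The steps above can be sketched as a chain of small Python functions, one per step. This is a minimal illustration under stated assumptions, not a vendor's method: the transaction data and function names are hypothetical, and the "algorithm" is a simple co-occurrence count with a minimum support threshold, one building block of market-basket analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions, as they might be extracted from an OLTP source.
RAW_TRANSACTIONS = [
    {"customer": "C1", "items": "bread, milk"},
    {"customer": "C2", "items": "Bread,butter, milk"},
    {"customer": "C3", "items": "milk ,butter"},
    {"customer": "C4", "items": ""},  # empty basket, removed during cleansing
]

def extract(source):
    """Data extraction: copy the records of interest out of the source."""
    return [dict(row) for row in source]

def cleanse(rows):
    """Data cleansing: normalize item names and drop empty baskets."""
    cleaned = []
    for row in rows:
        items = {item.strip().lower() for item in row["items"].split(",") if item.strip()}
        if items:
            cleaned.append({"customer": row["customer"], "items": items})
    return cleaned

def model(rows):
    """Modeling: reshape into one basket (a set of items) per customer."""
    return [row["items"] for row in rows]

def mine(baskets):
    """Algorithm: count how often each pair of items is bought together."""
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket), 2))
    return pair_counts

def discover(pair_counts, min_support):
    """Pattern discovery: keep pairs seen in at least min_support baskets."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}

def visualize(patterns):
    """Visualization: a plain-text bar chart, most frequent pairs first."""
    for (a, b), n in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{a} + {b}: {'#' * n}")

patterns = discover(mine(model(cleanse(extract(RAW_TRANSACTIONS)))), min_support=2)
visualize(patterns)
# bread + milk: ##
# butter + milk: ##
```

Keeping each step as its own function mirrors the list: a different cleansing rule or mining algorithm can be swapped in without touching the other steps.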

Data extraction and data cleansing are easier when good data lifecycle management policies are in place. A data warehousing project often ensures that data extraction and metadata standards are defined in advance across the organization.
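As an illustration of a pre-defined metadata standard, the sketch below records, for each column of an extract, whether it is required and what format its values must follow, and checks incoming records against it. The `ColumnStandard` class, the column names and the formats are hypothetical; in practice such standards may live in a data dictionary or metadata repository rather than in code.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnStandard:
    """Hypothetical metadata entry agreed on before extraction begins."""
    name: str
    required: bool
    pattern: str  # regular expression every non-empty value must match

# A made-up standard for a retailer's customer extract.
CUSTOMER_STANDARD = [
    ColumnStandard("customer_id", required=True, pattern=r"C\d+"),
    ColumnStandard("phone", required=False, pattern=r"\d{3}-\d{3}-\d{4}"),
]

def validate(record, standard):
    """Return a list of problems found in one extracted record."""
    problems = []
    for col in standard:
        value = record.get(col.name, "")
        if not value:
            if col.required:
                problems.append(f"{col.name}: missing")
        elif not re.fullmatch(col.pattern, value):
            problems.append(f"{col.name}: {value!r} does not match {col.pattern}")
    return problems

print(validate({"customer_id": "C17", "phone": "416-555-0199"}, CUSTOMER_STANDARD))  # []
print(validate({"phone": "5550199"}, CUSTOMER_STANDARD))
```

Records that fail validation can be routed to cleansing or rejected before they reach the data mining database.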

Data models for operational and archived data differ from data mining models. Operational systems store data relationally, in normalized tables linked by keys, because that design favors transactional speed rather than analysis.

Data Extraction for Data Mining

Figure 1. Data Extraction for Data Mining

In data mining, a unified table view is created to hold the data of interest. Most data mining vendors provide tools to extract data from the source repositories and transfer it to the data mining database.
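To make the idea of a unified table view concrete, here is a sketch using Python's built-in `sqlite3` module. It assumes two hypothetical normalized operational tables, `customers` and `orders`, and flattens them into a single `mining_view` table with one row per customer. The in-memory database, table names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized operational tables.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Ann', '416-555-0101'), (2, 'Bob', '905-555-0102');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);

    -- Unified, denormalized table for mining: one row per customer.
    CREATE TABLE mining_view AS
    SELECT c.customer_id,
           c.name,
           c.phone,
           COUNT(o.order_id)          AS order_count,
           COALESCE(SUM(o.amount), 0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, c.phone;
""")

for row in conn.execute("SELECT * FROM mining_view ORDER BY customer_id"):
    print(row)
# (1, 'Ann', '416-555-0101', 2, 65.0)
# (2, 'Bob', '905-555-0102', 1, 15.5)
```

The `LEFT JOIN` keeps customers who have no orders, so they appear in the mining table with an order count of 0 rather than silently disappearing.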

The table view below shows an example of a retailer's data mining database.

Data Mining Table - Example

Table 1. Table View of Data Mining Database

Not all of the data in the data mining table view will be relevant. For example, the first column in Table 1 holds identifier values.
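One way to spot identifier columns is to flag columns whose value is different in every row. The sketch below is a heuristic, not a standard test, and the sample rows and column names are hypothetical. It also flags the phone column, which is why the candidates should be reviewed before anything is dropped.

```python
def identifier_candidates(rows):
    """Flag columns whose values are unique in every row.

    Heuristic only: a unique key such as a customer number cannot
    generalize to new records, but other columns (phone numbers,
    exact amounts) can also be unique, so review before dropping.
    """
    if not rows:
        return []
    columns = rows[0].keys()
    return [col for col in columns if len({row[col] for row in rows}) == len(rows)]

def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

table = [
    {"customer_id": 101, "region": "East", "age_band": "25-34", "phone": "416-555-0101"},
    {"customer_id": 102, "region": "West", "age_band": "25-34", "phone": "905-555-0102"},
    {"customer_id": 103, "region": "East", "age_band": "35-44", "phone": "416-555-0103"},
]

print(identifier_candidates(table))  # ['customer_id', 'phone']

# After review, drop only the identifier; phone is kept because it can be enriched.
mining_table = drop_columns(table, {"customer_id"})
print(mining_table[0])  # {'region': 'East', 'age_band': '25-34', 'phone': '416-555-0101'}
```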

Other data may hold hidden patterns that become apparent only once their relevance is established, often with the help of external data sources. An example from Table 1 is the telephone number column. At first glance this column would seem insignificant to the data mining process. However, telephone numbers can yield useful information, such as the location of the telephone exchange or whether the number belongs to a cell phone. This information can be obtained from external data sources that grade and score the telephone numbers in Table 1.
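The telephone enrichment described above can be sketched as a lookup against external reference data keyed by area code and exchange (the first six digits of a North American number). The `EXCHANGE_LOOKUP` table, its attributes and its region names are invented placeholders; an actual data provider would define its own keys, attributes and scoring.

```python
import re

# Hypothetical external reference data keyed by (area code, exchange).
# All entries are made up for illustration.
EXCHANGE_LOOKUP = {
    ("416", "555"): {"exchange_location": "Region A", "line_type": "landline"},
    ("905", "555"): {"exchange_location": "Region B", "line_type": "mobile"},
}

def enrich_phone(row, lookup):
    """Return a copy of row with exchange-level attributes added."""
    digits = re.sub(r"\D", "", row.get("phone", ""))  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip a leading North American country code
    key = (digits[:3], digits[3:6]) if len(digits) == 10 else None
    extra = lookup.get(key, {"exchange_location": None, "line_type": None})
    return {**row, **extra}

rows = [
    {"region": "East", "phone": "416-555-0101"},
    {"region": "West", "phone": "(905) 555-0102"},
    {"region": "East", "phone": "n/a"},  # unusable number: attributes stay None
]
enriched = [enrich_phone(r, EXCHANGE_LOOKUP) for r in rows]
for r in enriched:
    print(r)
```

The derived columns, rather than the raw numbers, are what a mining algorithm would then use.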

Data Mining Steps

Figure 2. The Steps taken in Data Mining
