Evolution of Analysis - Microsoft's NetScan and Project Aura

Data mining, OLAP and reporting combined!

By Hari Mailvaganam

Microsoft has recently released a new online reporting application called NetScan. NetScan is a combined reporting, OLAP and data mining application for analyzing Usenet posts. According to Microsoft, the goal of NetScan is to analyze Usenet posts: posting frequencies, the e-mail addresses of posters, trends and the value of each message posted, with the eventual aim of creating a better search engine.

The rationale for Microsoft to implement NetScan is not altruistic. Microsoft hopes to enhance community support for its products, and Usenet is being looked at as an extension of Microsoft's product support. For some years now Microsoft has had a team of dedicated support engineers whose job is to monitor and respond to news groups.

NetScan is an early version of Project Aura. Aura foresees a future where consumers will have handheld devices able to obtain the rating of almost any object or event: software, a movie, an artistic performance. Consumers will scan the item of interest with a bar code reader and the rating will be displayed.

To grasp the nature of Aura, let's take a step back and evaluate NetScan. By looking at NetScan we can see the potential of Aura and also evaluate the implications for personal privacy.

NetScan's goal is to impose a layer of order upon the chaos of Usenet posts. Many of NetScan's features are derived from text mining. NetScan goes further and also rates the "value" of a poster to the news group or discussion thread. It also introduces a time series analysis of a poster's "value" to a discussion - an ability to judge an expert's knowledge by category. For example, a poster may be a high "value" contributor to the news group soc.politics. NetScan will be able to differentiate the poster's expertise in the sub-topics found in soc.politics - is the poster an expert in "gun control" discussions, or does the poster have "good contacts in Washington"?
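The sub-topic idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword lists, the function names, and above all the use of "replies received" as a stand-in for a poster's "value". How NetScan actually computes value is not described here.

```python
from collections import defaultdict

# Hypothetical sub-topic keyword lists (assumption, not NetScan's categories).
SUBTOPIC_KEYWORDS = {
    "gun control": {"gun", "firearm", "rifle"},
    "washington": {"senate", "congress", "lobbyist"},
}


def tag_subtopics(text):
    """Return the set of sub-topics whose keywords appear in the text."""
    words = set(text.lower().split())
    return {topic for topic, keys in SUBTOPIC_KEYWORDS.items() if words & keys}


def expertise_by_subtopic(posts):
    """Sum a per-post score for each poster and sub-topic.

    posts: iterable of (poster, text, replies_received) tuples.
    'replies_received' is only a placeholder measure of "value".
    """
    scores = defaultdict(lambda: defaultdict(int))
    for poster, text, replies in posts:
        for topic in tag_subtopics(text):
            scores[poster][topic] += replies
    return {poster: dict(topics) for poster, topics in scores.items()}


posts = [
    ("alice", "The senate vote on the firearm bill is close", 5),
    ("alice", "Another gun registry proposal", 3),
    ("bob", "My lobbyist friend says congress will stall", 4),
]
result = expertise_by_subtopic(posts)
```

In this toy data the poster "alice" scores higher on "gun control" than on "washington", which is the kind of distinction within a single news group that the paragraph above describes. A real system would use text classification rather than keyword matching.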

NetScan intends to evolve from extracting data only from news groups to also correlating information gathered from e-mail, e-mail lists, chat rooms, buddy lists, instant messages, message boards and blogs.

Evolution of NetScan to Aura

Figure 1. Evolution of Continuous Analysis from NetScan to Aura

While NetScan intends to produce analysis of news groups and posters, Aura takes this a notch further and serves analysis on any object or event. Aura can be thought of as an Amazon-type product rating system for the non-virtual world. Aura should be less open to abuse than Amazon's rating system because the sample set is much larger.

But already there are open source applications being designed to create false personalities and post automated opinions on events and products. Kiara, an application I helped develop, was originally designed to create a pseudo gamer for multi-player online games. It would not take much work for a public relations firm to convert this into virtual opinionated characters making multiple postings and chats to influence NetScan and Aura.

The technology behind NetScan and Aura

NetScan and Aura can be thought of as always-on, pervasive, 360-degree view analysis - at least this or similar terminology will be used by marketing groups. And they would not be too far from the truth if they did.

NetScan will be constantly extracting data from Usenet - posters' names and contact information, and message body text. These data are then parsed and cleansed. Text mining is performed on the message body for classification and clustering. The results are fed into decision trees to evaluate a poster's value. Time series analysis is then performed to give a trend to a poster's value and subject area expertise.
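Two of these steps, parsing and cleansing a post and fitting a trend to a poster's value, can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not NetScan's implementation: the sample post, the rule of dropping quoted lines, the function names and the least-squares slope as the "trend" are all hypothetical choices. The classification and decision tree steps are left out.

```python
from email import message_from_string
from email.utils import parseaddr

# Made-up sample post in the usual header / blank line / body layout.
RAW_POST = """\
From: Jane Doe <jane@example.com>
Newsgroups: soc.politics
Subject: Re: registry proposal

> quoted line from an earlier post
I think the   proposal will stall.
"""


def parse_post(raw):
    """Extract poster name, e-mail, group and a cleansed body from a raw post."""
    msg = message_from_string(raw)
    name, address = parseaddr(msg["From"])
    # Cleansing (assumed rules): drop blank and quoted lines, collapse whitespace.
    body_lines = [
        " ".join(line.split())
        for line in msg.get_payload().splitlines()
        if line.strip() and not line.lstrip().startswith(">")
    ]
    return {
        "name": name,
        "email": address,
        "group": msg["Newsgroups"],
        "body": " ".join(body_lines),
    }


def trend_slope(values):
    """Least-squares slope of values against their period index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


post = parse_post(RAW_POST)
# Hypothetical monthly "value" scores for one poster; a positive slope means rising value.
slope = trend_slope([2, 4, 6, 8])
```

A positive slope would indicate a poster whose value to the group is rising over time, and a negative slope a declining one. A production system would likely use a proper time series model rather than a single straight-line fit.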

NetScan Screenshot

Figure 2. NetScan's Thin Client Reporting - Rich in Features

OLAP reporting is used in conjunction with data mining to present results in a readable manner. NetScan's reporting screen is very impressive and rich in features. It will be even better when an XML GUI is used and the reporting is decoupled from the middle layer server logic.

Privacy Concerns

Usenet postings are considered to be public property. As such, the data gathered is open to analysis and can be correlated with information gathered from elsewhere. If you are concerned about privacy, the best way to avoid identification is to use multiple pseudonyms and e-mail addresses when making online posts.

Read the fine print when you register for online message boards and chats - they may sell all data collected for analysis. Data collected from third parties such as Air Miles, credit card companies and Dun & Bradstreet can further be used to train the data mining models and provide strong predictive trend analysis.

Microsoft guarantees that privacy concerns will be addressed, although the details have not been elaborated, as much of NetScan and Aura is still a work in progress.

