Sunday, November 25, 2007

Visual Complexity

A stunning array of knowledge-visualization projects, many of them exploring techniques for visualizing textual data and its relations to other data sources. As a follow-on to text mining, this kind of visualization opens up far wider ways for people to relate to the original raw content.

A Shortlist of Topics

After much reading and pondering, here's a shortlist of potential topics that relate nicely to text mining and knowledge management (some references are still missing):

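1- Investigate the problem of quantification (Forman, HP Labs) in text classification within a specific domain. Quantification estimates the proportion of positive cases in a data set rather than labeling each individual item, which is useful for tracking class prevalence, concept drift, etc.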

2- Improve classification performance with features from discourse analysis, in order to classify text by discourse pattern (e.g. descriptive, dialogue, interview...), with potential use in searching for specific "kinds" of text.

3- Investigate feature selection for clustering text data sets, with application to trend analysis.
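To make topic 1 a bit more concrete, here is a minimal sketch of the "adjusted count" correction discussed in Forman's quantification work: count how many unlabeled items the classifier calls positive, then correct that raw rate using the true-positive rate (tpr) and false-positive rate (fpr) measured on a labeled validation set, i.e. estimate = (observed - fpr) / (tpr - fpr), clipped to [0, 1]. It assumes binary labels (1 = positive) and that such a validation set exists; the function names and toy data are hypothetical.

```python
def rates_from_validation(y_true, y_pred):
    """Return (tpr, fpr) of a binary classifier on labeled validation data (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("validation data needs both positive and negative examples")
    return tp / (tp + fn), fp / (fp + tn)


def classify_and_count(predictions):
    """Naive estimate: fraction of items the classifier labels positive."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """Correct the naive rate for classifier bias, clipped to [0, 1]."""
    if tpr <= fpr:
        raise ValueError("correction needs a classifier with tpr > fpr")
    observed = classify_and_count(predictions)
    estimate = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Toy validation labels and classifier outputs (made up for illustration).
    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    tpr, fpr = rates_from_validation(y_true, y_pred)  # 0.8, 0.2

    # Classifier outputs on new, unlabeled documents.
    unlabeled_preds = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    print(classify_and_count(unlabeled_preds))  # 0.4
    print(round(adjusted_classify_and_count(unlabeled_preds, tpr, fpr), 3))  # 0.333
```

The raw count (0.4) overestimates prevalence here because a classifier with fpr = 0.2 also flags some negatives; the adjustment pulls the estimate down to about a third. This only holds if tpr and fpr stay stable between the validation data and the new data, which is exactly what concept drift can break.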

For that I'll need:

- a text data set from a specific domain (some are available via ACM SIGKDD)
- a tool that implements tweakable classifiers (WEKA?)
- a text mining/parsing/preprocessing tool (GATE?)
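Until a proper preprocessing tool is in place, the basic pipeline behind topic 3 can be sketched in plain Python: tokenize each document, select features by document-frequency thresholds (drop terms that are too rare or too common), and build term-count vectors that a clustering tool could consume. Document-frequency thresholding is just one simple unsupervised selection method, chosen here for illustration; the stop-word list, threshold values, and function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words and 1-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def document_frequency(docs_tokens):
    """Count in how many documents each term appears (at most once per document)."""
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return df


def select_features(docs_tokens, min_df=2, max_df_ratio=0.9):
    """Keep terms seen in at least min_df documents and at most max_df_ratio of them."""
    n_docs = len(docs_tokens)
    df = document_frequency(docs_tokens)
    return sorted(t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio)


def to_vectors(docs_tokens, vocabulary):
    """Build one term-count vector per document over the selected vocabulary."""
    index = {t: i for i, t in enumerate(vocabulary)}
    vectors = []
    for tokens in docs_tokens:
        vec = [0] * len(vocabulary)
        for t in tokens:
            if t in index:
                vec[index[t]] += 1
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    # Made-up toy documents.
    docs = [
        "Text mining finds trends in news text",
        "Clustering news articles reveals trends",
        "Mining customer reviews for product trends",
        "Weather report for the weekend",
    ]
    docs_tokens = [tokenize(d) for d in docs]
    vocab = select_features(docs_tokens)
    print(vocab)  # ['mining', 'news', 'trends']
    print(to_vectors(docs_tokens, vocab))  # [[1, 1, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
```

Shrinking the vocabulary this way makes the vectors much smaller before clustering; whether it helps or hurts cluster quality on a real data set is the kind of question topic 3 would need to measure.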

Next steps are to investigate those tools and to see how decent a data set I can gather in the short time available.