Sunday, November 25, 2007

A Shortlist of Topics

After much reading and pondering, here's a shortlist of potential topics that relate to text mining and knowledge management (some references are missing):

1- Investigate the problem of quantification (Forman, HP Labs) in text classification on a specific domain. Useful for estimating the proportion of positive cases, tracking concept drift, etc.

2- Improve classification performance with features from discourse analysis. The goal is to classify texts by discourse pattern (e.g. descriptive, dialogue, interview), with potential use in searching for specific "kinds" of text.

3- Investigate feature selection for clustering text data sets, with application to trend analysis.
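
To make topic 1 a bit more concrete, here is a minimal sketch of the quantification idea: rather than trusting the raw fraction of documents a classifier labels positive, correct that fraction using the classifier's true and false positive rates (the "adjusted count" approach from the quantification literature). The function names, the prediction list and the tpr/fpr values below are all made up for illustration; in practice the rates would be estimated on held-out data, e.g. by cross-validation.

```python
def classify_and_count(predictions):
    """Naive estimate: fraction of items the classifier labelled positive (1)."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the naive estimate using the classifier's error rates.

    If p is the true positive proportion, the observed positive rate is
        observed = p * tpr + (1 - p) * fpr
    Solving for p gives (observed - fpr) / (tpr - fpr).
    """
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr for the correction to be defined")
    observed = classify_and_count(predictions)
    estimate = (observed - fpr) / (tpr - fpr)
    # Estimated rates are noisy, so clip the result to a valid proportion.
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers: 100 documents, 30 of them labelled positive,
# by a classifier assumed to have tpr = 0.8 and fpr = 0.1.
predictions = [1] * 30 + [0] * 70
naive = classify_and_count(predictions)
adjusted = adjusted_count(predictions, tpr=0.8, fpr=0.1)
print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

The correction only makes sense when the classifier does better than chance (tpr greater than fpr), and it assumes the error rates measured on training data still hold on the new data, which is exactly what may fail under concept drift.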

For any of these, I'll need:

- text data set from a domain (some available at ACM SIGKDD)
- a tool that implements tweakable classifiers (WEKA?)
- a text mining/parsing/preprocessing tool (GATE?)

Next steps are to investigate those tools and to see how decent a data set I can gather in the short timeframe available.
