Wednesday, July 27, 2005

ConcepTool Restrictions

I have found an issue with ConcepTool that undermines some of my design decisions. I designed the learner system so that the user can specify any number of files, folders or web pages, and can use folder recursion and link following to reach as many documents as they like.

There were two motivations behind this approach:
1) You can create a desktop search engine index with ontologies by specifying "C:/" or "/" as the folder to learn from, including recursion to all subfolders.
2) You can learn from an entire website by turning on link following.
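To illustrate how folder recursion turns a single starting point like "C:/" into a very large set of documents, here is a minimal sketch in Python. It is not the learner system's actual code: the function name `collect_documents` and its `recurse` flag are hypothetical, and it covers only local files and folders (web pages and link following are left out).

```python
from pathlib import Path
from typing import Iterable, List


def collect_documents(sources: Iterable[str], recurse: bool = True) -> List[Path]:
    """Hypothetical sketch: expand file and folder paths into a flat list of files.

    Web pages and link following are deliberately not handled here.
    """
    found = set()  # a set avoids counting a file twice if folders overlap
    for source in sources:
        path = Path(source)
        if path.is_file():
            found.add(path)
        elif path.is_dir():
            # rglob("*") descends into every subfolder; glob("*") stays at the top level
            entries = path.rglob("*") if recurse else path.glob("*")
            found.update(p for p in entries if p.is_file())
    return sorted(found)
```

Called as `collect_documents(["C:/"])`, a sketch like this would enumerate every file on the drive, which shows how quickly the document count can climb well past a few dozen.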

I've been testing the effectiveness of various learning heuristics, and found that learning from just 20 documents is enough to cause problems. The learning process itself only takes a couple of minutes, but loading the resulting CT XML file into ConcepTool grinds away for about 30 minutes before running out of memory. The XML file was only 2.5 MB (compared with about 30 KB for a single document), so ConcepTool is evidently bloating the data considerably in memory, which makes it practically unusable with that many documents.

It's just as well this is a proof of concept and not a commercial venture.
