The Linguistic Classification System
LCS
The Linguistic Classification System (LCS) is a basic component for all
applications involving document classification, performing the following
tasks:
- learning a classifier from pre-classified training documents
- applying a classifier to given documents to obtain a ranking of all
documents for all categories
- making, based on a computed ranking of documents for categories,
an optimal assignment of documents to categories, according to a given
utility function
- performing mono- and multi-classification, as well as hierarchical
classification.
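The first three tasks above (learning, ranking, and utility-based assignment) can be illustrated with a minimal Python sketch. It is not LCS code: the relative-frequency scorer, the function names train, rank and best_cutoff, the sample data, and the linear utility in the usage lines are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def terms_of(text):
    """Split a text into lower-cased word terms (a deliberately crude tokenizer)."""
    return text.lower().split()

def train(docs):
    """Learn per-category term weights from (text, categories) pairs.

    Illustrative scorer only: a term's weight is its relative frequency
    among all terms seen in the category's training documents.
    """
    counts = defaultdict(Counter)
    for text, categories in docs:
        for cat in categories:  # a document may carry several categories
            counts[cat].update(terms_of(text))
    model = {}
    for cat, counter in counts.items():
        total = sum(counter.values())
        model[cat] = {term: n / total for term, n in counter.items()}
    return model

def rank(model, texts):
    """For every category, rank all documents by descending score."""
    ranking = {}
    for cat, weights in model.items():
        scored = []
        for doc_id, text in enumerate(texts):
            terms = terms_of(text)
            # Average term weight, so long documents are not favoured.
            score = sum(weights.get(t, 0.0) for t in terms) / max(len(terms), 1)
            scored.append((score, doc_id))
        ranking[cat] = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return ranking

def best_cutoff(ranked, relevant, utility):
    """Choose how many top-ranked documents to assign so that utility is maximal.

    ranked   : list of (score, doc_id), best first
    relevant : set of doc_ids that truly belong to the category
    utility  : function (tp, fp, fn) -> number
    Returns (k, utility_at_k); ties go to the smallest k.
    """
    best_k, best_u = 0, utility(0, 0, len(relevant))
    tp = 0
    for k, (_, doc_id) in enumerate(ranked, start=1):
        if doc_id in relevant:
            tp += 1
        u = utility(tp, k - tp, len(relevant) - tp)
        if u > best_u:
            best_k, best_u = k, u
    return best_k, best_u

# Usage with made-up data and an assumed linear utility
# (2 points per correct assignment, -1 per false alarm).
model = train([("engine piston fuel", {"mechanics"}),
               ("cell membrane protein", {"biology"})])
ranking = rank(model, ["piston engine noise", "enzyme in a cell", "fuel cell membrane"])
k, u = best_cutoff(ranking["mechanics"], relevant={0},
                   utility=lambda tp, fp, fn: 2 * tp - fp)
```

Swapping in a different utility function (for example one that penalises missed documents through fn) changes the chosen cut-off without touching the ranking, which is why the sketch keeps ranking and assignment as separate steps.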
Other classification systems are available, often in the public domain,
but only as bare classification engines. LCS provides a number of
classification engines and, in addition, embeds them in a framework
that supports the classification process in practical situations:
- de-novo training of classifiers
- incremental training and feed-back training
- multi-classification
- hierarchical classification
- cross-lingual classification
- selection according to different utility functions
- quality monitoring and reporting
LCS differs from other systems in that it can use linguistic
terms (dependency triples) to further improve its accuracy.
It has a proven track record in the classification of patent documents.
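To make the idea of linguistic terms concrete, the sketch below turns dependency triples into terms that sit alongside ordinary words in a single bag of terms. It assumes the triples have already been produced by some parser; the Triple type, the [head,RELATION,modifier] string form and the relation name ATTR are illustrative assumptions, not the LCS format.

```python
from collections import Counter
from typing import NamedTuple

class Triple(NamedTuple):
    """A dependency triple: a head word, a relation, and a modifier word."""
    head: str
    relation: str
    modifier: str

def triple_term(triple):
    """Render a triple as one string so it can be counted like a word term."""
    return f"[{triple.head.lower()},{triple.relation},{triple.modifier.lower()}]"

def bag_of_terms(words, triples):
    """Count word terms and triple terms together in one bag."""
    bag = Counter(word.lower() for word in words)
    bag.update(triple_term(t) for t in triples)
    return bag

# Hypothetical parser output for the phrase "air pump".
bag = bag_of_terms(["Air", "pump"], [Triple("pump", "ATTR", "air")])
```

Because each triple becomes an ordinary term, a classifier that works on bags of words can use the combined bag unchanged.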
Some publications about the LCS system
- C. Peters and Cornelis H.A. Koster (2002),
"Uncertainty-based noise reduction and term selection in text
categorization",
Proceedings 24th BCS-IRSG European Colloquium on IR Research (ECIR
2002), Springer LNCS 2291, pp 248-267.
Copyright Springer-Verlag
- Cornelis H.A. Koster, Paul Jones, Merijn Vogel and Nico Gietema (2002),
"The Bootstrapping Problem",
presented at the SIGIR 02 Workshop on
Operational Text Categorization, Tampere, August 2002.
- C. Peters and Cornelis H.A. Koster (2003),
"Uncertainty-based Noise Reduction and Term Selection in Text
Categorisation",
International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems (IJUFKS) Vol. 11, No. 1, pp 115-137.
- Cornelis H.A. Koster and Marc Seutter (2003),
"Taming Wild Phrases",
Proceedings 25th European Conference on IR Research (ECIR 2003),
Springer LNCS 2633, pp 161-176.
Copyright Springer-Verlag
- Nuria Bel, Cornelis H.A. Koster and Marta Villegas (2003),
"Cross-Lingual Text Categorization",
Proceedings ECDL 2003, Trondheim, August 2003,
Springer LNCS 2769, pp 126-139.
Copyright Springer-Verlag
- Cornelis H.A. Koster, Marc Seutter and Jean G. Beney (2003),
"Multi-Classification of Patent Applications with Winnow",
Proceedings PSI 2003, Springer LNCS 2890, pp 545-554.
Copyright Springer-Verlag
- Jean G. Beney and Cornelis H.A. Koster (2003),
"Classification supervisée de brevets: d'un jeu
d'essai au cas réel",
Proceedings of the XXIème congrès Inforsid, pp. 50-59.
http://inforsid2003.loria.fr/ActesWorkshopRI.pdf
(last visited October 2005).
Requests for information can be directed to
Cornelis H.A. Koster
Department of Computing Science
University of Nijmegen
6525ED Nijmegen, The Netherlands
tel: +31.24.3653411
fax: +31.24.3553450
mail me at kees atsign cs.kun.nl