DEVELOPING A TOOL FOR TEXTS WITH HETEROGENEOUS STRUCTURE PROCESSING
Abstract
This article presents our approach to developing a system for processing unstructured Romanian text data. The project aims to develop the SoFTcrates tool, a software system that processes unstructured text
data and produces structured output for use as computational linguistics resources. We describe some mathematical aspects of text representation and present several stages of unstructured text processing. The interface of the
application is also illustrated. In future work, we plan to implement mechanisms for diversifying the words found, by
means of derivation and the WordNet semantic network. Moreover, we will optimize the interface so that users can search
not only by a single word but also by several words they consider most relevant to the text.
