- May 28: Additional information (timeline and task descriptions)
- June 29: 2009 Collection finally available
The goal of the challenge is to identify the different Machine Learning (ML) methods proposed so far for structured data, to assess the potential of these methods for dealing with generic ML tasks in the structured domain, to identify the new challenges of this emerging field and to foster research in this domain. Structured data appears in many different domains. We will focus here on Graph document collections and we are organizing this challenge in cooperation with the INEX initiative. This challenge aims at gathering ML, Information Retrieval (IR) and Data Mining researchers in order to:
- Define the new challenges for structured data mining with ML techniques.
- Build interlinked document collections, define evaluation methodologies, and develop software to evaluate the classification of documents in a graph.
- Compare existing methods on different datasets.
Results of the track will be presented at the INEX workshop.
Task: Graph (Semi-)Supervised Classification
Dealing with XML document collections is a particularly challenging task for ML and IR. XML documents are defined by their logical structure and their content (hence the name semi-structured data). Moreover, in a large majority of cases (Web collections for example), XML document collections are also structured by links between documents (hyperlinks for example). These links can be of different types and carry different information: for example, one collection can provide hierarchical links, hyperlinks, citations, etc.
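To make the idea of a collection with several link types concrete, here is a minimal sketch in Python. The `Collection` class, its method names, the document ids and the link type names are all hypothetical and chosen for illustration only; they do not describe the actual format of the track's collection.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical in-memory view of a linked XML document collection."""
    # document id -> raw XML string
    documents: dict = field(default_factory=dict)
    # link type -> list of (source id, target id) pairs
    links: dict = field(default_factory=lambda: defaultdict(list))

    def add_document(self, doc_id, xml):
        self.documents[doc_id] = xml

    def add_link(self, source, target, link_type):
        self.links[link_type].append((source, target))

    def neighbours(self, doc_id, link_type):
        """Documents that doc_id points to through links of one type."""
        return [t for s, t in self.links[link_type] if s == doc_id]


# Toy data, invented for this example.
collection = Collection()
collection.add_document("d1", "<article><title>Graphs</title></article>")
collection.add_document("d2", "<article><title>Trees</title></article>")
collection.add_document("d3", "<article><title>Kernels</title></article>")
collection.add_link("d1", "d2", "hyperlink")
collection.add_link("d1", "d3", "citation")
collection.add_link("d2", "d3", "hyperlink")
```

Keeping one edge list per link type lets a model treat hyperlinks and citations as separate signals, or merge them into a single graph if that works better.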
Earlier models developed in the field of XML categorization/clustering simultaneously use the content information and the internal structure of XML documents for a list of models) but they rarely use the external structure of the collection, i.e., the links between documents.
We focus here on the problem of classification of XML documents organized in a graph. More precisely, the participants of the task have to classify the documents of a partially labelled graph.
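As an illustration of what classifying a partially labelled graph can look like, the following Python sketch implements a very simple baseline: each unlabelled document repeatedly takes the majority label of its already labelled neighbours. This is an assumption made for illustration, not a method prescribed by the track; the function name `propagate_labels`, the toy graph and the category names are hypothetical, and the sketch uses only the links, ignoring document content and internal structure.

```python
from collections import Counter


def propagate_labels(edges, known_labels, max_rounds=10):
    """Label unlabelled nodes by majority vote over labelled neighbours."""
    # Build an undirected adjacency map from the (source, target) pairs.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    labels = dict(known_labels)  # the given labels are never overwritten
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(adjacency):
            if node in labels:
                continue
            votes = Counter(labels[n] for n in adjacency[node] if n in labels)
            if votes:
                # Ties are broken alphabetically so the result is deterministic.
                best = max(votes.values())
                updates[node] = min(l for l, c in votes.items() if c == best)
        if not updates:
            break  # nothing new could be labelled in this round
        labels.update(updates)
    return labels


# Toy chain of six documents; only the two end points are labelled.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
known = {"a": "sports", "f": "science"}
predicted = propagate_labels(edges, known)
```

On this toy chain the labels spread inwards from both ends, so the three documents nearest to "a" end up in one category and the three nearest to "f" in the other. A realistic system would combine such link information with the content and internal structure of each document.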
If you encounter any problem, please contact Ludovic DENOYER: ludovic dot denoyer at lip6 dot fr.