IR Collections

B. Piwowarski
This is a SourceForge project. Bugs and questions can be reported on the project page: http://sourceforge.net/projects/ircollections/.
Table of Contents
1 Introduction
2 Installation
3 IR Task XML
4 Configuration files
5 The commands

1 Introduction

This project aims to group utilities for dealing with the numerous and heterogeneous information retrieval (IR) test collections, and to automate and standardize common operations such as downloading topics and assessments or evaluating runs.
Each IR task has a unique identifier. For example, the ad-hoc track of TREC-1 has the identifier trec.1/adhoc.
Each task is associated with an XML description that contains information about the document collection, the topics, and the assessments. For example, the IR task trec.1/adhoc is described by:
<?xml version="1.0"?>
<task xmlns="http://ircollections.sourceforge.net" id="trec.1/adhoc">
  <collection id="trec.1.adhoc" ref="trec.ap8889 trec.doe1 trec.fr8889 trec.wsj8792 trec.ziff12" path=".../cols/trec1.col.files"/>
  <topics id="trec.1.adhoc" path=".../trec/trec1/adhoc/trec1.topics.51-100" type="trec">
    <property parts="desc"/>
  </topics>
  <qrels path=".../trec/trec1/adhoc/trec1.adhoc.trec1.qrels" id="trec.1.adhoc"/>
</task>
The format of the XML specification of an IR task is described in Section 3.
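As an illustration of how such a description might be consumed, here is a minimal Python sketch that reads the trec.1/adhoc definition with the standard library's xml.etree.ElementTree. The helper name summarize_task and the layout of the returned dictionary are hypothetical and not part of ircollections. Treating the ref attribute as a space-separated list of identifiers is an assumption based on the example alone. The elided paths (...) are kept exactly as they appear.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <task> element of the description.
NS = {"irc": "http://ircollections.sourceforge.net"}

TASK_XML = """<?xml version="1.0"?>
<task xmlns="http://ircollections.sourceforge.net" id="trec.1/adhoc">
  <collection id="trec.1.adhoc" ref="trec.ap8889 trec.doe1 trec.fr8889 trec.wsj8792 trec.ziff12" path=".../cols/trec1.col.files"/>
  <topics id="trec.1.adhoc" path=".../trec/trec1/adhoc/trec1.topics.51-100" type="trec">
    <property parts="desc"/>
  </topics>
  <qrels path=".../trec/trec1/adhoc/trec1.adhoc.trec1.qrels" id="trec.1.adhoc"/>
</task>
"""


def summarize_task(xml_text):
    """Return the main fields of one task description (hypothetical helper)."""
    task = ET.fromstring(xml_text)
    collection = task.find("irc:collection", NS)
    topics = task.find("irc:topics", NS)
    qrels = task.find("irc:qrels", NS)
    return {
        "id": task.get("id"),
        # Assumption: ref holds space-separated collection identifiers.
        "collections": collection.get("ref").split(),
        "topics_path": topics.get("path"),
        "topics_type": topics.get("type"),
        "topic_parts": [p.get("parts") for p in topics.findall("irc:property", NS)],
        "qrels_path": qrels.get("path"),
    }


summary = summarize_task(TASK_XML)
print(summary["id"])
print(summary["collections"])
```

Every lookup is qualified with the namespace because the task element declares a default namespace; an unqualified find("collection") would return None.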

2 Installation

For the moment, the code can only be accessed through Subversion:
svn co https://ircollections.svn.sourceforge.net/svnroot/ircollections ircollections  
This should create a directory called ircollections in the current directory.
To evaluate runs, the following should also be downloaded:

trec_eval

The TREC evaluation program

3 IR Task XML

4 Configuration files

5 The commands

All the commands are accessed through the ircollections Python script:
ircollections <command> <command arguments...>
The available commands are list, get, prepare, and evaluate.
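For scripted use, the invocation pattern above can be wrapped from Python. The following is a minimal sketch that only assembles the argument list and hands it to subprocess.run. The argument counts follow the command descriptions in this section. It assumes the ircollections script is on the PATH; the helper names build_command and run_command and the run file name my.run are hypothetical.

```python
import subprocess

# The four commands named in this section.
COMMANDS = ("list", "get", "prepare", "evaluate")


def build_command(command, *args, script="ircollections"):
    """Assemble the argument list for one ircollections call (hypothetical helper)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    if command in ("get", "prepare") and not args:
        raise ValueError(f"{command} needs at least one task ID")
    if command == "evaluate" and len(args) != 2:
        raise ValueError("evaluate needs a task ID and a run path")
    return [script, command, *args]


def run_command(command, *args):
    """Run the script and return its standard output as text."""
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(
        build_command(command, *args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Only builds the argument list; nothing is executed here.
print(build_command("evaluate", "trec.1/adhoc", "my.run"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with task IDs and paths.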

list

List all the available tasks.

get ID [ID ...]

Get the XML definition of one or more tasks. This command takes one or more task IDs as arguments and outputs the corresponding definitions. If more than one task is specified, the definitions are enclosed in a single XML document.

prepare ID [ID ...]

Prepare one or more tasks. This command takes a list of task IDs as arguments and prepares the tasks by automatically downloading and processing the required files (topics, qrels).

evaluate ID <run path>

Evaluate a run for a given task ID.
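Because get prints a bare task definition for a single ID and an enclosing XML document for several, a program reading its output has to handle both shapes. The sketch below does so by collecting every task element regardless of nesting. The wrapper element name tasks and the second task ID trec.2/adhoc in the sample input are assumptions made for illustration, since neither is specified here; the function task_ids is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name of a task element.
TASK_TAG = "{http://ircollections.sourceforge.net}task"


def task_ids(xml_text):
    """Return the id of every task found in the output of get (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Element.iter() also yields the root itself, so a lone <task> is found too.
    return [element.get("id") for element in root.iter(TASK_TAG)]


# Sample inputs; the <tasks> wrapper is an assumed name.
single = '<task xmlns="http://ircollections.sourceforge.net" id="trec.1/adhoc"/>'
multiple = (
    '<tasks xmlns="http://ircollections.sourceforge.net">'
    '<task id="trec.1/adhoc"/>'
    '<task id="trec.2/adhoc"/>'
    "</tasks>"
)

print(task_ids(single))
print(task_ids(multiple))
```

Matching on the task tag rather than on the wrapper keeps the code independent of whatever the enclosing element is actually called.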