Legal Information Retrieval on the Web: The Experience of the NiR Portal (http://www.nir.it)

Costantino Ciampi
Rome, 26 April 2004

CONSIGLIO NAZIONALE DELLE RICERCHE Istituto di Teoria e Tecniche dell’Informazione Giuridica http://www.ittig.cnr.it

Contents
• Normeinrete (NiR), “Access to Law on the Net”: an e-Government project
• Project description (goals, technology, results)
• Standardization in the legal domain: XML representation of Italian norms
• URN adoption to automate hyperlinking among norms in a distributed environment

NiR Project "Access to Law on the Net": Project goals
• Improving access to legislation by providing a single point of access to Italian and EU legal documents published on different web sites
  – ICT to enable citizens to exercise their rights
• Supporting the Public Administration (PA) in managing the life cycle of legislative documents and in law consolidation, by providing standards, software tools and methodologies
  – ICT to improve PA efficiency
A system prototype (third version) is available at http://www.normeinrete.it

NiR Actors
• Main actors:
  – Ministry of Justice (initiator) (www.giustizia.it)
  – AIPA (Authority for Information Technology in the Public Administration), now CNIPA (National Center for Information Technology in the Public Administration): founder and technical coordinator (www.cnipa.it)
• Scientific and technical partners:
  – Institute of Theory and Technologies of Legal Information (ITTIG), CNR, Florence (www.ittig.cnr.it)
  – CINECA Consortium, Bologna (www.cineca.it)
• Public Administrations participating in the Project

Steps and Resources of the NiR Project
• Phase I (May 1999 - May 2000): first feasibility study and development of the Portal prototype
• Phase II (December 2000 - November 2001): second feasibility study, extension of the document base and qualitative improvement of the Portal prototype
• Phase III (2002/2003): definition of standards (URN and XML) and preparation of software for disseminating the standards (reference parser, structure parser, NIREditor XML)
• Phase IV (2004/2005): entrusting the NiR Portal to external managers and bringing it into full operation (funded by the e-Government programme and the Italian Finance Acts)

NiR Project Strategy
• Implementation of a specialized portal providing search and retrieval of legislative documents published on various Public Administration web sites;
• Definition of standards, consistent with Internet technologies, to represent data and metadata meaningful in the legal domain;
• Development and distribution of open source software to support legislative document management and publishing;
• Training and knowledge sharing among Public Administrations.

Present Results
• www.normeinrete.it provides unified access to Italian and European Union legislation published on different institutional web sites. So far:
  – more than 50 public institutions have taken part in the Project;
  – more than 140,000 documents have been indexed;
  – about 160,000 search sessions are held on the site each month;
  – the NiR Legal Database ("Norm Catalogue"), including metadata, has been created and is kept up to date;
  – the NiR Standards have been defined.
• Two standards issued by AIPA/CNIPA as technical norms:
  – DTD definitions for Italian legislation;
  – URN definition for any kind of legal document;
• Editors and other software tools developed and distributed to the PA to support implementation of the standards.

NiR Features
• The system is based on a co-operative technological architecture, resulting in a federation of legislative databases developed on different platforms.
• Co-operation is achieved through suitable application gateways, which provide "loose" integration by adopting two standards:
  – one for identifying legal resources (URNs), and
  – one for representing document structures and metadata in XML mark-up, according to ad hoc DTDs.

Searching Tools and Architecture of the NiR System (1)
The NiR System consists of:
• NiR nodes: components belonging to administration domains, containing legal database systems and the related application gateways.
  Documents can be stored in the file system or in database/full-text management systems; all of them are accessible through the Internet.
• Central registries: components of the co-operative layer that publish the information needed for effective co-operation.

Searching Tools and Architecture of the NiR System (2)
• Central registries include:
  – Standards repository (XML DTD and URN grammar definitions, and tools);
  – Registry of official Authority names, needed to standardise URN adoption;
  – Registry of NiR nodes, containing the information needed for interaction between NiR agents and domain application gateways;
  – Norm Catalogue, containing, for each norm: title, basic classification, URN and the list of known physical addresses (URLs) where it is published.

The Norm Catalogue (> 45,000 documents)
• The Norm Catalogue is a relational database containing, for each norm: title, basic classification, URN and the list of known physical addresses (URLs) where it is published.

NiR Standards
• Uniform Resource Name (URN) definition (based on IETF specifications) to:
  – identify each document regardless of its physical address (URL);
  – allow automatic hyperlinking through a resolution system (similar to DNS).
• Document Type Definitions (DTDs) for Italian legislative and regulatory acts (based on the W3C XML meta-language) to represent document structure, semantics and metadata.
(*) The standards have been issued as AIPA/CNIPA technical standards and published as regulations in the Italian Official Journal.

URNs (1/3)
• Each law contains several references to other laws: the whole legislative corpus can be seen as a network, with laws as nodes connected through references. Building a hypertext of laws through URLs requires manual work.
• The URN is a persistent, location-independent resource identification mechanism.
• URNs are defined as a combination of elements according to a specific grammar; these are basically the name of the enacting Authority, the type of norm, the date, the number and, when needed, some more detailed specifications.
• URNs can be built regardless of whether the corresponding documents are available online.

URNs (2/3)
• Adopting a URN-based scheme makes it possible to build an automated distributed hypertext, following a model similar to the DNS (Domain Name System), which resolves human-readable web site names into numerical IP addresses.
• This is possible because:
  – the natural language expressions used in legal references usually follow repetitive patterns and can therefore be detected automatically;
  – the URN is built by combining data that are (almost) always included in the reference;
  – the cross-references between each URN and the list of corresponding URLs, needed by the resolution service, can be built automatically.

URNs: tools and examples (3/3)
• Parser
  – Available online; automatically detects references within laws.
• Resolution service
  – Resolves URNs into URLs (when known).
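The URN mechanism described in these slides (build a URN from Authority, type, date and number; detect references in free text; resolve URNs through the Norm Catalogue) can be sketched in a few lines of Python. This is a minimal sketch, not the NiR parser or resolution service: the `urn:nir:<authority>:<type>:<date>;<number>` layout is modeled on published NIR URN examples and should be treated as an assumption, "stato" (State) is assumed as the default Authority, the regular expression covers only two reference forms, and `build_urn`, `parse_references`, `NormCatalogue`, `CatalogueEntry` and the example.org URL are hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Italian month names, used to normalise dates such as "7 agosto 1990".
MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# Hypothetical mapping from the act type as written in the text to a URN token.
ACT_TYPES = {"legge": "legge", "decreto legislativo": "decreto.legislativo"}

# Matches references such as "legge 7 agosto 1990, n. 241" (any letter case).
REFERENCE = re.compile(
    r"(?P<type>legge|decreto legislativo)\s+"
    r"(?P<day>\d{1,2})\s+(?P<month>[a-z]+)\s+(?P<year>\d{4}),?\s+"
    r"n\.\s*(?P<number>\d+)",
    re.IGNORECASE,
)


def build_urn(authority: str, act_type: str, enacted: date, number: str) -> str:
    """Combine Authority, type, date and number (assumed NIR-like layout)."""
    return f"urn:nir:{authority}:{act_type}:{enacted.isoformat()};{number}"


def parse_references(text: str, authority: str = "stato") -> list[str]:
    """Detect references in free text and build one URN per reference."""
    urns = []
    for m in REFERENCE.finditer(text):
        month = MONTHS.get(m["month"].lower())
        if month is None:  # unrecognised month name: skip rather than guess
            continue
        enacted = date(int(m["year"]), month, int(m["day"]))
        act_type = ACT_TYPES[m["type"].lower()]
        urns.append(build_urn(authority, act_type, enacted, m["number"]))
    return urns


@dataclass
class CatalogueEntry:
    """One Norm Catalogue record: title, classification, URN, known URLs."""
    title: str
    classification: str
    urn: str
    urls: list[str] = field(default_factory=list)


class NormCatalogue:
    """In-memory stand-in for the relational Norm Catalogue."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def add(self, entry: CatalogueEntry) -> None:
        self._entries[entry.urn] = entry

    def resolve(self, urn: str) -> list[str]:
        """Resolution service: known URLs for a URN, or [] if none is known."""
        entry = self._entries.get(urn)
        return list(entry.urls) if entry else []


catalogue = NormCatalogue()
catalogue.add(CatalogueEntry(
    title="Law of 7 August 1990, no. 241",
    classification="administrative procedure",
    urn="urn:nir:stato:legge:1990-08-07;241",
    urls=["https://example.org/laws/1990/241.html"],  # hypothetical URL
))

text = ("ai sensi della legge 7 agosto 1990, n. 241, "
        "e del decreto legislativo 30 marzo 2001, n. 165")
for urn in parse_references(text):
    print(urn, catalogue.resolve(urn))
```

`resolve` returns an empty list instead of failing for an unknown URN: as the slides point out, a URN can be built before the corresponding document is online, so the link can be written into the text now and resolved once some node publishes the act.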
XML Representation of Italian Legislative and Regulatory Acts (1/5)
Three categories:
• Documents with a well-defined structure
  – laws, constitutional laws, regional laws
• Partially structured documents
  – regulatory acts, decrees
• Generic documents
  – any kind of unstructured act, enclosures, ...

DTD definition approach (2/5)
Three DTDs:
• Basic DTD: well-structured simple documents
• Strict DTD: well-structured complex documents
• Loose DTD: documents with irregular structure or exceptions (suitable for historical documents)
Each DTD can represent several document types.
Mark-up must be carried out using only the relevant elements.

XML Elements (categories) (3/5)
• Structural elements
  – heading, preamble, sections, articles, paragraphs, ...
• Special elements
  – references to other laws, formatted representation of relevant entities embedded in the text (institutions, dates, places)
• Metadata elements
  – subject-matter classification, publication data, preparatory works (legislative iter)
• Semantic elements
  – obligations, prohibitions, penalties, exceptions, modifications, abrogations, ...
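To make the four element categories concrete, the sketch below marks up a short, fictitious act and reads each category back with Python's standard `xml.etree.ElementTree`. The element and attribute names (`act`, `meta`, `article`, `paragraph`, `ref`, `abrogation`, ...) are hypothetical and do not reproduce the vocabulary of the NiR Basic, Strict or Loose DTDs, and no DTD validation is performed; the point is only that, once structure, references, metadata and semantics are explicit tags, each can be extracted mechanically.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# A fictitious act. Element and attribute names are hypothetical: they mirror
# the four element categories, not the NiR DTD vocabulary.
DOCUMENT = """<act type="law">
  <meta>
    <publication date="2000-01-05"/>
    <classification>example subject</classification>
  </meta>
  <heading>Law of 1 January 2000, no. 1 (fictitious)</heading>
  <article num="1">
    <paragraph num="1">Administrations shall publish their acts online.</paragraph>
    <paragraph num="2"><abrogation>Article 4 of
      <ref urn="urn:nir:stato:legge:1999-06-10;99">law 10 June 1999, no. 99</ref>
      is repealed.</abrogation></paragraph>
  </article>
  <article num="2">
    <paragraph num="1">See also
      <ref urn="urn:nir:stato:legge:1998-03-02;12">law 2 March 1998, no. 12</ref>.</paragraph>
  </article>
</act>"""


def article_outline(root: ET.Element) -> list[tuple[str, list[str]]]:
    """Structural elements: each article number with its paragraph numbers."""
    return [
        (art.get("num"), [p.get("num") for p in art.findall("paragraph")])
        for art in root.iter("article")
    ]


def cited_urns(root: ET.Element) -> list[str]:
    """Special elements: URNs of all references, in document order."""
    return [ref.get("urn") for ref in root.iter("ref")]


def publication_date(root: ET.Element) -> Optional[str]:
    """Metadata elements: publication date, if present."""
    pub = root.find("meta/publication")
    return pub.get("date") if pub is not None else None


def semantic_markup(root: ET.Element, tag: str) -> list[str]:
    """Semantic elements: whitespace-normalised text of every element with `tag`."""
    return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]


root = ET.fromstring(DOCUMENT)
print(article_outline(root))
print(cited_urns(root))
print(publication_date(root))
print(semantic_markup(root, "abrogation"))
```

The list returned by `cited_urns` is exactly what a URN resolution service would consume to turn references into hyperlinks.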
Examples of Legal Texts in XML (4/5)
• Example of an Italian Act tagged with the Basic DTD
• Examples of fragments of legal texts in different formats (XML vs HTML)
• Navigating the document structure with a visual XML editor

Training on XML and Development of the XML NirEditor (5/5)
Given the relevance of XML to NiR:
• an intensive training activity has been carried out, also with the aid of a multimedia e-learning product developed by ITTIG-CNR;
• an XML editor, to be distributed as open source software, has been developed and enriched with parsing functions by ITTIG-CNR.

Opportunities deriving from the NiR standards
• Advanced search functions
• Support for the life cycle of legislative documents (law-enacting workflow, "law in force" at any given date)
• Moving from a totally “free” approach to a more formally defined organizational model, in order to achieve completeness and improve precision

Conclusive Remarks: Current Developments and Future Initiatives
• Software tools to support Administrations in adopting the NiR standards
• XML Schema definition
• Parsing services
• New metadata
• Implementation of distributed URN resolution
• Certification of the authenticity of acts through digital signature technology
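The "law in force at any given date" opportunity amounts to a point-in-time lookup over a norm's successive versions. The slides do not say how versions are stored, so this is a minimal sketch under stated assumptions: each consolidated version records the date it enters into force, versions are kept sorted by that date, and a norm may carry a repeal date after which nothing is in force. The `Version` and `NormHistory` names and all sample dates are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One consolidated text, in force from `start` until the next version."""
    start: date
    text: str


@dataclass
class NormHistory:
    """Hypothetical life-cycle record: sorted versions plus an optional repeal date."""
    versions: list[Version]          # assumed sorted by start date
    repealed: Optional[date] = None  # first day the norm is no longer in force

    def in_force(self, on: date) -> Optional[str]:
        """Text in force on `on`; None before entry into force or from repeal on."""
        if self.repealed is not None and on >= self.repealed:
            return None
        starts = [v.start for v in self.versions]
        i = bisect_right(starts, on)  # count of versions started on or before `on`
        return self.versions[i - 1].text if i else None


# Fictitious history of a single article.
history = NormHistory(
    versions=[
        Version(date(2000, 1, 15), "Art. 1: original text"),
        Version(date(2003, 6, 1), "Art. 1: text as amended in 2003"),
    ],
    repealed=date(2010, 1, 1),
)
print(history.in_force(date(2001, 5, 10)))   # original text
print(history.in_force(date(2003, 6, 1)))    # amended text, from its first day
print(history.in_force(date(1999, 12, 31)))  # None: not yet in force
print(history.in_force(date(2011, 1, 1)))    # None: repealed
```

Using `bisect_right` makes a version apply from its own start date inclusive; standardized metadata (dates of entry into force, modifications, abrogations) is what would let such histories be assembled automatically rather than by hand.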