Digitization and mark-up languages
Abstract
The "Digital Library" is, above all, a valuable opportunity for librarians to enrich their knowledge and professional skills in several ways, and it is a chance they should not miss. Digitization projects require project-management skills, awareness of new technologies and their costs, interaction with professionals from other fields and, not least, the ability to attract adequate funding. Furthermore, partnerships with dynamic companies in the information-technology sector can profoundly change working styles and practices in libraries.
In terms of scope, the article does not cover some important subjects in digital libraries (e.g. copyright and the preservation of digital resources), because it focuses on one specific topic: a comparative analysis of the technologies, costs and outcomes of capturing digital images versus capturing (or creating) digital texts.
When a collection consists of textual documents rich in informative content, the aim should be to produce an electronic text, because it is a good long-term investment. At present, satisfactory results can be achieved with ICR (intelligent character recognition) and fuzzy-searching software. But the final aim of a digitization project for such collections should be a structured text, and the structure should be logical and permanent. The article explains the main features of the SGML standard and its more recent developments, especially the innovative features of XML. This new metalanguage, born in the middle of the Web explosion, brings SGML's ability to transport information onto the Web, a feature that should attract the interest of information professionals and librarians. XML electronic documents can meet preservation and access requirements very well. Furthermore, they hold a central position in the Web of the future: in this forthcoming environment, structured texts, together with metadata, will play a key role in the exploitation of search functions. HTML and PDF are probably bound to decline in the "semantic Web". These tendencies can be traced in the documents of the Web standards community, especially those of the W3C Metadata Activity.
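To illustrate why a structured text supports richer searching than a flat one, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The document fragment and its element names (`text`, `head`, `p`, `div`) are hypothetical, loosely modelled on TEI-style markup, and are not taken from the article; the two functions are illustrative helpers, not part of any library. A flat search only counts occurrences of a word, whereas a structure-aware search can restrict the query to a logical unit such as headings.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured (XML-encoded) text; element names are assumptions
# chosen for illustration, loosely in the style of TEI markup.
DOC = """<text>
  <front><title>Digitization and mark-up languages</title></front>
  <body>
    <div type="chapter">
      <head>SGML and XML</head>
      <p>XML carries the SGML ability to transport information.</p>
    </div>
    <div type="chapter">
      <head>Images and texts</head>
      <p>Digital images cannot be searched like encoded XML text.</p>
    </div>
  </body>
</text>"""


def flat_search(doc, word):
    """Flat full-text search: count case-insensitive occurrences of word, ignoring structure."""
    text = "".join(ET.fromstring(doc).itertext())
    return text.lower().count(word.lower())


def structured_search(doc, word, tag):
    """Structure-aware search: return the text of every <tag> element that contains word."""
    root = ET.fromstring(doc)
    hits = []
    for el in root.iter(tag):
        content = "".join(el.itertext()).strip()
        if word.lower() in content.lower():
            hits.append(content)
    return hits


if __name__ == "__main__":
    print(flat_search(DOC, "XML"))                # 3
    print(structured_search(DOC, "XML", "head"))  # ['SGML and XML']
```

The flat search can only report that "XML" occurs three times; the structured search can answer a more precise question (which section headings mention XML), which is the kind of query that logical markup makes possible and that page images or purely presentational formats do not directly support.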
The author reports on experiences and research in applying SGML/XML to digitization projects. In particular, he describes the activity of Electronic Text Centers, underlining the role of the library community within an interdisciplinary model.
In some countries, such as the US, the UK and Canada, text encoding is already part of librarians' skills. It is important to note, however, that this is a natural evolution (and enrichment) of cataloguing skills and methodology. What is required is accurate knowledge of markup languages and the ability to write DTDs (Document Type Definitions), not merely a generic awareness of the subject. The ability to typify (that is, to define document types) is quite similar to cataloguing skills, and a good typification cannot do without the experience of information intermediaries. Computer programmers need to interact with information specialists in order to capture the logical structure of a text and thus to design the whole information system better.
XML, together with metadata, provides the ground for interdisciplinary activity and a chance for librarians to play a dynamic role in the digital environment.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.