Thursday 4 September 2008

CLA Data Recording And Reporting With Intralibrary v3

April 30, 2008

In early April, Intrallect provided the latest test “build” of the Intralibrary repository software. This release contained new functions for recording the metadata required by the Copyright Licensing Agency (CLA) for any items scanned and added to the repository under their Higher Education Trial Blanket Licence.

Institutions that digitise books and journals under the CLA licence have to record additional information about each item scanned and added to a repository. This includes the course of study for which each item is scanned (course codes and titles, lecturer names), alongside the bibliographic information about the work from which the scanned extract was derived. Earlier in our project the CLA reporting requirements were passed to Intrallect’s software developers for inclusion in a potential “finished product”.

This new release contained an enhanced “application profile” which can be used alongside a “workflow” to enable the repository user to add “CLA type” metadata. An application profile is a set of metadata options which can be tailored for a particular workflow on Intralibrary. The profile contains “fields” (holding the sort of item descriptions mentioned above) which you can set in a workflow as “mandatory”, “recommended”, “optional” etc., depending on what sort of metadata your user would like to add for any given item.

The default application profile for Intralibrary has new fields added that can be used for adding metadata for “CLA” items, including “source publication type”, “source title”, “journal volume”, “journal issue”, course information (code and title), academic’s name and so on. Controlled “vocabulary” fields were added too, so the CLA reporting “codes” for “document source” and “reason for scanning” could be entered from easy “pull down” menus. As well as providing a suitable level of description for each item, this should make it easier to isolate the metadata in these fields to create an output report of all items added to a “CLA” collection.
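To make the idea of a profile with “mandatory”, “recommended” and “optional” fields a little more concrete, here is a minimal sketch in Python. It is not Intralibrary's own data model or API; the class names, the field identifiers and the `missing_mandatory` helper are all assumptions made up for illustration, loosely based on the CLA fields listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Requirement(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class ProfileField:
    name: str
    requirement: Requirement


# Hypothetical field identifiers; the real profile uses its own names.
CLA_PROFILE = [
    ProfileField("source_title", Requirement.MANDATORY),
    ProfileField("course_code", Requirement.MANDATORY),
    ProfileField("journal_volume", Requirement.RECOMMENDED),
    ProfileField("journal_issue", Requirement.OPTIONAL),
]


def missing_mandatory(profile, record):
    """Return the names of mandatory fields that are absent or empty in a record."""
    return [
        field.name
        for field in profile
        if field.requirement is Requirement.MANDATORY and not record.get(field.name)
    ]
```

A record that only has a source title would be flagged as lacking its course code, which is roughly what a “mandatory” setting in a workflow is meant to catch.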

I made a copy of the default application profile within Intralibrary and tailored it for CLA purposes. To help get things straight in my mind, I cross-matched the fields which appear in a typical CLA report with the fields in the new CLA application profile template. I set up these fields within the template as “essential”, so they will appear in the metadata editor for anyone using the CLA “workflow”. The CLA workflow already existed from the previous version of Intralibrary, and the new application profile was “attached” to this.
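The cross-matching exercise is essentially a comparison of two sets of field names. A rough sketch of that comparison follows; the column and field names are hypothetical stand-ins, not the actual CLA report headings or Intralibrary field identifiers.

```python
# Hypothetical names for illustration only.
CLA_REPORT_COLUMNS = {
    "source_title",
    "source_isbn",
    "course_code",
    "course_title",
    "reason_for_scanning",
}
PROFILE_FIELDS = {"source_title", "course_code", "course_title", "academic_name"}


def cross_match(report_columns, profile_fields):
    """Compare the fields a report needs with the fields a profile offers."""
    return {
        "covered": sorted(report_columns & profile_fields),
        "missing_from_profile": sorted(report_columns - profile_fields),
        "not_needed_for_report": sorted(profile_fields - report_columns),
    }


result = cross_match(CLA_REPORT_COLUMNS, PROFILE_FIELDS)
```

Anything listed under `missing_from_profile` would need a field adding (or marking as “essential”) in the tailored profile before the report could be produced.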

As a rule, there is no need to use every conceivable metadata option in the application profile (this would mean the user would be confronted with a very long “cataloguing” screen when adding metadata!). I worked through the application profile selecting those fields which could provide the most suitable and easiest “template” for recording items scanned under the CLA licence and added to the repository. This was done with some very useful input and advice from Sarah Currier at Intrallect.

The application profiles provided with Intralibrary allow a degree of flexibility as to where you place certain bits of metadata. Self-archiving in a repository is, after all, intended to be more flexible than what I would call “standard librarian type” cataloguing. For instance, author, ISBN (International Standard Book Number) and editor metadata could each be placed in two possible places in the application profile. This goes against what some librarians might view as “standard” cataloguing practice, but in these days of self-archiving such flexibility could be a good thing. For one thing, it makes it possible to tailor the metadata requirements for each user and not force them to use a metadata template they would find too cumbersome and time-consuming to “fill in”.

A series of “test” PDF documents was created for use in a test “course module” set up within a classification “node” of Intralibrary v3. These were uploaded using the existing CLA workflow, which was attached to the new application profile in v3.

Initially there were some little bugs after installing v3. For example, the application profiles were not editable, and the “classification” function didn’t work either, but these were soon fixed thanks to help from Sarah at Intrallect and Boyd from the Pathfinder team. Some of the system and navigation options from the previous version (“reserve item” and “rebuild object cache”) also seemed hidden in the new version, especially if you used Internet Explorer 7.

In contrast, using Mozilla meant that these options did work! In many respects the “build” being used throughout this process was very much a test version, although it was clear from the outset that v3 was easier to use and navigate than the previous version.

Using the new CLA-customised application profile, the metadata editor screen appeared as it should, with the new CLA metadata options available. One thing which became clear is that for all these CLA type materials, the course module information (code, title, academic, student numbers etc.) has to be retyped into each record for each item. A way to copy this metadata into a record from a “scratch pad”, or to import it into the record automatically, would be useful, but I don’t yet know whether it’s possible (or how to do it if it is).
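I don’t know whether Intralibrary offers anything like this, so the following is only a sketch of what a “scratch pad” could amount to: a saved set of module details merged into each new record. The function name and the field names are hypothetical.

```python
def apply_module_template(template, item_metadata):
    """Merge saved module details into an item's metadata.

    Values typed for the individual item take precedence over the template.
    Neither input dictionary is modified.
    """
    merged = dict(template)
    merged.update(item_metadata)
    return merged


# Hypothetical "scratch pad" holding the details shared by one course module.
module_template = {
    "course_code": "MOD-A",
    "course_title": "Example Module",
    "academic_name": "A. Lecturer",
}

record = apply_module_template(
    module_template,
    {"source_title": "Example Book", "extract_title": "Chapter 1"},
)
```

The module details would then only need typing once per module rather than once per item.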

The existing CLA blanket licence forces you to classify and provide resources “by module”, so all this information has to be entered in some way for every item added to the repository. Using the classification nodes in the Intralibrary repository helps to “bundle” documents by course module code for easy retrieval, but this stipulation in the licence (to only provide documents to particular module audiences) is not widely popular, judging by the opinions regularly voiced in discussion with colleagues at Keele and professional colleagues elsewhere. It also means that “versions” of digital documents have to be created for each separate module, each with a separate URL, should the same extract be requested by more than one academic.

After working with the metadata editor and uploading a few documents, some tricky little anomalies appeared. It became apparent that within our CLA workflow the “contributor” field in the metadata was set up to automatically add the name of the user uploading the item. The plan, however, was to put the “source author” of the item scanned in this field. To achieve this, a new “vocabulary” was created for this field in the application profile, which allows the role (“author”, “editor” etc.) to be recorded here, followed by the actual name in the next field. This seems to have worked initially.

Where to put the “source ISBN” (of the work digitised under the CLA licence) in the application profile also created a pause for thought. Could it be placed in the profile section intended for “catalogue” or “description” metadata? The ISBN is a very useful piece of metadata which uniquely identifies a book (journals carry an ISSN instead), and it was decided to locate this in the “1.3 Catalogue” section of the application profile.

When digitising items under the CLA licence it’s essential that you make sure the restrictions of the licence are observed, mainly in relation to how much of a particular work is provided electronically to a course in any one year. Being able to cross-reference an ISBN is also useful from the CLA administrator’s point of view, in that you can see whether you’ve already scanned an extract for another course, saving the effort of scanning the document again. Being able to search on ISBN is very important in this respect, so where it goes in the metadata is something you should decide on, and stick to, as the repository grows.
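As an illustration of why consistent ISBN entry matters, here is a small sketch of an ISBN lookup. It is not how Intralibrary searches; the functions and the record layout are assumptions for the example. The point is that the same ISBN typed with and without hyphens only matches if it is normalised first.

```python
from collections import defaultdict


def normalise_isbn(raw):
    """Strip hyphens and spaces, keeping digits and a possible 'X' check character."""
    return "".join(ch for ch in raw if ch.isdigit() or ch in "xX").upper()


def index_by_isbn(records):
    """Group records by normalised source ISBN."""
    index = defaultdict(list)
    for record in records:
        index[normalise_isbn(record["source_isbn"])].append(record)
    return index


def previously_scanned(index, isbn):
    """Return every record already held for the given ISBN (empty list if none)."""
    return index.get(normalise_isbn(isbn), [])


# Hypothetical records: the same book entered two different ways.
records = [
    {"source_isbn": "0-19-852663-6", "course_code": "MOD-A"},
    {"source_isbn": "0198526636", "course_code": "MOD-B"},
    {"source_isbn": "0-306-40615-2", "course_code": "MOD-C"},
]
isbn_index = index_by_isbn(records)
matches = previously_scanned(isbn_index, "0 19 852663 6")
```

Here `matches` contains the records for both MOD-A and MOD-B, showing that the extract has already been scanned for two modules.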

If you wanted to adopt a sort of “belt and braces” approach in applying metadata to CLA items, you could add the source title and the extract title to the “description” field in the application profile. In using the earlier version of Intralibrary provided, I adopted a “Harvard” bibliographic reference style for this metadata entry (detailing author name, year of publication, publisher, etc). This field, being searchable, could also provide a way to effectively search the repository, but wouldn’t be useful for reporting purposes as it doesn’t “separate” out the data.

The version of Intralibrary we’ve used for “live” CLA document delivery since September 2007 was used in this way, with all the metadata included in the file title, description field, and classification node. The new application profile, provided with v3, provides more scope for proper recording of this metadata in appropriate fields. It is certainly an advance, but as mentioned above, you need to think about where you place the metadata, and also how you type it in. Consistency in this is up to the user.

Today, Intrallect have provided a new release of the repository software, so with all the above in mind, as soon as it is installed the next thing to fully test is the reporting function. In the release we’ve had for the past few weeks, doing a report seemed straightforward. Within the report area you select the fields in the application profile you want the report to include from a “pull down” menu, and clicking on a button should provide an Excel spreadsheet detailing all the items added to the CLA collection, with the appropriate metadata from the application profile appended. This can be used as the basis for a “CLA return report”.
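The report itself comes from Intralibrary at the click of a button, but the underlying idea (pick some fields, get one row per item) can be sketched in a few lines. This is a minimal illustration that produces CSV text, which a spreadsheet program can open; the `build_report` function and the field names are hypothetical, not part of Intralibrary.

```python
import csv
import io


def build_report(records, fields):
    """Return CSV text with one row per record, limited to the chosen fields.

    Fields missing from a record are left blank; extra fields are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical items in a "CLA" collection.
items = [
    {"course_code": "MOD-A", "source_title": "Example Book", "uploader": "sm"},
    {"course_code": "MOD-B", "source_title": "Another Book"},
]
report = build_report(items, ["course_code", "source_title", "source_isbn"])
```

This only works because each piece of metadata sits in its own field; a single free-text description would not separate out into columns in the same way.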

Keywords: Application Profile, CLA, Copyright Licensing Agency, Intralibrary, Metadata, Workflows

Posted by Scott McGowan @ Keele Pathfinder Team
