Friday, 22 September 2006

SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments) is 'focused on developing robust, open source tools based on Semantic Web technologies that improve access, management and reuse among digital assets' and looks very cool.

When the Exploring launch is out of the way on Monday I'll go back and see if there's anything we can start to use.

Tuesday, 19 September 2006

What I did on my summer holidays

I was on site at Çatalhöyük for two weeks, and while I was there I contributed to the Catalhoyuk blog.

For me, it was a good opportunity to explain what I do on site - people are often confused about why a database developer would be going out to work on site in Turkey.

After that, I spent a few days in Istanbul, where I went to the Çatalhöyük exhibition and also to Istanbul Modern.

Then I caught a train to Bucharest, where I started my holiday. I visited Romania, Moldova, Transdniestr, Ukraine, and finally flew home from Krakow a month later.

Saturday, 16 September 2006

Clay pipe recording at MoLAS and "Clay tobacco pipe makers' marks from London" website

[Update, December 2011: if you're interested in clay pipes, you may be interested in Locating London's Past. The site also has an article that explains how Museum of London Archaeology (MoLA) Datasets - including clay pipes and glass - have been incorporated into the site.  NB: other than adding these links, I haven't updated the original 2006 paper below, so it doesn't include any enhancements made for this new work.  On a personal note, it's lovely to see that the sites, and the backend work behind them, still have value.]

Wheel symbol with pellets between the spokes, c 1610-40.
I'm just back from giving a paper at the Society for Clay Pipe Research Conference, held at the LAARC today. I thought I'd share the content of my paper online so that other people interested in digitising and publishing collections online could see how one particular project was implemented.


Some interesting feedback from the question session afterwards was that other archaeological units, museums or researchers might be interested in publishing records to the same site. In that case, I'd be happy to review the structures so they could be generalised (for other identifiers, for example) and publish them as an open standard along with more detailed information on the digitisation process.


Anyway, here's the text of the paper:


Clay pipe recording at MoLAS and the stamped makers’ mark website



Mia Ridge, Database Developer, Museum of London



SCPR ANNUAL CONFERENCE, September 16th 2006



London Archaeological Archive and Research Centre, Mortimer Wheeler House



Summary


The paper discusses the process from initial specification through requirements gathering, database design, development of the database application and website, to publication online.


Introduction


The project began with a proposal to create a database of clay tobacco pipe makers' marks from London:

"...a physical and digital database of clay tobacco pipe makers’ marks found in excavated contexts from London, dating to between c 1580 and 1910. This will encompass examples of makers’ marks, both stamped and moulded, on pipes made in London and imported from further afield, both in the UK and on the Continent. ... The digital version of the database will be made available online, as part of the MoLAS website"


The work had two parts: enhancing the MoLAS Oracle database so that it could record more detailed information about the makers' marks, and creating a website to publish the marks and related images and information online.


Requirements gathering


'Requirements gathering' is the process of scoping and defining a project. The first step towards this is to define the internal and external stakeholders; the second is to determine their requirements. Internal stakeholder requirements include modified forms and structure for recording enhanced data and analysis, while external requirements relate to the publication of the data to defined groups of website users. It is important to define the targeted users of your website so that its content, site architecture and functionality can be tailored to them.


The targeted users of the site were largely determined by the subject matter. The main users will be specialists, followed by general adults. Site functionality, in terms of search and browse capabilities, was determined by balancing the purpose of the site, the needs of its visitors, and the content and infrastructure we had available.


The database and website also had to be expandable to allow for greater temporal or geographic coverage, including collections from throughout the Greater London area. Finally, in order to design data structures that would best meet the needs of the project, I had to consider the nature of the material to be recorded.


The first discussions with Jacqui and Tony were about the requirements for the website. We then met to review the existing Oracle data structures and discuss the necessary changes. I asked lots of questions about how makers' marks related to clay pipes: where, of what kind, and how many might appear on a pipe? It was important to understand how marks varied and which properties of position, type and method were significant, as well as to understand the exceptions. As you know, one 'IS' stamp is not necessarily the same as another 'IS' stamp; the trick is to enable the application to understand the difference. In this process, the aim is not to uncover the detail of the subject but to understand how its typologies are constructed.



Once the requirements had been determined, data structures were designed accordingly. These were presented to Jacqui and Tony, and revised in response to their feedback. Prototype data-entry forms were then designed, and the same process of feedback and modification followed. Significant changes were made after testing, and further modifications were made during implementation as the practical implications of the changes became clear.


One of the challenges of database design is balancing the benefits of recording in a more structured way, which provides much greater flexibility in analysis, search and publication, against a gentler learning curve and greater efficiency in data entry. For example, as different types of information are separated out of free text or general comments into more precise fields, the time required to record each entry increases.



As the data structures were finalised, queries were run to populate the modified structures with existing data, where possible.


Database design and development



The MoLAS Oracle database is used by our archaeologists and specialists to record field, find and environmental data. It has been developed in-house over many years and is one of the largest databases of its kind in the UK. As the database and forms are maintained in-house, we are able to modify them to meet our needs as required for projects or day-to-day business.


In the MoLAS database, each pipe record must have a unique combination of sitecode, context, accession number and form. This combination of identifiers, called a 'primary key', forms the basis of the database application, and can be used to create links from a pipe with a particular mark to possible pipe makers. The sitecode can also be used to link to information about the particular excavation. Should a specialist wish, they can also link to other finds from the same context as well as related excavation and environmental data. The existing table structure was modified to support recording clay pipes and makers' marks in a more semantically structured way, with more detail and additional attributes.
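To make the idea of a composite primary key concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. The table and column names, and the sample sitecode 'XYZ01' and form 'F1', are hypothetical; the point is only that the database itself refuses a second record with the same four identifiers.

```python
import sqlite3

# Hypothetical, simplified stand-in for the MoLAS pipe table;
# the real Oracle column names are not given in the paper.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pipe (
        sitecode     TEXT    NOT NULL,
        context      INTEGER NOT NULL,
        accession_no INTEGER NOT NULL,
        form         TEXT    NOT NULL,
        description  TEXT,
        -- the four identifiers together form the primary key
        PRIMARY KEY (sitecode, context, accession_no, form)
    )
""")

def add_pipe(sitecode, context, accession_no, form, description=""):
    """Insert a pipe record; return False if that key combination already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pipe VALUES (?, ?, ?, ?, ?)",
                (sitecode, context, accession_no, form, description),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, calling `add_pipe("XYZ01", 100, 1, "F1")` twice succeeds the first time and returns False the second, while changing any one of the four identifiers (say the accession number) creates a distinct record.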




The existing comments field was split into four new fields: general comments, maker comments, publication references, and parallels. I ran a report that listed all the existing comments so they could be manually reviewed and separated out into the relevant content areas.


A new field was added to mark pipe records that were to be published on the web. Other enhancements included new fields such as completeness, mould, manufacturing evidence, fabric, pipe length, and links to photographs and illustrations, as well as a new numerical field, 'die', to allow the recording of individual dies known to have been used by a single maker or workshop. The final new field allows a particular pipe to be marked as containing the best example of a particular mark.


Some of the new fields required the creation of lists of values. These appear as drop-down menus on the data entry forms, and are used to make data entry faster and reduce errors. They are implemented as tables and can be designed so the values can be edited or added to as required.
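As a rough illustration of a list of values implemented as a table, the sketch below (again sqlite3 standing in for Oracle) uses a hypothetical `lov_completeness` table with made-up codes 'COMP' and 'FRAG'. The drop-down is populated from the table, and a foreign key rejects any value that is not on the list.

```python
import sqlite3

# Hypothetical schema: the real MoLAS list-of-values tables and codes are not
# given in the paper, so these names and codes are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE lov_completeness (
        code        TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE pipe_record (
        pipe_id      INTEGER PRIMARY KEY,
        completeness TEXT REFERENCES lov_completeness(code)
    );
""")
with conn:
    conn.executemany("INSERT INTO lov_completeness VALUES (?, ?)",
                     [("COMP", "complete"), ("FRAG", "fragment")])

def dropdown_options():
    """Values offered in the data-entry drop-down, read from the LOV table."""
    return [code for (code,) in
            conn.execute("SELECT code FROM lov_completeness ORDER BY code")]

def record_completeness(pipe_id, code):
    """Save a completeness value; codes missing from the LOV table are rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO pipe_record VALUES (?, ?)", (pipe_id, code))
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the options live in a table rather than in the form, adding a new value is just another row in `lov_completeness`, and the drop-down picks it up the next time it is built.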


When creating new fields, it's important to judge the effect on existing data, particularly in a project that can only selectively enhance records. As the project grant covered the enhancement of records for 120 marked clay pipes made between c 1580 and 1680, a small percentage of the entire dataset, many existing records would not have any data for the new fields. If it is not possible to go back and record the relevant information in the new field for each existing record, might that affect the validity of the data set as a whole? Will queries or searches return unexpected results if values aren't recorded consistently? Sometimes it is possible to apply a default value for existing records, or to mark previous records as 'not recorded'.
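The 'unexpected results' risk is easy to show with a small sketch (sqlite3, hypothetical 'mould' columns and values). If a new field is added without a default, unenhanced records hold NULL, and in SQL `NULL <> 'x'` is not true, so those records silently drop out of a "not equal to" search. An explicit 'not recorded' default keeps them visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pipe (id INTEGER PRIMARY KEY, description TEXT)")
with conn:
    conn.executemany("INSERT INTO pipe (description) VALUES (?)",
                     [("bowl with heel",), ("bowl with spur",)])

# Option 1 (hypothetical column): new field with no default, existing rows hold NULL
conn.execute("ALTER TABLE pipe ADD COLUMN mould_a TEXT")
# Option 2 (hypothetical column): new field with an explicit default for unenhanced records
conn.execute("ALTER TABLE pipe ADD COLUMN mould_b TEXT NOT NULL DEFAULT 'not recorded'")

def count_not_matching(column, value):
    """Count pipes whose value in `column` is not `value`."""
    # column name is interpolated into the SQL, so only pass trusted names
    sql = f"SELECT COUNT(*) FROM pipe WHERE {column} <> ?"
    return conn.execute(sql, (value,)).fetchone()[0]

print(count_not_matching("mould_a", "two-piece"))  # 0: NULL rows are excluded
print(count_not_matching("mould_b", "two-piece"))  # 2: 'not recorded' rows are counted
```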


New tables were created to record information about known pipe makers. This includes their name, address, earliest and latest known dates, and free text including documentary evidence for this information. As this information is recorded in the database rather than in text files, it can be more easily searched and combined with related information and pipes for publication.


Additional tables were created to record the relationship between a mark on a particular pipe and a possible maker, including the probability of the pipe being linked to an individual maker, plus any publication references.
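A minimal sketch of how such a link table might look, assuming sqlite3 and hypothetical table names. A single integer `pipe_id` stands in for the four-part pipe key to keep it short, the makers are placeholders ('Maker A', 'Maker B'), and the certainty values follow the definite/probable/possible scale used when linking pipes to makers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE maker (
        maker_id      INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        address       TEXT,
        earliest_date INTEGER,
        latest_date   INTEGER
    );
    CREATE TABLE pipe (
        pipe_id INTEGER PRIMARY KEY,  -- stands in for the four-part pipe key
        mark    TEXT NOT NULL
    );
    CREATE TABLE pipe_maker (
        pipe_id   INTEGER REFERENCES pipe(pipe_id),
        maker_id  INTEGER REFERENCES maker(maker_id),
        certainty TEXT NOT NULL
                  CHECK (certainty IN ('definite', 'probable', 'possible')),
        publication_ref TEXT,
        PRIMARY KEY (pipe_id, maker_id)
    );
""")
with conn:  # hypothetical sample data
    conn.executemany("INSERT INTO maker (maker_id, name) VALUES (?, ?)",
                     [(1, "Maker A"), (2, "Maker B")])
    conn.execute("INSERT INTO pipe VALUES (1, 'IS')")
    conn.executemany(
        "INSERT INTO pipe_maker (pipe_id, maker_id, certainty) VALUES (?, ?, ?)",
        [(1, 1, "probable"), (1, 2, "possible")])

def makers_for_pipe(pipe_id):
    """Possible makers of a pipe, most certain first (as on a 'view pipe' page)."""
    return conn.execute("""
        SELECT m.name, pm.certainty
        FROM pipe_maker pm JOIN maker m ON m.maker_id = pm.maker_id
        WHERE pm.pipe_id = ?
        ORDER BY CASE pm.certainty WHEN 'definite' THEN 1
                                   WHEN 'probable' THEN 2 ELSE 3 END, m.name
    """, (pipe_id,)).fetchall()

def pipes_for_maker(maker_id):
    """Pipes a maker might have made (as on a 'view maker' page)."""
    return conn.execute("""
        SELECT p.mark, pm.certainty
        FROM pipe_maker pm JOIN pipe p ON p.pipe_id = pm.pipe_id
        WHERE pm.maker_id = ? ORDER BY p.pipe_id
    """, (maker_id,)).fetchall()
```

Keeping the relationship in its own table means one pipe can have several candidate makers and one maker can be linked to many pipes, each link with its own certainty.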


Content preparation


Enhancing database records



The basic process for enhancing stamped pipe records was:


  1. Add the webcode 'CoLAT' to the pipes that will be included in this project

  2. Add the photo number to the Photo number field

  3. Review and update the pipe entries

  4. Create the maker entries

  5. Add pipes to the sub-form on the makers form to create the link between pipe and definite/probable/possible maker


Two queries were created to help monitor progress and give an idea of how the data would look on the website. One was a report showing which records have been successfully marked up for export to the website. The other showed how the links between makers and pipes would be displayed and could be used to check the success of a link created between a mark and possible maker.
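The first kind of monitoring report could look something like the sketch below (sqlite3, hypothetical columns and photo numbers). For each pipe flagged with the webcode 'CoLAT', it shows whether a photo number has been recorded and whether at least one maker link exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pipe (pipe_id INTEGER PRIMARY KEY, webcode TEXT, photo_no TEXT);
    CREATE TABLE pipe_maker (pipe_id INTEGER, maker_id INTEGER, certainty TEXT);
""")
with conn:  # hypothetical sample data
    conn.executemany("INSERT INTO pipe VALUES (?, ?, ?)", [
        (1, "CoLAT", "P001"),  # photo and maker link recorded
        (2, "CoLAT", None),    # still missing a photo
        (3, "CoLAT", "P003"),  # no maker link yet
        (4, None, None),       # not part of this project
    ])
    conn.execute("INSERT INTO pipe_maker VALUES (1, 10, 'probable')")

def export_status():
    """(pipe_id, has_photo, has_maker_link) for every pipe flagged for the website."""
    sql = """
        SELECT p.pipe_id,
               p.photo_no IS NOT NULL,
               EXISTS (SELECT 1 FROM pipe_maker pm WHERE pm.pipe_id = p.pipe_id)
        FROM pipe p
        WHERE p.webcode = 'CoLAT'
        ORDER BY p.pipe_id
    """
    return [(pid, bool(photo), bool(maker)) for pid, photo, maker in conn.execute(sql)]
```

Here `export_status()` would list pipes 1 to 3 and show that pipe 2 still needs a photo and pipes 2 and 3 still need maker links; pipe 4 is left out because it is not flagged for the website.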


Other content preparation


While the technical database and website development and the specialist recording work were underway, Jacqui arranged for the MoLAS photographer to photograph the marks that were going to appear on the website. The photo number was then recorded in the database; the scripts that generate the website use it to link the right image to the right pipe and mark for display on the web page. Jacqui also wrote text for inclusion on the site, and definitions were written for the codes used in the database, to make the published records clearer and more user-friendly for non-specialists.


The website



The address of the website, 'Clay tobacco pipe makers' marks from London', is http://www.museumoflondon.org.uk/claypipes/
It is held as a 'collections microsite' within the Museum of London website structure.


The front page



The front page is designed to provide direct access to the data while contextualising the content, making the current scope of the project and the goals of the site immediately clear. It also allows us to thank our funders.


The design of the website was based on templates developed for the LAARC site. The front page introduces the navigation and title banner, which remain consistent throughout the site. There are three immediately clear 'calls to action' for the user on the front page: browse maker's marks, browse makers, and search for marks.


Browse maker's marks



http://www.museumoflondon.org.uk/claypipes/pages/marks.asp

In this section of the site, you can view thumbnails of the best example of each mark. The initials or description of the mark are listed with each thumbnail. This means you can search the text of the page for a particular mark; listing the text also aids accessibility and helps search engines index the site, while the thumbnails keep the page visually appealing.
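One way the 'browse marks' list could be assembled is sketched below, assuming sqlite3 and hypothetical column names. It picks the pipe flagged as the best example of each mark and writes the mark text both as the image's alt text and as visible text, which is what makes the page searchable and accessible. The image path built from the photo number is a made-up naming scheme, not the site's real one.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pipe (
    pipe_id INTEGER PRIMARY KEY, mark TEXT, photo_no TEXT,
    best_example INTEGER DEFAULT 0)""")
with conn:  # hypothetical sample data
    conn.executemany("INSERT INTO pipe VALUES (?, ?, ?, ?)", [
        (1, "IS", "P001", 0),
        (2, "IS", "P002", 1),
        (3, "Wheel symbol", "P003", 1),
    ])

def mark_thumbnails():
    """One (mark text, photo number) pair per mark, using the flagged best example."""
    sql = "SELECT mark, photo_no FROM pipe WHERE best_example = 1 ORDER BY mark"
    return conn.execute(sql).fetchall()

def thumbnail_html(mark, photo_no):
    # Hypothetical file-naming scheme derived from the photo number
    src = f"/claypipes/images/{photo_no}_thumb.jpg"
    text = escape(mark)
    # The mark text appears as alt text and as visible text on the page
    return f'<li><img src="{escape(src)}" alt="{text}"> {text}</li>'
```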


View mark


From the list of marks, you can access the page for a particular mark. Where appropriate, this page contains a more detailed description of the mark; images of all the pipes with that mark plus the sitecode, excavation context number and bowl form for that pipe. It also displays the dates associated with that form and the die number for each pipe on the page. Each image of a mark is also a link to the particular pipe page.


View pipe


Each pipe page contains the initials or description of its maker's mark, a description of the pipe, its burnishing and milling, and information about the excavation in which the pipe was found, including the address, easting and northing of the site. It would be possible to link to the full site record in the LAARC, particularly once the LAARC site has been redeveloped. The page also lists any possible or known makers and the certainty of their being the maker of that pipe. The name of each maker links to that maker's page.


View maker


This page displays the name, address, earliest and latest dates plus any additional commentary and publication references for the maker. It also lists each pipe they might have made with the probability of their being the maker.


Browse makers


http://www.museumoflondon.org.uk/claypipes/pages/makers.asp
This page displays a list of all makers on the site. The name of each maker is a link to the full information about that maker, as above.


Search


http://www.museumoflondon.org.uk/claypipes/pages/search.asp
The search for this site is fairly simple, but the functionality can be expanded if necessary.
It searches the description field for a match to the search term.
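A simple description search of this kind might look like the sketch below. It assumes sqlite3 and a hypothetical `mark` table with made-up sample descriptions, not the site's actual ASP code or database. The search term is passed as a query parameter rather than pasted into the SQL, and LIKE wildcards typed by a visitor are escaped so that they match literally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mark (mark_id INTEGER PRIMARY KEY, description TEXT)")
with conn:  # hypothetical sample descriptions
    conn.executemany("INSERT INTO mark (description) VALUES (?)", [
        ("IS in relief",),
        ("Wheel symbol with pellets between the spokes",),
        ("TM incuse",),
    ])

def search_marks(term):
    """Return descriptions containing `term` (case-insensitive for ASCII in SQLite)."""
    # Escape LIKE wildcards in user input so they match literally
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = ("SELECT description FROM mark "
           "WHERE description LIKE ? ESCAPE '\\' ORDER BY mark_id")
    return [row[0] for row in conn.execute(sql, (f"%{escaped}%",))]
```

Extending the search later (for example to dates or makers) would mostly mean adding further parameterised conditions to the same query.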


About the project


http://www.museumoflondon.org.uk/claypipes/pages/about.asp

This section contains excellent information for general visitors and specialists about the project, why clay pipes are important for archaeologists, the clay pipes of London as well as a glossary and references. These pages contextualise the study of clay pipes, enriching the general visitor's experience and providing specific information about the background to tobacco pipe makers’ marks found in excavated contexts from London for specialist users.


From the database to the website


The first step in publishing the enhanced content online is getting data from the internal database to the web server. The web server is the computer that sends out the pages when a visitor clicks on the link.


SQL scripts run on the MoLAS server extract data from the MoLAS Oracle database and other sources, combining them into a form suitable for publication on the website. This data is stored in tables in the web server database. Information about the archaeological sites is drawn from data published through the London Archaeological Archive and Research Centre (LAARC) website, and is linked through the sitecode on the pipe or mark database record.
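The shape of such an extraction step can be sketched with sqlite3. For brevity this collapses the internal database and the web server database into one in-memory database, and all table names, the sitecode 'XYZ01', the address and the grid reference are hypothetical. Only pipes flagged for the web are copied into a publication table, joined to site information by sitecode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables: internal pipe records and LAARC site data
    CREATE TABLE pipe (pipe_id INTEGER PRIMARY KEY, sitecode TEXT, context INTEGER,
                       webcode TEXT, description TEXT);
    CREATE TABLE laarc_site (sitecode TEXT PRIMARY KEY, address TEXT,
                             easting INTEGER, northing INTEGER);
    INSERT INTO pipe VALUES (1, 'XYZ01', 100, 'CoLAT', 'Bowl, heeled'),
                            (2, 'XYZ01', 101, NULL, 'Stem fragment');
    INSERT INTO laarc_site VALUES ('XYZ01', '1 Example Street', 530000, 180000);
""")

def build_web_pipes():
    """(Re)build the publication table: web-flagged pipes joined to site data by sitecode."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS web_pipe")
        conn.execute("""
            CREATE TABLE web_pipe AS
            SELECT p.pipe_id, p.sitecode, p.context, p.description,
                   s.address, s.easting, s.northing
            FROM pipe p
            LEFT JOIN laarc_site s ON s.sitecode = p.sitecode
            WHERE p.webcode = 'CoLAT'
        """)
    return conn.execute("SELECT * FROM web_pipe").fetchall()
```

Rebuilding the publication table from scratch on each run keeps the website copy simple: whatever is currently flagged in the internal database is what gets published. The LEFT JOIN keeps a pipe even if its site information has not been published yet.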


The scripts also combine information that has been stored separately into the publication format according to the relationships defined in the database. For example, they may bring together a pipe with possible makers through the pipe marks recorded. Where necessary, the database extraction scripts also extract the code definitions from the list of values tables. These translate a value like the mark position code 'BR' into the more user-friendly 'on the bowl, on right side as smoked'.
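The code translation amounts to a join against the list of values table. A minimal sketch, assuming sqlite3 and hypothetical table names: only the 'BR' definition comes from the paper, and 'ZZ' is a made-up code with no definition, included to show the fallback of publishing the raw code rather than dropping the record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lov_mark_position (code TEXT PRIMARY KEY, description TEXT NOT NULL);
    CREATE TABLE mark (mark_id INTEGER PRIMARY KEY, initials TEXT, position TEXT);
    INSERT INTO lov_mark_position VALUES ('BR', 'on the bowl, on right side as smoked');
    INSERT INTO mark VALUES (1, 'IS', 'BR'), (2, 'TM', 'ZZ');
""")

def marks_for_web():
    """Marks with position codes replaced by their user-friendly definitions."""
    sql = """
        SELECT m.initials, COALESCE(l.description, m.position) AS position_text
        FROM mark m
        LEFT JOIN lov_mark_position l ON l.code = m.position
        ORDER BY m.mark_id
    """
    return conn.execute(sql).fetchall()
```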


The website is generated by another collection of scripts, using a web scripting language called ASP. These scripts can be thought of as templates that contain placeholders for different types of information or images. When a page is requested, the script runs and fills the appropriate information in the appropriate part of the template.
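The site itself used ASP. Purely to illustrate the template-with-placeholders idea, here is a rough Python analogue using the standard library's `string.Template`. The template text, navigation markup and maker data are all hypothetical. Shared parts such as the navigation are defined once and substituted into every page.

```python
from html import escape
from string import Template

# Hypothetical page template; $title, $navigation and $content are placeholders
PAGE = Template("""<html><head><title>$title | Clay tobacco pipe makers' marks from London</title></head>
<body>
$navigation
<h1>$title</h1>
$content
</body></html>""")

# Defined once, so changing it changes every generated page
NAVIGATION = '<ul class="nav"><li><a href="marks.asp">Browse marks</a></li></ul>'

def render_maker_page(maker):
    """Fill the template with one maker's details (dict keys are assumptions)."""
    content = (f"<p>Address: {escape(maker['address'])}</p>"
               f"<p>Dates: {maker['earliest']}-{maker['latest']}</p>")
    return PAGE.substitute(title=escape(maker["name"]),
                           navigation=NAVIGATION, content=content)
```

Calling `render_maker_page({"name": "Maker A", "address": "Example Lane", "earliest": 1640, "latest": 1660})` produces a complete page; because the layout and navigation live in one place, editing `PAGE` or `NAVIGATION` changes every page the script generates.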


Because these templates dynamically generate the pages, the design and the content are separated. This means the site can be expanded as new records are added to the database. Updated or new content can appear on the site instantly, without waiting for IT resources to be available. It's also easy to update the templates so that new fields can be viewed, new links generated or additional search parameters added. The graphic design (the 'look and feel' of the site) or the site navigation can be updated in a single script and the change is immediately visible across all site pages.


The 'about' section of the website also contains 'static' pages. These pages do not change when the content of the database changes. However, they use the same scripts to generate the design and navigation, so they can easily be changed as necessary.


One of the requirements for the site was that it was accessible to search engines. As the project did not have a budget for marketing the site, search engines were going to be the main source of website visitors. The site was designed using 'semantic markup' which not only helps search engines understand the structure of the site, but also aids accessibility for people with disabilities.