
Open Source Solutions in the Museum and Archive Worlds

Sarah Aldrich — Thu, 21 Apr 2022 02:17:23 +0000

In the Museum and Archives worlds, the Collection or Archive Management System you use affects everything from the day-to-day business processes of your organisation to the long-term sustainability of your records. Given its importance, weighing up your options can seem overwhelming when you are planning to establish a system or change from an existing one. However, there are options, and here we have broken down some of our favourite open-source solutions.

Many of our blogs on open source software focus on spatial tools and the hard sciences. What we haven’t delved into recently is the importance of open source software in the GLAM sector (Galleries, Libraries, Archives and Museums). Ten years ago, Piers wrote a blog about open source collection management and it’s time for an update. Today we will focus on the growth of these solutions, our continuing support of them (and of that sector), and the open source projects we’ve been putting in place recently in GLAM organisations.

First, to recap, open source software means that the source code is published freely – anyone can download a copy of the code, use it, and customise it. There is also the benefit of reduced ongoing costs – rather than continuing to pay annual licensing fees, open source software installations require only hosting and upgrades to the code. In addition to financial sustainability, open source software provides technical sustainability: because the code is freely available, the community of users can contribute bug fixes and improvements on an ongoing basis. Many GLAM organisations don’t have specialised IT departments, so implementation of the software and upgrades is where Gaia Resources can assist, and we have various levels of support that can be tailored to an institution’s needs.

Open source software also supports the evidential value and provenance of your records – the source code is freely available and can be audited to ensure that your data is not being changed or manipulated by the system. Open source software provides a level of transparency for institutions that need to be able to attest that their collections are untampered with.

When working with a collecting organisation, we usually recommend one of three open source tools: Access to Memory (AtoM), CollectiveAccess, or ArchivesSpace. The solutions we provide can be out of the box – that is, installed as “vanilla versions” without customisation – or, in some cases, heavily customised, as with our Queensland State Archives implementation of ArchivesSpace. We have particular experience in implementing the Australian Series System for archives in all of these software packages.

Access to Memory is an open source tool that is developed and maintained by Artefactual (one of our partners in the current Digital Preservation project for Queensland State Archives). Artefactual also develops and maintains Archivematica – a tool for Digital Preservation. AtoM, as it is known colloquially, is a great tool for small to medium-sized archives. It not only provides an easy-to-use interface for staff, but also offers an immediate web presence that allows the public to search the collections. For many of our customers, we have implemented add-ons or plugins to provide customised functionality, ranging from subscriber-only access to digital materials to online ordering.

CollectiveAccess is another popular open source collection management tool. Its flexibility is a key benefit of the software, and it can be implemented to manage collections of museum objects (including all SPECTRUM functions), archival records, and digital materials. We have implemented CollectiveAccess for several clients from across the GLAM sector, from archives to high-end art collections, and even not-for-profits managing their own historical collections. Whirl-i-gig, out of the United States, maintains the source code for CollectiveAccess, and our own Gaia Resources developers have contributed bug fixes and added new features to the source code as we make improvements for our clients.

ArchivesSpace is the tool that we have implemented at the greatest scale. Queensland State Archives uses a highly customised version of ArchivesSpace to manage their 64 kilometres of government records. While ArchivesSpace does come with a public interface, our work with QSA included a custom-built public interface, ArchivesSearch, and we have also implemented these systems for clients in Tasmania. Lyrasis, again in the United States, maintains ArchivesSpace core code.

With our experience over the last fifteen years, we understand the needs of collecting organisations and can recommend solutions that fit the needs and scale of the client. We stand by our passion for open source software solutions and advocate for them as the most sustainable solution for collecting institutions. Whether your organisation is small and volunteer-run or you have kilometres of records, one of the above systems can be implemented in a way to suit your needs.

Think we can help assess or even customise the best tool for your organisation? Get in touch with us via email or start a conversation with us on one of our social media platforms – Twitter, LinkedIn or Facebook.

Sarah

The post Open Source Solutions in the Museum and Archive Worlds appeared first on Gaia Resources.

Archives now and forever!

Piers Higgs — Wed, 16 Mar 2022 02:19:07 +0000

Last week Meg and I went along to the 2022 opening event for the Western Australian branch of the Australian Society of Archivists (ASA), hosted by the National Archives of Australia.

We were asked to present on the work that we do with archival collections around Australia, so we gave a quick run through of a range of projects that we’re working on, and talked about the functionality they share and where they differ. We then talked through some of the technology and future developments we’ve been looking at that might be of interest to archives.

The talk was given in person (with a COVID safe setup) and also on Zoom – and the Zoom call was recorded and has been made available below.

https://vimeo.com/ausarchivists/review/686909468/047a43727d

It was nice to see a few of our clients and some new faces as well. One thing that I really like about the way the ASA operates is that it welcomes in new organisations that are seeking to learn about archiving, and some of the people we met that night asked a whole bunch of interesting questions in the session and afterwards around the technology side of archiving.

Over the years, the ASA has facilitated ways for us to learn a lot as a team about archives, and it was good to be giving something back to them and their members! If you have questions about archives, then feel free to get in touch and ask either Meg or myself questions via our enquiries@gaiaresources.com.au email address – or start a conversation with us on our social media via Facebook, Twitter or LinkedIn.

Piers


Small Museums Conference 2020

Sarah Aldrich — Wed, 21 Oct 2020 02:02:16 +0000

This past Friday and Saturday, I was thrilled to attend the Small Museums Conference, hosted at the Historic Ormiston House. Even more exciting was to have one of Gaia Resources’ projects, Q-Album, presented at the conference by the Queensland State Archives.

Q-Album was developed by Gaia Resources in conjunction with Queensland State Archives to provide a platform where small and medium-sized organisations can share the “gems” in their collections.

Screenshot of the Q-Album home page

It provides contributors with a web presence, the ability to curate their content and to engage with added-value functions: then-and-now photos using Google Street View, geo-tagging with Google Maps, timeline filters, and News of the Day – an integration with Trove.

The project went live at the start of this year and currently has six organisations contributing content. Q-Album is free for contributors and everyone involved in the project hopes to see this number grow over the coming months.

You can explore Q-Album for yourself here: https://qalbum.archives.qld.gov.au/.

There were other engaging talks at the conference which also discussed the use of technology in the heritage sector. We discussed podcasting, the pros and cons of particular collection management software, and the discrepancy between tourism and heritage tourism. Further topics recognised the importance of volunteers, the challenges of fundraising, and sharing difficult-to-tell stories. Common to all was how museums, large and small, are using technology, innovating tools, and looking ahead to the future. You can find abstracts of the presented papers here.

With a conference theme of Environment – Heritage – Sustainability, the talk ‘At the Intersection – Sustainability, Climate Change & Collection Care’ was particularly poignant. Presented by Amanda Pagliarino, Head of Conservation and Registration from Queensland Gallery of Modern Art, the talk informed attendees about recent studies at the Australian Institute for the Conservation of Cultural Material (AICCM) and the Environmental Guidelines Project. As a past Archivist/Collections Manager, it was encouraging to hear that museums are reflecting on their carbon footprint, adjusting collection care standards accordingly, and democratising their policies.

Thank you to the Historic Ormiston House and others for making the in-person Small Museums Conference possible in 2020! I look forward to staying in touch and engaged with the heritage and museum sector here in Queensland.

As always, if you’d like to know more about this event or if you have perspectives you would like to share on museums in Queensland, then please drop me a line at sarah.aldrich@gaiaresources.com.au, or connect with us on Twitter, LinkedIn or Facebook.

Sarah


Managing Tasmania

Piers Higgs — Wed, 24 Jun 2020 01:02:30 +0000

We have been working with the collections community in Tasmania for quite some time around some other projects (as Morgan previously wrote about in the Collecting Tasmania and Aggregating Tasmania blogs).

Lately, we’ve been working through the tendering and procurement process for a new collections project in Tasmania, and since we’ve now received a signed contract we’re able to talk about this one (broadly, at this early stage). We’re implementing new Collection Management Systems for a range of institutions in Tasmania, which will not only assist Libraries Tasmania, Queen Victoria Museum and Art Gallery, University of Tasmania and the Tasmanian Museum and Art Gallery, but will also help make these collections available for much broader use by larger audiences.

In the current circumstances around the COVID-19 pandemic, with closed borders in Australia having significant impacts on travel, projects like this one can not only help to make information available to those who can’t travel, but also be a draw-card for tourism post-pandemic – people wanting to see particular objects, especially those that are highly iconic. I have fond memories of visiting Tasmanian museums when I was a representative on some national workgroups, and seeing the amazing objects that are held in those collections first hand. As a direct result, at the first opportunity I had, my wife and I went back to Tassie for a holiday, touring the collections in Hobart, Launceston and a whole raft of places including the amazing natural locations that Tasmania is also famous for (see below).

Admittedly it’s no collection organisation, but the memories of touring Tasmania remain strong five years later thanks to scenery like this.

This project will have a strong focus on open-source software, which is something of a specialty of ours at Gaia Resources. Open-source software in the broader collections community – also known as the Galleries, Libraries, Archives and Museums (GLAM) community – has been getting a lot of traction. We’ve been implementing and supporting systems based on CollectiveAccess, ArchivesSpace, Access to Memory and Archivematica, and the first two of these will be involved in the Tasmanian project as key parts of the delivery.

The project is just commencing, and we will be blogging about the approaches that we will take in the project and the work that we’re undertaking in more detail, but it’s great to work with another series of collections to make them available to a wider audience. And just maybe, I can go back and revisit some of those memorable places when this pandemic is over.

Stay tuned for more, or drop us a line via email, or via our social media channels: Twitter, LinkedIn or Facebook.

Piers


Twenty years of WA floristic data

Alex Chapman — Wed, 03 Jun 2020 07:41:23 +0000

I’ve been with Gaia Resources for six years now (!), and I’m grateful to have found a home where my specialist knowledge of taxonomy, systematics and biodiversity informatics adds value to the enterprise.

I remain a research associate at the Western Australian Herbarium, both in order to continue my research into WA’s heaths – the Ericaceae – and to provide some institutional services.

One of these has been to collate and summarise significant changes in flora statistics from year to year, starting with the publication of the Descriptive Catalogue (Paczkowska and Chapman, 2000). This included a simple table of major floristic data, and a comparison with previous stats from the past century (updated here in Figure 1).

Exactly the same data was also used in a major paper comparing species richness and endemism in Mediterranean biomes (Beard, Chapman and Gioia, 2000). All this was made possible by my work with the greatly missed Paul Gioia, with whom I built the first statewide census database for vascular and cryptogamic flora – WACENSUS – begun in 1990. Of course, there were earlier printed publications by John Green (1981, 1985), subsequently maintained in supplements by Nicholas Lander, that provided the initial source data. The WACENSUS database, however, enabled real-time documentation of plant names in play for the State, and an immutable Life Science Identifier (LSID) that could be referred to across information systems.

In that time, the flora statistics have documented a steady increase in the number of species, both published and ‘putative’ (i.e. manuscript and phrase-name taxa) (Figure 2).

2020 marks a number of milestones for the State’s documentation of our precious and unique flora:

the 50th anniversary of the first edition of Nuytsia – WA’s systematic botany journal, which provides much of the published taxonomic work describing and classifying the State’s flora;
30 years since ‘the Census’ became a functional database providing an authoritative, accurate and up-to-date source for plant names in current use (and their synonyms) in WA;
20 years since ‘The Western Australian Flora – A Descriptive Catalogue’ was published, from which the descriptive query capacity of FloraBase was drawn;
20 years since the last major analysis of the uniqueness of our State’s flora (especially with regards to other Mediterranean floras of the world);
17 years since the last major revision of ‘FloraBase — the Western Australian Flora’ was released.

In the intervening years, Paul Gioia worked to manually integrate the Western Australian Museum’s faunal names into the Census as well, in order to maximise that knowledge in his major work – NatureMap (2007 onwards). As a result, WA has a standardised names dataset for much of the biota of the state.

It is very heartening to me to see the development of the next generation of WA’s biodiversity information systems through the work of the newly-funded Biodiversity Information Office – see last week’s post.

Yesterday, I received the 2019-20 flora statistics data, from which I will extract the significant highlights and changes for the past year. This complex report was only automated (after years of testing for veracity against my manually-calculated version) in 2019. Again, this could not have been achieved without the dedicated work of Paul Gioia, Ben Richardson and the invaluable curatorial team at the WA Herbarium. These results will be published in FloraBase in coming weeks.

UPDATE: The 2020 flora statistics are now available.

If you’d like to discuss any of the topics covered in this post, please drop me a line at alex.chapman@gaiaresources.com.au, or connect with us on Twitter, LinkedIn or Facebook.

Alex


CHIN Collection Management System review: our summary

Morgan Strong — Tue, 02 Apr 2019 12:13:53 +0000

Back in 2017 the Canadian Heritage Information Network (CHIN) embarked on a vendor survey of Collection Management System software capabilities and vendor software packages. In 2018, CHIN received the demonstrations, and in late 2018 all the results were published online:

https://www.canada.ca/en/heritage-information-network/services/collections-management-systems/collections-management-software-vendor-profiles.html

The review was not meant to be an endorsement of particular packages, more an appraisal of what’s on offer so that museums could take a look at what works best for their needs.

Considering Gaia Resources does quite a lot of work in this space, I decided to take a look at this significant review and see what it might mean for our existing and future customers in the museums space.

First up, this review included the following vendors and packages:

Axiell (Adlib)
Gallery Systems (TMS and eMuseum)*
Keepthinking (Qi)
Lucidea (Argus)
Lyrasis (CollectionSpace)
MINISIS Inc. (MINISIS)
PastPerfect (PastPerfect 5.0)
Re:discovery (Proficio)
SKINsoft (S-Museum)
Vernon System (Vernon CMS and eHive)
Whirl-i-Gig (CollectiveAccess)

* Note, only a survey was done for TMS and eMuseum, no product review was performed.

You can read all the detailed reviews and the vendors’ responses on the site, but as there is no executive summary (well, none that I could find), I thought I could add something by producing a summary of each vendor evaluation done by CHIN.

From my reading of the review, this was my take home for each of the packages:

Axiell (Adlib)

Area	Description
Review performed	15 Feb 2018
Strengths	· Data entry · Browsing / Searching · Online creation
Weaknesses	· Batch editing · Customisation · Audit trails
Overall Comments	Reviewers appeared to say it’s a solid system, but a bit old in design and system architecture.

Keepthinking (Qi)

Area	Description
Review performed	25 January 2018
Strengths	· Web integration · Media integration · Publishing
Weaknesses	· Reporting · Search · Exhibitions
Overall Comments	Web based system seems highly configurable with good publishing features and online access. Easily navigable, with modules logically available. Some concerns over search and loading data. Most reviewers appeared quite positive about the system.

Lucidea (Argus)

Area	Description
Review performed	26 January 2018
Strengths	· Search · Exhibition · Media management
Weaknesses	· Multi-lingual support · Local terminology lists · Templated records
Overall Comments	Overall positive reviews, about the potential for content and metadata management and configuration. Some concerns on the UI and it being a bit cluttered to navigate through. Most reviewers appeared quite positive about the system.

Lyrasis (CollectionSpace)

Area	Description
Review performed	26 January 2018
Strengths	· User permissions · Cataloguing · Exhibitions
Weaknesses	· Audit trail · Batch editing · Reporting
Overall Comments	This product review was a bit unusual in that the individual scores for components were quite low, but the overall comments were less negative. There were some written concerns about relating new content types and the amount of work needed to implement a package like this, but the comments were overall reasonably complimentary. The numerical scores against categories were quite low though, particularly in key areas like audit trails and reporting.

MINISIS Inc. (MINISIS)

Area	Description
Review performed	31 January 2018
Strengths	· Audit trails · Media management · Multilingual capabilities
Weaknesses	· Web publishing · Batch editing · Browsing
Overall Comments	This had mixed reviews, with several reviewers believing the UI was quite dated and the system focused on developers and power users, but some others enjoyed the flexibility and power tools.

PastPerfect (PastPerfect 5.0)

Area	Description
Review performed	2 March 2018
Strengths	· Search · Batch editing · Multilingual capabilities
Weaknesses	· Online data entry · Audit trail · Browsing
Overall Comments	Reviewers saw this as an ideal solution for a smaller institution with limited audit requirements and a smaller budget, but probably not as suitable for a larger institution.

Re:discovery (Proficio)

Area	Description
Review performed	23 January 2018
Strengths	· Audit trails · Permissions · Batch editing
Weaknesses	· Multilingual capabilities · Exhibitions · Condition reporting
Overall Comments	The overall sentiment was that it is a “good traditional CMS”: a Windows application with good database and search features, but not always intuitive in how it functions.

SKINsoft (S-Museum)

Area	Description
Review performed	21 February 2018
Strengths	· Web publishing · Media management · Reporting
Weaknesses	· Import data · Customise data catalogue pages · Local terminology lists
Overall Comments	Overall contained some of the most positive reviews and scores and was positively viewed by the reviewers.

Vernon System (Vernon CMS)

Area	Description
Review performed	14 February 2018
Strengths	· Audit trails · Import data · Local terminology lists
Weaknesses	· Online data entry · Customisation · Multilingual capabilities
Overall Comments	The reviewers seemed to think it was a powerful system, but the interface and layout was a bit dated and based on Windows. Might be hard for smaller institutions to embrace, with some complexity, but offers a lot of power.

Vernon System (eHive)

Area	Description
Review performed	14 February 2018
Strengths	· Media management · Online data entry · Template records
Weaknesses	· Multilingual capabilities · Customisation · Exhibitions
Overall Comments	Ideal for small museums as it offers a web interface and basic collection management, but more built around presenting collections than managing them.

Whirl-i-Gig (CollectiveAccess)

Area	Description
Review performed	9 February 2018
Strengths	· Online data entry · Media management · Audit trails
Weaknesses	· Exhibitions · Condition reports · Generating reports
Overall Comments	Reviewers saw the system as highly configurable, flexible and open source; the main drawback was seen as the effort required to set it up for a collection.

It is clear a lot of work has gone into this review, and I highly recommend checking it out for yourself. Overall, the reviewers seemed to give S-Museum and CollectiveAccess the most positive reviews, but for smaller institutions something else might be more appropriate.

I can only recommend reading all the reviews for yourself, via the link at the start, to see what is the best fit for your institution, or contact us via email.

Morgan


Adding Australian National Species List names to CollectiveAccess

Kehan Harman — Wed, 04 Nov 2015 00:43:54 +0000

Recently, during our pilot project helping CSIRO evaluate CollectiveAccess as a candidate for the National Collections, we needed to show that we could use an external nomenclator within CollectiveAccess. Luckily, CA comes with a generic ‘InformationService’ attribute (field) type which lets you reference an external web service and index and display content from it within your collection management system. Up until now the services available to reference were:

The Getty Linked Open Data services for their Thesaurus of Geographic Names (TGN), Art & Architecture Thesaurus (AAT) and Union List of Artist Names (ULAN).
WorldCat – Global library meta-catalogue.
Wikipedia.
uBio – lookup a taxon name from uBio.
CollectiveAccess – lookup a value from a different CA instance.

While uBio has good global coverage, it lacks a lot of the Australian names, and unfortunately the uBio service is currently down. There has already been significant work compiling taxonomic names for Australia, embodied by:

the Australian Plant Census (APC) and the Australian Plant Name Index (APNI) that it is built on and
the Australian Faunal Directory (AFD).

Under the auspices of the Atlas of Living Australia (ALA) there has been work to integrate the APC and the AFD into a single National Species List. I initially tried integrating with the web service referenced there, but I failed to implement a fast enough autocomplete using those services (and who wants to sit around waiting for an autocomplete?). Then in my travels around the interwebs I discovered that the Australian National Species Lists (NSL) are currently evolving into a fully RESTful framework that has a good API that is well documented. Currently, only vascular plant names are included in this framework but I have it on good authority that fungi, algae, mosses, lichen and the AFD are on the way too (see the roadmap here).

In order to add a new InformationService attribute type to CA I needed to implement the IWLPlugInformationService interface, which proved relatively straightforward. I implemented the following methods in my new class to look up the data from the API:

lookup() – provides the autocomplete
getDataForSearchIndexing() – adds additional data to the search index
getExtendedInformation() – Adds a view of the referenced data – this hooks into the
getExtraInfo() – adds additional data to store serialized with the attribute (field) value
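
To make the four methods above a little more concrete, here is a minimal sketch of the same shape. CollectiveAccess itself is written in PHP, so this is Python purely for illustration and is not the plugin we contributed: the class, the record fields, the identifiers and the small in-memory sample list (standing in for the NSL web service) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """A hypothetical, simplified name record; not the real NSL schema."""
    identifier: str
    scientific_name: str
    family: str


# Stand-in for the remote service: a tiny in-memory list with made-up identifiers.
SAMPLE_NAMES = [
    NameRecord("sample-1", "Eucalyptus marginata", "Myrtaceae"),
    NameRecord("sample-2", "Eucalyptus diversicolor", "Myrtaceae"),
    NameRecord("sample-3", "Banksia grandis", "Proteaceae"),
]


class SketchNameService:
    """Illustrative counterpart of the four plugin methods listed above."""

    def __init__(self, records):
        self._by_id = {record.identifier: record for record in records}

    def lookup(self, text, limit=10):
        """Autocomplete: names that start with the typed text, ignoring case."""
        prefix = text.strip().lower()
        if not prefix:
            return []
        matches = [
            record for record in self._by_id.values()
            if record.scientific_name.lower().startswith(prefix)
        ]
        matches.sort(key=lambda record: record.scientific_name)
        return [
            {"id": record.identifier, "label": record.scientific_name}
            for record in matches[:limit]
        ]

    def get_data_for_search_indexing(self, identifier):
        """Extra terms to add to the search index for a stored value."""
        record = self._by_id[identifier]
        return [record.scientific_name, record.family]

    def get_extended_information(self, identifier):
        """A display string giving a view of the referenced record."""
        record = self._by_id[identifier]
        return f"{record.scientific_name} ({record.family})"

    def get_extra_info(self, identifier):
        """Additional data to store alongside the field value."""
        record = self._by_id[identifier]
        return {"family": record.family}


service = SketchNameService(SAMPLE_NAMES)
suggestions = service.lookup("euc")
```

In a real plugin the lookup would call the remote API rather than filter a local list, which is exactly where the speed of the service matters for the autocomplete.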

Configuring the NSL (and APNI) attribute type

Configuring a field in CollectiveAccess to reference the NSL. You can choose which additional fields you want to add to the search index and also what format the information is presented in.

Autocomplete for scientific names

The autocomplete for scientific names. The NSL web service is responsive and provides just enough information so this autocomplete doesn’t leave the user frustrated.

APC Format embedded information

Name record viewed in APC format

APNI Format embedded information

Because other opinions are also available:

Name record viewed in APNI format

We have already contacted the developers of this new API to make some suggestions of possible enhancements especially relating to the embedded data format that they currently output.

Doing this development is only one part of the cycle – CollectiveAccess is an Open Source project so now that we have implemented this locally we can give a little back. Thanks to Seth for accepting our pull request adding this new feature which will become available in the 1.6 version of CollectiveAccess.

Now that we’ve seen how easy it is to reference external authoritative data sources within CollectiveAccess watch this space for further development. We’re also looking forward to having more data in the NSL and enhanced functionality in its API.

Kehan

Leave me a comment below, or start a conversation with us on Facebook, Twitter or LinkedIn.


A milestone at the WA Museum!

Piers Higgs — Thu, 16 Oct 2014 11:00:55 +0000

Since our last blog in May, Ben, Kehan and I have been working with the team at the Western Australian (WA) Museum to finalise the building of the new Collection Management Information System (CMIS), based on the open source CollectiveAccess software.

As a result, we have just reached a milestone in the project, delivering the production instance of the new CMIS!

This is a milestone in the delivery of the new CMIS for the Museum, but it doesn’t mean we now walk away from the project. Instead, we are now working with the team on two new projects, providing technical support and providing rollout support. But what have we been doing since May?

Data, data, data, data, data, data, data. And data.

There has been a lot of ongoing work (mainly by Kehan, supported by Evan and the rest of the database team at the WA Museum) in taking the data from each of the Museum’s current collections data sets (as per our May blog) – including data from other collection systems, Microsoft Access databases, FileMaker Pro data files, Microsoft Works databases and Excel spreadsheets – and building mappings to bring that data into the staging instance of CollectiveAccess.

We’ve then done some preliminary testing on it, and we’ve been running the mapping scripts to bring the data into the production server we now have running. These scripts take a fair while due to the complexity of the mappings, as well as the number of records being brought in.
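
For readers who haven’t seen one, a field mapping of the kind described above can be sketched in a few lines. This is only an illustration in Python and not the mapping scripts used in the project: the column names, target field names and sample row are hypothetical, and the real mappings are far more complex.

```python
import csv
import io

# Hypothetical mapping from legacy spreadsheet columns to target field names.
FIELD_MAP = {
    "Reg No": "identifier",
    "Object Name": "name",
    "Date Collected": "collection_date",
}


def map_row(row, field_map):
    """Rename mapped columns, trim whitespace, and report unmapped columns."""
    mapped = {}
    unmapped = []
    for column, value in row.items():
        target = field_map.get(column)
        if target is None:
            # Keep a note of anything we have no mapping for, for later review.
            unmapped.append(column)
            continue
        mapped[target] = (value or "").strip()
    return mapped, unmapped


def map_csv(text, field_map):
    """Apply the mapping to every row of a CSV export held in a string."""
    reader = csv.DictReader(io.StringIO(text))
    return [map_row(row, field_map) for row in reader]


# A made-up one-row export with a column that has no mapping yet.
LEGACY_EXPORT = "Reg No,Object Name,Date Collected,Shelf\nA1, Teapot ,1932,B4\n"

results = map_csv(LEGACY_EXPORT, FIELD_MAP)
```

Reporting the unmapped columns rather than silently dropping them is a deliberate choice in this sketch, since it gives the collection staff something to check each mapping against.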

Developing, documenting and testing

Development has been, well, an “interesting” experience for all involved.

We have written before about the lack of unit testing in CollectiveAccess, which has been something I know has been frustrating at times to the whole team – but we’ve also been working on that where we can. The development has been a great test of how open source software truly works and the team at Whirl-i-gig have been great in supporting both us at Gaia Resources and the WA Museum here – the time zone difference has played well and truly into our favour, with updates happening while we sleep – although, Seth, we have been wondering when you sleep? In any case, there’s been a lot of co-operation in terms of the development, and the rewards are coming from that.

In terms of documentation, we have been creating an internal wiki (based on the open source MediaWiki) that contains a range of customised “how to” details for each Collection. Obviously there are similarities about how different departments work on particular things, and where possible we’ve written more generic documentation. But the documentation is arrayed by Collection, as well as by subject. Of course, as it’s a wiki, we’ll also be able to keep it up to date and re-use it as we go.

Finally, over the last few months, we’ve also been writing very rudimentary Selenium scripts for testing. Selenium is another open source tool, which has been really useful in keeping an eye on the system, its performance and its stability along the way. A few times these tests (which take about half an hour to run through) have picked up issues for which we were able to identify the root cause and address before the finalisation of the build stage of the project. It’s no substitute for unit tests, but it’s a step in the right direction.

Contributing back

Along the way, the team have been working on functionality that we’ve been committing back into the main CollectiveAccess codebase, all of which is available via GitHub. This has included:

Enhancements to the data importer,
Bug fixes to many components of the system,
Adding the ability to update installation profiles,
The new Relationship Generator plugin,
Enhanced LDAP (ActiveDirectory) integration,
Redirect users to requested page after login,
Improved documentation,
Integration with Travis Continuous Integration,
Adding Australian States to user interface lookups,
Australian translation of the user interface (not into Strine, though), and
Adding Australian date format.

We have been using what we have discovered is called the ‘GitHub Flow’ workflow for developing, reviewing and merging changes into the project mainline. This helps with maintaining a stable, always deployable branch, with feature branches that get merged into it via GitHub pull requests.

To see some of these in action, Kehan recommends that you investigate the Pull Requests in the CollectiveAccess main repository via this URL:

https://github.com/collectiveaccess/providence/pulls?q=is%3Apr+is%3Aclosed+involves%3Akehh+involves%3Aleftclickben

Additionally, all of the WAM specific changes are available on the WAM fork on GitHub, and these are open source and available for others to adopt. This is available at:

https://github.com/wamuseum/providence

You can also see how much is going on in the CollectiveAccess GitHub repository through interfaces like the graphing interface, which tells a story of how busy everyone is on the codebase (Ben and Kehan are now committers #6 and #8!).

The Contributors for CollectiveAccess on GitHub

Kehan and Ben have also released a bunch of development helper scripts available for others to use and modify, which you can find in the CMIS-tools repository, located at:

https://github.com/wamuseum/cmis-tools

We’ve been working closely with the WA Museum team throughout this process as well – as a great example, Danny developed the WA Museum ‘skin’ for CollectiveAccess and our team merged it into the WA Museum fork.

I’ve already mentioned that we’ve developed a great working relationship with the team at Whirl-i-gig (who manage the CollectiveAccess repository), and they’ve been great at being responsive and flexible, and we’ve gained a lot from this process of contributing back to CollectiveAccess in return. True open source collaboration at its best.

This will also benefit the broader community – already there have been a number of institutions asking for us to work with them on implementing CollectiveAccess across Australia, based on the success of the WA Museum implementation to date.

Where to next?

So, we’ve been preparing what we now call the “building blocks” for this next phase of the WA Museum CMIS rollout via the production instance, and this week we kicked off the first team meeting for the rollout, consisting of the WA Museum’s Digital team, the Database team and a few of us from Gaia Resources (Kehan, Ben and myself, with Alex now brought into the project). We all think the end result is a great outcome.

A sneak preview of the new WA Museum skin for CollectiveAccess

Our focus now shifts to two main things – the technical support team needs to embed the skills and knowledge within the WA Museum so that it can be self-sufficient and manage CollectiveAccess into the future, and the rollout team needs to work with the curators and other staff to make sure that the system can perform as they need it to, and that they are trained in the system itself.

I wrote back in May that it was really exciting to see it all coming together, but there was no way I was really prepared for how readily and rapidly it all came together in the last few months, thanks to stellar work by the whole team behind the project, from my own team at Gaia Resources as well as the WA Museum team, and of course, the sleepless people at Whirl-i-gig.

Piers

Drop me a comment below, or get in touch via Twitter, Linkedin or Facebook.

The post A milestone at the WA Museum! appeared first on Gaia Resources.

A view from the trenches of Collection Management

Piers Higgs — Wed, 28 May 2014 01:58:46 +0000

When I started Gaia Resources, one of my biggest projects was supporting the Western Australian Museum’s collections databases. This work has sent me to a range of the Museum sites, and especially the unassuming Joobaitch House in Welshpool, the Museum’s Collections Research Centre (CRC).

The CRC in Welshpool on a grey winter’s day

From some of our old blog articles (like this one and this one) you’d know that we’ve been working with the Museum to migrate their Collection Management System from the old Microsoft Access / SQL Server database to an online system, using CollectiveAccess.

This week I’ve been on site in Welshpool again, and my job has been to start preparing the documentation and training materials for the Museum in this new system. So Kehan, Ben and I thought it would be a good time to provide an update on the project while we’re all out here on site, in the trenches, so to speak. We’ll focus on the highlights of the project to date, namely the work Ben and Kehan have done on data mapping, plugins and unit tests. They have written the details on these below (beware: technical content incoming…).

Data Mapping

The largest part of the project so far has involved mapping the Museum’s current collections data set into CollectiveAccess. The current data set is widely varied, including data from other collection systems, Microsoft Access databases, FileMaker Pro data files, Microsoft Works databases and Excel spreadsheets. We worked with the in-house database team at the Museum, who are processing each of the collections and delivering them into a SQL Server database instance with the same (often flat) structure as the original source data in each case. They also fix certain anomalies in the source data. Although CollectiveAccess does support a range of data sources, this has given us a standard starting point for each of the Museum’s various collections.

Every field in each of these source databases is currently being inspected, and the correct mapping to CollectiveAccess is being determined and entered into a set of mapping spreadsheets. Our installation profile, which describes the structure of our data and the fields associated with the various records, is being updated as well so that the database is prepared for the import process. The source data, installation profile and mapping spreadsheets (plus a lot of processing time) will then go into the CollectiveAccess import, creating the initial state of the production version of CollectiveAccess.

To provide some detail, there were many fields that we determined should be a list field (i.e. the user selects values from a list, possibly also being able to add values to the list). In these cases, we inspected the source data using Google Refine. Sometimes the data was clean, but often it was denormalised in some way, usually as a result of it being a plain-text field rather than a list field in the source system. For example, a field containing the sex determination of a specimen might contain in the source data “M”, “m”, “male”, “M.”, “man”, and so on. The CollectiveAccess import framework allows a simple one-to-one value mapping for cases such as this (“m” => “M”, “male” => “M”, etc).
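To make the idea of a one-to-one value mapping concrete, here is a minimal Python sketch. It is not the CollectiveAccess import framework (where the mapping lives in the mapping spreadsheets); the names `SEX_VALUE_MAP`, `map_list_value` and `unmapped_values` are hypothetical, the female variants are an assumption added for symmetry, and lower-casing the value before the lookup is a convenience of this sketch.

```python
# Hypothetical value map for a "sex" list field. The male variants come from
# the example in the text; the female variants and target codes are assumed.
SEX_VALUE_MAP = {
    "m": "M",
    "m.": "M",
    "male": "M",
    "man": "M",
    "f": "F",
    "f.": "F",
    "female": "F",
}


def map_list_value(raw, value_map, default=None):
    """Return the list code for a raw source value, or `default` if unmapped."""
    if raw is None:
        return default
    # Normalise whitespace and case so "M", "m" and " male " share one entry.
    key = raw.strip().lower()
    return value_map.get(key, default)


def unmapped_values(raw_values, value_map):
    """Collect the distinct source values that still need a mapping decision."""
    return sorted(
        {v for v in raw_values if v is not None and map_list_value(v, value_map) is None}
    )
```

A helper like `unmapped_values` is handy while inspecting a column: whatever it returns is the list of source values that someone still needs to make a decision about.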

Part of the Import Mapping for the Zoology Collection – using tools like Google Drive to collaborate

In more complex cases, some algorithm was needed to determine the correct value based on the input, and in some cases there wasn’t a straight one-to-one relationship between fields. Many source fields have been used to populate a single field within CollectiveAccess, and vice versa. In some more complex cases, such as with biological taxonomy, a hierarchy is defined by a flat set of fields in the source data and has to be processed to suit. Other fields needed their values adjusted, such as by truncating a repeated token, stripping control characters or applying regular expressions.
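The kinds of adjustment described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the migration: the rank column names in `RANK_FIELDS` and the three helper functions are hypothetical, and the real source tables have their own field names.

```python
import re

# Hypothetical rank columns, ordered from the top of the hierarchy down.
RANK_FIELDS = ["family", "genus", "species"]

# ASCII control characters (including tabs, carriage returns and newlines).
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def clean_value(value):
    """Strip control characters and surrounding whitespace from a source value."""
    if value is None:
        return ""
    return CONTROL_CHARS.sub("", value).strip()


def collapse_repeated_token(value):
    """Collapse consecutive repeated words, e.g. "WAM WAM 1234" -> "WAM 1234"."""
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", value)


def taxonomy_path(row, rank_fields=RANK_FIELDS):
    """Turn the flat rank columns of one row into a top-down list of names.

    The path stops at the first empty rank, so a record identified only to
    genus level yields a two-element path.
    """
    path = []
    for field in rank_fields:
        name = clean_value(row.get(field))
        if not name:
            break
        path.append(name)
    return path
```

Each path can then be walked from the top down, creating any missing parent before its child, which is how a flat row becomes a node in a hierarchy.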

Some other data cleaning was also carried out using routines in Pentaho Data Integration, an ETL suite that lets you create reusable pipelines for manipulating data.

CollectiveAccess has a powerful data import framework (see their wiki for details) that provides tools to deal with these situations. There are many “refineries” provided, to perform common data mapping tasks, and a template engine which allows data to be constructed using values from any number of fields. There is also an API to develop any refinery that you find lacking, or which suits the specific needs of your data migration.

As an example, we developed a refinery that reads a “date” field, in which the values are always fully formed dates, and a “date accuracy” field, which is a code representing a date, month or year level of certainty in the accuracy of the date. CollectiveAccess has a date data type that allows values that aren’t fully-formed, such as “2014/05” or even just “2014”. Our “Date Accuracy Joiner” takes the date and the accuracy and truncates the uncertain part of the date; for example, “2014/05/01” and “month” accuracy would result in “2014/05”.
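The truncation logic itself is small. The following Python sketch shows the idea only; the actual refinery is written against the CollectiveAccess refinery API, and the function name `join_date_accuracy` and the accuracy codes `"day"`, `"month"` and `"year"` are assumptions made for this illustration.

```python
# Hypothetical accuracy codes mapped to the number of date parts to keep;
# the real source data uses its own coding.
PARTS_BY_ACCURACY = {"day": 3, "month": 2, "year": 1}


def join_date_accuracy(date, accuracy, separator="/"):
    """Truncate a fully formed YYYY/MM/DD date to the stated level of accuracy."""
    parts = date.split(separator)
    if len(parts) != 3:
        raise ValueError(f"expected a fully formed date, got {date!r}")
    try:
        keep = PARTS_BY_ACCURACY[accuracy]
    except KeyError:
        raise ValueError(f"unknown accuracy code {accuracy!r}") from None
    # Keep only the leading parts that the accuracy code vouches for.
    return separator.join(parts[:keep])
```

With these assumptions, `join_date_accuracy("2014/05/01", "month")` gives `"2014/05"`, matching the example in the text.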

Plugins

We have some requirements to generate additional data based on user-entered data, to help reduce repetitive data entry and maintain consistency. CollectiveAccess includes a plugin mechanism, which allows custom code to hook into various parts of the workflow, including when data is manipulated, when a page is loaded, and for periodic tasks. We have developed two plugins:

A configurable title generator, which assigns a generated “preferred label” value to records (including objects, collections, events, etc) based on their type, assigned collection(s), and accession details. CollectiveAccess has historically provided several “title generator” plugins, however they were not as configurable as our contribution and were much more customised to specific databases.
A configurable relationship generator, which automatically adds and removes relationships for an object to any other identifiable record (including objects, collections, events, etc). This is highly generic and can be configured to base its decision-making process on any available data related to the record being saved.

An interesting aspect of these two plugins is that the output of the relationship generator is possibly used as input by the title generator. That is, a relationship might be generated that affects the generated title for a given object. There is no mechanism for causing the plugins to fire in a particular sequence, however we found that this was not relevant because, upon adding the relationship to the object, the object is saved again, which causes all the plugins to fire again, and the title is actually updated regardless of the sequence of plugins. However, this form of recursion means that plugins of this type (i.e. those that modify the database) need to be careful to only actually call the `update()` method when there is an actual change, otherwise an infinite loop will result.
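The guard against that infinite loop can be shown with a toy model in Python. Nothing here is the CollectiveAccess plugin API: the `Record` class, its `hooks` list and the `title_generator` function are hypothetical stand-ins that only mimic “saving fires every plugin again”.

```python
class Record:
    """Toy stand-in for a saved record; not the CollectiveAccess model API."""

    def __init__(self, title, collection):
        self.title = title
        self.collection = collection
        self.hooks = []        # callables fired after every save
        self.save_count = 0    # how many times update() has run

    def update(self, **changes):
        """Apply changes, then fire every hook (which may call update again)."""
        for name, value in changes.items():
            setattr(self, name, value)
        self.save_count += 1
        for hook in self.hooks:
            hook(self)


def title_generator(record):
    """Hook that regenerates the title, saving only if it actually changed."""
    generated = f"{record.collection}: object"
    if record.title != generated:   # the guard that stops the recursion
        record.update(title=generated)
```

Saving a record with a stale title triggers exactly one extra save: the hook updates the title, the second round of hooks finds nothing to change, and the recursion ends. Without the `if`, every save would trigger another.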

We will be developing a much more complex plugin in the coming weeks, so watch this space. We’ve already contributed some of these plugins back to the CollectiveAccess repositories, and we will continue to do this so that they become freely available.

Unit Tests

To confirm the correctness and validity of any code, the current best practice is to write unit tests for that code. Unit tests typically exercise small sections of code (“units”) by abstracting away complex or temporal fixtures like databases and other portions of code, providing known inputs to the code, and testing its output against known expectations.

Enough said, right?

Unit tests should cover the range of possible inputs, including valid and invalid inputs. For example, a unit test for a function that adds lists of numbers together might include cases for positive, negative and zero numbers as inputs, lists of varying length including empty lists, and lists containing non-numeric items (“unhappy path testing”, where the unit is expected to fail with an appropriate exception or error message). Unit tests are often contrasted with integration tests, which test the interaction between components within the software, or the system as a whole (“end-to-end tests”).
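The list-adding example reads like this as an actual test case. This sketch uses Python’s built-in `unittest` module rather than the PHP tooling used on the project, and `add_numbers` is a hypothetical function written just for the illustration.

```python
import unittest
from numbers import Number


def add_numbers(values):
    """Return the sum of a list of numbers, rejecting non-numeric items."""
    total = 0
    for value in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Number):
            raise TypeError(f"non-numeric item: {value!r}")
        total += value
    return total


class AddNumbersTest(unittest.TestCase):
    def test_positive_negative_and_zero(self):
        self.assertEqual(add_numbers([3, -5, 0]), -2)

    def test_empty_list(self):
        self.assertEqual(add_numbers([]), 0)

    def test_single_item(self):
        self.assertEqual(add_numbers([7]), 7)

    def test_non_numeric_item_raises(self):
        # The "unhappy path": the unit is expected to fail with a clear error.
        with self.assertRaises(TypeError):
            add_numbers([1, "two", 3])
```

Saved to a file, these tests can be run with `python -m unittest`.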

CollectiveAccess is a software system with a long history, and only a small number of existing unit tests, mostly for the most fundamental classes and functions. The unit tests are written using the well-known testing framework PHPUnit. There was no existing mechanism for unit testing plugins, and in fact the tight coupling of much of the code with the database means that “true” unit tests were difficult or impossible to write for much of the plugin code.

To solve this problem, we have developed a pattern for writing tests for the plugins that are in some ways unit tests, but are in many other ways more correctly classified as integration tests. These tests expect a working database, and will generate any data that is required for the test, inserting it into the database in the setup phase, and removing it in the cleanup phase. In this way, the test should leave the database in very close to the same state as it was before the test was run. The exception is that sequence numbers (such as those for auto-generated id values) will have progressed, but that is not a concern in most situations. This pattern is implemented as an abstract base class for plugin integration tests. Tests for specific plugins should extend this base class and use the documented pattern to define the `setUp()` and `tearDown()` methods.
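The shape of that pattern can be sketched in Python. This is not the PHPUnit base class from the project: `PluginIntegrationTestBase`, `create_fixtures`, `insert_object` and the `objects` table are hypothetical, and an in-memory SQLite database stands in for the working CollectiveAccess database.

```python
import sqlite3
import unittest


class PluginIntegrationTestBase(unittest.TestCase):
    """Hypothetical base class: tracks inserted rows and removes them afterwards."""

    # A shared in-memory SQLite database stands in for the "working database".
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE IF NOT EXISTS objects "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
    )

    def setUp(self):
        self._created_ids = []
        self.create_fixtures()

    def tearDown(self):
        # Remove only the rows this test created; ids are not reused, so the
        # sequence has still progressed, as noted in the text.
        for object_id in self._created_ids:
            self.db.execute("DELETE FROM objects WHERE id = ?", (object_id,))
        self.db.commit()

    def create_fixtures(self):
        """Subclasses override this to insert the data their tests need."""

    def insert_object(self, title):
        cursor = self.db.execute("INSERT INTO objects (title) VALUES (?)", (title,))
        self.db.commit()
        self._created_ids.append(cursor.lastrowid)
        return cursor.lastrowid


class TitlePluginTest(PluginIntegrationTestBase):
    def create_fixtures(self):
        self.object_id = self.insert_object("untitled")

    def test_fixture_is_visible_to_the_code_under_test(self):
        row = self.db.execute(
            "SELECT title FROM objects WHERE id = ?", (self.object_id,)
        ).fetchone()
        self.assertEqual(row[0], "untitled")
```

Keeping the bookkeeping in the base class means each plugin’s test only has to say what data it needs, and the cleanup happens the same way everywhere.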

Again, we’ll be submitting these back to the CollectiveAccess community down the track.

Rollout!

So now you have a better idea of some of the work being done behind the scenes on this project.

We’re getting towards the crunch time, where we are going to be heading into a production instance soon. Data is starting to flow in via the mappings, there is new functionality appearing from the plugins, and we are starting to see what the system is actually going to be looking like. The project is at that exciting time when it all starts to come together and we see changes happening readily and rapidly.

While I’m documenting the systems and coming up with training material, I’m also testing the system from the end user perspective (and may even go so far as to start doing some automated testing using Selenium). So far I’m really impressed with the work and the documentation is going smoothly, and I’m sure that this is going to lead to a nice customised system for the curators and staff here at the WA Museum.

So for now from the trenches, we’ll sign off. Tally ho, chaps!

Piers, Ben and Kehan

Leave a comment below, or drop us a line on the Gaia Resources twitter account or Facebook page.

The post A view from the trenches of Collection Management appeared first on Gaia Resources.