A couple of weeks ago, the AIMS team organized a really interesting webinar in the Webinars@AIMS series. It was titled "Maximizing the impact of institutional knowledge using DSpace" and was delivered by Alan Orth, Linux systems administrator at the International Livestock Research Institute (ILRI), one of the fifteen CGIAR Centers. Alan’s role is not limited to providing technical support for the research computing platform, however; he also manages CGSpace, a customized DSpace instance that acts as the institutional repository not only for the research publications of ILRI but also for many other CGIAR research institutions and their partners.
Alan’s presentation on the use of DSpace for managing and sharing institutional knowledge caught our attention, so we invited him to share some more information on the use of DSpace for institutional purposes, on CGSpace, and on knowledge sharing in general. Let’s see how Alan responded to our questions!
Could you share some information about you? What is your educational background and professional experience?
I have a Bachelor of Science in Computer Information Systems from the University of California, Chico. Professionally I have always worked as a systems person — networks, security, administration of Linux servers, etc. I spent a few years teaching computer science in a rural area of Kenya, which was a great experience, if only because explaining things to other people is a fun way to find out if you really understand them in the first place! I’ve been at ILRI in Nairobi, Kenya since 2009, where I’ve primarily been working to build and support a high-performance computing infrastructure for storage and analysis of genomic data.
How does DSpace compare to the competition (e.g. EPrints, Fedora, Drupal)? What are strong points/unique features of DSpace and what would a user miss from the other competitive platforms?
My first and only foray into institutional repositories was DSpace, so I actually don’t have any experience with EPrints or Fedora Commons! Drupal is a great content management system (CMS) and you could probably turn it into an institutional repository, but I don’t think that’s a good idea. DSpace doesn’t have to pretend to be an institutional repository; out of the box it comes with support for Dublin Core metadata, users and groups, submission workflows, embargoes, full-text indexing, and the other features you’d expect from a digital repository. A great example of how to use Drupal in the institutional repository sphere is to harvest content from DSpace using APIs like REST and OAI-PMH. Another point for DSpace is that it has a massive install base and an active community, and the developers have set an amazing cadence for new features, bug fixes, etc.
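To make the harvesting idea a bit more concrete, here is a minimal sketch of what an OAI-PMH harvest of Dublin Core records might look like from the consuming side. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH standard itself; the base URL, the sample response and the helper names (`list_records_url`, `parse_records`) are assumptions made up for illustration, not something taken from CGSpace or from Alan’s setup.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123/456</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optionally restrict the harvest to one set
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract the identifier and the Dublin Core titles of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NAMESPACES):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NAMESPACES
        )
        titles = [el.text for el in record.iterfind(".//dc:title", NAMESPACES)]
        records.append({"identifier": identifier, "titles": titles})
    return records
```

A CMS-side harvester would fetch the URL returned by `list_records_url(...)` for the repository’s actual OAI endpoint, pass the response body to `parse_records`, and keep following resumption tokens for large result sets (not shown here).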
How easy is the customization of a typical DSpace installation in order to meet the needs of a specific community?
Customizing DSpace isn’t for the faint of heart. Having said that, adding things like custom controlled vocabularies is very easy, whereas it gets trickier when you want to add institutional branding to the user interface or change core functionality. A bit of advice: never hack the DSpace core! Always work with overriding modules if you can, because it will make upgrades to new major versions (like 5.x -> 6.x) less painful. Also, with regard to support staff for DSpace installations, I think it’s better to have Linux systems people than web developers, as DSpace does need a bit of babysitting (Java memory issues, Tomcat setup, offsite backups, source code version control, etc.).
We know that CGSpace is an adapted DSpace installation that serves the CGIAR researchers, and ILRI played a key role in this. What are the main customizations that have taken place in order for it to meet the requirements of the agricultural research community?
It has to be said that CGSpace is the unofficial de facto institutional repository for the CGIAR! It grew organically, starting at ILRI in 2009 and gradually working its way into other CGIAR centers. Most of CGSpace’s customizations have to do with catering to new controlled vocabularies, as each research center/program/project usually has its own set of subject terms. We also make extensive use of XMLUI customizations, so that each community has its own distinct look and feel consistent with corporate branding. All of our code is on a public GitHub repository, so if someone likes what we’ve done they can probably see how we did it by investigating the source code.
What is your experience with data sharing among all the ILRI branches and other research institutes in East, Southern and West Africa, South Asia, and East and Southeast Asia?
Data sharing is tricky! We currently have ~15,000 ILRI items in CGSpace, so it would be tempting to think we’ve somehow convinced all of our scientists of the importance of institutional archiving… but alas, we have not. The majority of ILRI content in CGSpace comes from projects and programs in Kenya and Ethiopia, where ILRI has its headquarters and principal campus. These two campuses represent nearly 1,000 staff, so it’s unsurprising that we are able to capture more outputs from those locations.
I think higher-profile outputs like papers are always captured, but in the other locations, without the influence of knowledge management people, capturing other outputs like presentations, brochures, fact sheets, etc. isn’t as successful. I think that will all be changing soon, though, as Open Access and Open Data are becoming critical donor requirements in new projects; capturing outputs will be increasingly linked to performance evaluations.
What are the interoperability aspects of CGSpace with other tools and repositories, e.g. Dataverse (actual reports and primary data sets)?
DSpace versions 4 and above have a fantastic new feature called the REST API, which allows developers to query the repository’s content programmatically; building integrations with WordPress, Drupal, etc. has never been easier (and it’s much cleaner than the OAI method ever was). Open Data is a harder problem to solve than Open Access for publications because of data size and data formats.
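For readers curious about what "querying the repository programmatically" can look like, here is a small, hedged sketch of the client side of such an integration. The `/items` path, the `limit` and `offset` parameters, and the `handle` and `name` fields are assumptions modeled on the DSpace 5.x REST API and may differ in your version; the base URL, the sample JSON and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Hypothetical JSON body, shaped like a list of repository items.
SAMPLE_ITEMS_JSON = """[
  {"id": 1, "handle": "123/456", "name": "Example report"},
  {"id": 2, "handle": "123/789", "name": "Example presentation"}
]"""


def items_page_url(rest_base, limit=100, offset=0):
    """Build the URL for one page of items (assumed endpoint and parameters)."""
    return rest_base + "/items?" + urlencode({"limit": limit, "offset": offset})


def summarize_items(json_text):
    """Reduce a JSON list of items to (handle, name) pairs for display."""
    return [(item.get("handle"), item.get("name")) for item in json.loads(json_text)]
```

A WordPress or Drupal integration would fetch successive pages by increasing `offset` until an empty list comes back, then render the `(handle, name)` pairs as links to the repository.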
Where possible we deposit data in repositories that make sense for the particular type of data, then make a metadata-only accession into DSpace that has a link to the data. This works for things like genomic sequences, where the data is deposited at NCBI, or for documentaries and interviews, where the videos are hosted on YouTube. ILRI has explored Dataverse but it didn’t feel right, as it doesn’t provide much aside from hosting small, text-based data like surveys. For what it’s worth, we’re now building a data portal based on CKAN.
We would like to thank Alan for responding to our invitation as well as for his really interesting responses and we hope that we will have the opportunity to feature him again on the Agro-Know blog soon!