This is the latest grant that we have submitted to JISC, in this case for a new application of the knowledgeblog platform. As usual, it is a direct post from Word, so there may be a few presentational issues in it.

 

The grant is currently under review; I will post the outcome and any feedback (if possible) once I have a result.

Outline Project Description

In this project, we will generate a large body of web content, demonstrating the applicability of commodity blogging technology as a supplement to the University’s existing eprints archive. Through the use of technology pioneered by the JISC-funded Knowledgeblog project, we will publish 100+ scientific articles, from a variety of different word-processing environments, in a structured-web capable form rather than as PDF. This content will then be augmented to demonstrate the advantages of building on a commodity platform, enabling novel mechanisms of publication.

1. Introduction

1The modern publishing industry has been massively affected by the development of the web. However, the impact has been highly varied across different domains. Publications that address news events or encyclopedic knowledge have been very heavily affected; other areas have changed little. The web initially developed from the desires of scientists to share knowledge; in some areas, such as biology, the uptake of web technologies has been little short of extraordinary. It is ironic, therefore, that the publishing of formal academic papers has been affected relatively little by the web. Although content-page listings may have been largely replaced by RSS or email, and papers may be available as HTML, they are still largely constrained by print requirements: packaged as PDFs, poorly linked, with static figures.

 

2An alternative publication mechanism has already been funded by JISC as part of the “Managing Research Data” programme. As part of the Knowledgeblog project, we have investigated a publication tool that integrates well with scientists’ existing work practices and is based around a commodity blogging engine, namely WordPress. There are a number of tools, such as Open Journal Systems (OJS), and organizations, such as Scielo, which allow the web publication of academic articles. While these have large user bases (OJS: 6000 journals; Scielo: 600), WordPress currently drives around 10% of the world’s websites, a user base orders of magnitude larger. WordPress therefore performs the basic tasks of publishing articles extremely well: it scales to millions of page hits, enjoys tool support from many word-processing environments, and benefits from many augmentations for specialist audiences. We have extended this tool with a few specialised extensions of our own and, as a result, made it more suitable for academic publishing. We have then used this tool as the basis for two journals, aimed at producing educational resources describing ontology technology (http://ontogenesis.knowledgeblog.org) and the JISC-funded Taverna workflow system (http://taverna.knowledgeblog.org).

 

3These two resources are, in effect, “gold open-access”, although they do not require author payment. They present content which has not been presented elsewhere, but was written for the purpose; articles have been through (or are progressing through) a formal review process. While this has provided a useful resource, generating over 15k page views, these resources are designed to be coherent in scope; although this is generally a positive virtue, by definition it allows us to investigate the suitability of the tooling for only a small number of articles and a limited domain.

 

4Newcastle University has a strong history in supporting gold open access publication: it was the site for the first open access law journal in the UK (http://webjcli.ncl.ac.uk/). In addition, it has a large and successful eprints archive (http://eprints.ncl.ac.uk), currently hosting 50k articles or bibliographic records. In this project, we will exploit the eprints archive to provide content, building a substantial knowledge resource. This will demonstrate the suitability of the Knowledgeblog tool-chain as the basis for green open access publication, show the value of this novel form of publication, and provide the vital testing against content “from the wild”, allowing us to extend the suitability of this tool-chain to as many areas of academic discourse as possible.

2. Fit to call

5The project call notes that JISC is funding or has funded many projects relating to scholarly communication. These include: infrastructural support in the form of institutional repositories; support for open-access; and support for novel mechanisms of publication such as overlay journals. Specifically, theme D – campus-based publishing – is aimed at increasing the capacity of the sector to publish and disseminate research outputs directly. The call also highlights attempts such as the “Beyond the PDF” workshop to move toward more structured forms of knowledge; while, in theory, PDF is capable of supporting relatively rich structuring, in practice, most of the tools which generate files in this format produce a relatively opaque, binary artefact from which it is difficult to extract information, or to repurpose or recast it in any way.

 

6While open-access publishing has made significant strides in the last 10 years, becoming an accepted part of the academic landscape, Gold open-access – the publication of original content – still accounts for a minority of academic publications. Green open-access – author publication of content often published elsewhere – now accounts for up to 25% of the literature in some fields.

 

7Institutional repositories such as that run by Newcastle (http://eprints.ncl.ac.uk) or author archiving on their website (e.g. http://homepages.cs.ncl.ac.uk/phillip.lord/publications.html) are the most common route for green open-access publication. While increasing access to academic materials is a very positive step, this form of publication is largely limited to providing access to a PDF. From neither the authors’ nor the readers’ point of view is there significant added value to the publication. For example, our experience is that authors are often equivocal about or uninterested in publication in institutional repositories, as it is “just-one-more-thing” to do, while maintaining a website requires significant technical expertise.

 

8For this grant, academics at Newcastle, supported by the infrastructure provided by the local librarians, will provide an alternative; we will identify authors within Newcastle, take their open-access publications and recast them into a form suitable for WordPress. We will do this with their active permission and engagement, using the tooling we have developed or documented as part of the previously-funded JISC “knowledgeblog” project. Where authors wish to, we will support them in performing this work for themselves; where they do not want “just-one-more-thing”, we will build on the existing eprints process and perform this work for them. In general, this can be performed directly using MS Word, LaTeX or other word-processing software, whichever is the authors’ preferred editing environment. In addition, we will use this process to increase the usability of the tooling, making it both easier and more likely for authors to publish their work directly in this fashion. As this proposal is built on existing work from the University eprints archive, library support is implicit within FEC and not specifically or additionally costed.

 

9Once publications are available in this framework, authors and readers will be able to take advantage of the additional features which come either from WordPress directly, or from augmentations provided or assessed by the WebPrints team. For example, authors will be able to see rich content-access statistics, including page-views, referrer and incoming link information. Published articles will be bi-directionally linkable using trackbacks. Authors will be able to add tags, zoomable equations or automatically generated reference lists depending on their level of technical competence. For readers, category- and tag-based RSS feeds will be available, and searching and bi-directional linking (again!) will be possible. As a result of the work from the previous knowledgeblog grant, all posts will be tagged with metadata, in various forms, and will be available for formal archiving outside of the University.

 

10The publication framework is based around WordPress, which is freely available, scalable, stable and hardened by its large user base. The system is continually updated, but has a good reputation for maintaining backward compatibility. The authoring framework is based around commodity tools such as Word or LaTeX. Most of the workflow process within Newcastle is pre-existing as part of the eprints service. This project therefore provides a sustainable and novel enhancement to the existing process.

3. Workplan

3.1 WP1 Management, Systems Administration and Set up.

11This work package will fulfil the basic management and administrative tasks required for the project. This will include setup of the repository, styling and theming appropriately for the project; definition of a basic workflow for management of documents and metadata; fulfilment of standard JISC reporting requirements.

12We request additional funding of 1k as part of this work package for virtual server upgrades (additional disk space), Dropbox space to enable document management, and WordPress comment-spam protection.

3.2 WP2 User documentation.

13Most of the operational, “how-to” documentation is already available: either at http://process.knowledgeblog.org (developed by the JISC-funded knowledgeblog project); or, as the repository is based on commodity technology, from many publicly available websites.

 

14However, there will be information specific to the WebPrints archive: about copyright, about document management, and about the relationship to the university. For this, we will need to generate some specific documentation.

 

15As the project progresses, we will improve and enhance this documentation, based on our experiences, including, for example, statistics on how long author self-deposition takes.

3.3 WP3 Author advertising and Material identification

16We will seek active engagement with our user community, by linking into the current eprints system. Combined with the Newcastle-specific, internal “myimpact” database (which was designed to capture research outputs for the next REF), this will enable us to identify new publications as they come out. In the first instance, we will select material that has been published in open access journals (or where embargo periods, or other conditions, allow). We will contact authors individually, inform them of our project, and advise them about the methods for recasting their paper (see WP4).

 

17We will not preselect on the basis of academic quality, only technical and legal (copyright) grounds. Although the eprints service displays full text as PDF only, the myimpact database in many cases also stores MS Word (or equivalent) formatted data. We will, therefore, prefer papers where this data is available. We will prefer papers which are recent over those which are older. Finally, we will prefer papers which give us a wide spread of authorship and discipline.
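The selection preferences above could be sketched roughly as follows. This is only an illustration: the `Candidate` record, its field names and the `select` function are hypothetical, and treating the order of the sentences above as the order of priority (source availability, then recency, then spread) is an assumption rather than something the proposal states.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a paper found via eprints/myimpact."""
    title: str
    year: int
    department: str
    has_source: bool    # Word (or equivalent) source available, not only PDF
    open_licence: bool  # passes the legal (copyright) check


def select(candidates, limit):
    """Greedily pick up to `limit` papers.

    Papers failing the licence check are excluded outright. Among the
    rest we prefer papers with source files, then newer papers, then
    departments we have picked from least so far (spread).
    """
    pool = [c for c in candidates if c.open_licence]
    chosen = []
    picked_per_department = Counter()
    while pool and len(chosen) < limit:
        best = min(
            pool,
            key=lambda c: (
                not c.has_source,
                -c.year,
                picked_per_department[c.department],
            ),
        )
        chosen.append(best)
        picked_per_department[best.department] += 1
        pool.remove(best)
    return chosen
```

Note that academic quality never enters the sort key, matching the statement that papers are not preselected on that basis.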

 

18Although the focus of this proposal is on the provision of a service for publication of green open access material in a fully web-capable format, we will be happy to receive grey literature, on an author-publication basis.

3.4 WP4 Paper recasting

19This work package will take papers selected as part of WP3 and publish them to the WebPrints archive. In most cases, this work will be performed using tooling developed or documented by the previously funded JISC knowledgeblog project.
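As an illustration of what publishing a recast paper to WordPress can look like programmatically, here is a minimal sketch using the MetaWeblog XML-RPC interface that WordPress exposes through `xmlrpc.php`. It is not the Knowledgeblog tool-chain itself: the endpoint URL, credentials and the helper names `build_post` and `publish` are assumptions made for the example, and the site is assumed to have XML-RPC enabled.

```python
import xmlrpc.client


def build_post(title, html_body, tags=(), categories=()):
    """Build the content struct expected by metaWeblog.newPost."""
    return {
        "title": title,
        "description": html_body,          # the post body as HTML
        "mt_keywords": ", ".join(tags),    # WordPress maps these to tags
        "categories": list(categories),    # category names
    }


def publish(endpoint, username, password, post, blog_id="1", live=False):
    """Send the post to a WordPress site; returns the new post ID.

    `endpoint` is the site's XML-RPC URL (hypothetical here), e.g.
    "https://example.org/xmlrpc.php". With live=False the post is
    saved as a draft so it can be checked before it goes public.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.metaWeblog.newPost(blog_id, username, password, post, live)


# Example (not executed here, since it needs a real site and account):
# post = build_post("My paper", "<p>Abstract ...</p>", tags=["ontology"])
# publish("https://example.org/xmlrpc.php", "user", "secret", post)
```

Saving as a draft by default fits the workflow described here, where errors introduced by the conversion are hand-corrected before publication.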

 

20We will publish articles in three ways:

WebPrints team published. All work will be performed by members of the WebPrints team. For each paper, we will write a short report, describing any issues with the publication process, and any errors seen (which we will hand-correct). We will gather statistics on the time taken to publish. Papers will be published on an “as-is” basis; that is, we will not seek to enhance the content at this point. We will add metadata in a structured way, which will be accessible from the web-presented version.

Author published, WebPrints supported. We will work directly with authors to help them publish their papers. Where possible, we will augment the papers and add new features (LaTeX maths support, citation). These papers will be marked as featured and augmented. Again, we will gather statistics on the time taken to publish, broken down by additional functionality.

Author published. Authors will publish directly into WebPrints, using either their pre-existing experience, or our own user documentation. We will request, but not require, statistical feedback. Publication will be as the author wishes: as-is, or augmented with additional functionality.

 

21All papers will be annotated with standard metadata in a structured form; our previous work means that this metadata will be available from the web presentation of the paper.

3.5 WP5 Repository and process enhancement

22For this package, we will focus on two key aspects: tooling for publishing papers and their presentation once there.

 

23For the presentational issues, in the first instance we will focus on enhancements which do not require support from the article material. For example, because we will add metadata to articles, we will be able to generate metadata headers (COinS, standard meta tags etc) without further analysis of the article material itself. Likewise, our experience with the knowledgeblog project means that we can support “out-of-the-box”: multiple export formats (including HTML, PDF and ePUB); site-wide indexes (by year, author, subject etc); comments; trackbacks and page feeds (including from subsections). Through use of third-party software, we will also be able to add: related papers through textual analysis; tag clouds; Twitter backs; automated multi-lingual presentation and social networking support.
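To make the point about metadata headers concrete, the following sketch builds a COinS span from a few metadata fields. COinS embeds an OpenURL ContextObject, URL-encoded, in the `title` attribute of an empty `span` with class `Z3988`. The function name and the choice of fields are illustrative only; this is not the plugin code used by the knowledgeblog project.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, authors, journal, year):
    """Return a COinS <span> describing a journal article."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", title),
        ("rft.jtitle", journal),
        ("rft.date", str(year)),
    ]
    # One rft.au entry per author.
    fields += [("rft.au", author) for author in authors]
    query = urlencode(fields)
    # The query string sits inside an HTML attribute, so "&" must
    # become "&amp;".
    return f'<span class="Z3988" title="{escape(query)}"></span>'


print(coins_span("An example", ["A. Author", "B. Author"], "Some Journal", 2011))
```

Because the span is generated purely from stored metadata, it can be added to every article without touching the article text, which is the property this paragraph relies on.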

 

24We will also investigate enhancements which require modification of the original content (and therefore increased interaction with authors). From the knowledgeblog project these will include scalable equation presentation and client-side generated bibliographies. We will also add “custom posts” for supplementary material (spreadsheets, for instance). Finally, through the use of third-party material, we will add enhancements such as syntax highlighting, zoomable maps, slideshows and so forth. This part of the proposal is designed to be open-ended and exploratory; which forms of enhancement we pursue will depend on the types of paper selected and on interactions with the authors. There are currently over 13,000 plugins available for WordPress, which provides us with a considerable resource to build from.

3.6 Timetable

Name                             Begin date   End date   Resources
WP1.1 – Setup Repository         02/05/11     14/05/11   SC, AL, DS
WP1.2 – Document Workflow        02/05/11     14/05/11   PL
WP2.1 – User Documentation       09/05/11     24/05/11   DS, PL
WP2.2 – User Statistics          16/05/11     31/08/11   SC, AL
WP3.1 – Author Engagement        16/05/11     31/08/11   SC, AL, DS, PL
WP4.1 – Paper Recasting          01/06/11     30/09/11   SC, AL, DS, PL
WP5.1 – Repository Enhancement   01/07/11     30/09/11   SC, AL, DS, PL

4. Deliverables

25The project will deliver:

  • A repository of open-access articles in a fully web-capable format. This will act as a supplement to the existing eprints archive at Newcastle. We expect to generate around 100 articles in this form, although this is likely to be an underestimate. We are currently estimating throughput from our experiences with Knowledgeblog, which involved relatively few articles. The process should benefit from high-throughput experience.
  • Further documentation, published on http://process.knowledgeblog.org, describing the process that we have used to set up this repository.
  • Enhancements to tooling, enabling others to publish more easily in this manner.
  • Additional experience and software enhancing the presentation of data held in this form.

5. Project management arrangements

26The project will be managed by Dr Lord, who will be responsible for:

  • Developing Project Management Plans;
  • Ensuring that the Project work package objectives are met;
  • Prioritising and reconciling conflicting opportunities;
  • Reporting to and collaborating with the JISC programme manager;
  • Dissemination of research results.

 

27Project progress will be evaluated through scheduled, short, “stand-up” meetings on a weekly basis, conducted face-to-face, via Skype or phone as appropriate. Primary unscheduled communication will be via a public mailing list, ensuring maximum visibility and openness. We will use other readily available tooling to manage the document process pipeline (Google spreadsheets, Dropbox), and likewise for software development (Google Code). All staff are associated with other projects or service provision (research, teaching, training); they will be individually responsible for managing these workloads, and are highly experienced at doing so.

5.1 Risk Management

28Staff risks – the basic organisation of the project has been designed to mitigate staffing issues. All staff are in post and are highly experienced, with long track records at Newcastle. Costs have been split three ways, so even in the unlikely event that one member of the team leaves during the project, it will not cause significant disruption.

29Software risks – we are using commodity technology, which is very well proven and supported. None of the software is critical (even our basic blogging engine, WordPress, is replaceable). Therefore, while changes in third-party software might degrade or slow progress, they will not halt it.

30Engagement risks – the project requires a level of engagement from Newcastle researchers, which may not materialize. We have minimized this risk by minimizing the effort that engagement requires of the researchers. The project members are well known to many in the university (DS and SC comprise the “Bioinformatics Support Unit” and have worked for many PIs personally). We have active engagement from the library, in particular from Moira Bent (Science Faculty Liaison Librarian), and Paula Fitzpatrick (Digital Libraries).

5.2 IPR position

31The bulk of the content handled by this work will come from authors within the University. The current restrictive copyright requirements of many publishers place uncertain limits on what can or cannot be done with this content. For this reason, we will use articles that have been published with, or have become available under, a Creative Commons or other open access licence.

 

32Project members will release written work (documentation etc) under a Creative Commons Attribution ShareAlike 3.0 Unported License (CC BY-SA), which allows re-use and modification with attribution, provided that derived works are shared under the same licence. This is in line with the JISC Model Licence. Software linked to WordPress will be released under GPL, as required by the WordPress licence. Software which is separable will be released under LGPL. Software linked to other third-party libraries may use another licence if required; this will be limited to Free/Open source licences.

 

5.3 Sustainability

33This project is largely based around innovative, novel and leading use of existing software. As such, the sustainability of the majority of the technology base is dependent not on project members but on large companies with established and proven business models.

 

34The WebPrints archive will be run from the same server as knowledgeblog.org; this is being developed and maintained, and will be for the foreseeable future, and the addition of the WebPrints archive will not be a substantial additional cost. However, should this cease to happen, the content of the WebPrints archive will be under a Creative Commons or equivalently permissive licence. This will make it possible for the JISC-funded UK Web Archive to store the website for the future.

 

35Although we will not be able to sustain publication by the WebPrints team past the lifetime of this proposal without further funding, author publication will be possible; our experience with existing tooling is that this is possible for many, although it requires some level of technical skill, depending on the word-processor package and the level of complexity of the paper.

5.4 Staff Recruitment

36All staff are already in post. Recruitment during the project will therefore be unnecessary.

5.5 Key Beneficiaries

37Our immediate beneficiaries are Newcastle University staff, who will have their work published using a novel publication technique. Critically, we will demonstrate the value of this form of publication to both researchers and librarians within the University, who will in future be better placed to use or support this technology to publish their own or others’ work.

 

38Although presented here as a discrete project, the work fits within the background of the wider blogging community. So, our own knowledgeblog project and website will be able to take advantage of software improvements that will happen as a result of this work. Additionally, the general academic blogging community will gain a new resource. Increasingly, this community is a critical path for public engagement in the academic process.

5.6 Community Engagement

39Community engagement will take place initially by direct contact; we will email authors to ask for their engagement in the publishing process. This should have the secondary effect of advertising the presence of our project. We have active engagement from the library staff, who are well known within the University. In terms of engagement with the resource outside of Newcastle, we will make active use of various web and social networking facilities. Our experience has shown that this can generate significant amounts of engagement in a relatively short period of time. Finally, we will advertise the work through standard academic channels of conference and journal publication; although effective, this tends to be slow. This is problematic for a short project, hence we consider this to be a secondary means of communication.

 

6. Budget

 

Removed for privacy reasons.

7. Project Team

 

40Dr. Phillip Lord is a Lecturer of Computing Science at Newcastle University. He has a PhD in yeast genetics from the University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as his contributions to eScience, beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects (CARMEN, ONDEX and InstantSOAP), as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI), and publishing on the fundamentals of ontology design. He is an active participant in the scientific blogging community, and developed the initial idea for knowledgeblogs. As well as managing the knowledgeblog project, he is the developer of tools such as “Latextowordpress”, as well as WordPress plugins such as “Mathjax-latex” and “Kcite”, all of which improve the usefulness of WordPress for academic communication.

 

41Dr. Daniel Swan has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart’s and the London Genome Centre and the Centre for Ecology and Hydrology in informatics-driven roles dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team helping biological researchers generate, capture, store and analyse their digital data. His interdisciplinary background means he has grounding in both computer and biological sciences and is comfortable working on CS-focused projects (CARMEN, InstantSOAP, Bio-Linux) as well as acting in a research capacity analysing high-throughput data. He is currently active within the knowledgeblog project, having been responsible for adding software support for a review process, gravatars, syntax highlighting, PDF and ePUB exports.

 

42Dr. Simon Cockell has a PhD in Genetics from Leicester University, and refocussed into Bioinformatics with a Masters degree from Leeds in 2005. From there he moved to Newcastle, and the Bioinformatics Support Unit. Since coming to Newcastle, Simon has worked on a range of projects involving large-scale analyses (AptaMEMS-ID), data integration (Ondex) and health informatics (MRC Mitochondrial Disease Cohort). He is currently active within the knowledgeblog project, having been responsible for metadata support (including COinS) and navigational support (for both humans and robots), and is a co-author of kcite and mathjax-latex.

 

43Allyson Lister worked for 6 years at the EBI in Cambridge, developing and producing the UniProt/TrEMBL protein database. She is currently focusing on the use of ontologies for the semantic integration of systems biology data in her current job at CISBAN in Newcastle University. Both at the EBI and at Newcastle University, she developed structured data formats including UniProt/TrEMBL and SBML. She has also been an early adopter of blog technology as a mechanism for communication of both her own and others’ primary research. Since 2006, she has co-authored a number of posts with other bloggers in the community and has been invited to be a guest author at both the ISCB news and the BioSharing blog. She has published papers highlighting the importance of social networking and live blogging to bioinformatics.
