Archive for the ‘Grants’ Category

Josh Brown from JISC has given his permission for me to reproduce the feedback from the peer review of my last JISC grant, which bounced. A shame, as it would have given us an opportunity to test out knowledgeblog on papers from the wild, while also producing a great demonstrator of the advantages of using web technology to distribute papers on the web, rather than just dumping a link to a PDF.

With luck, we can rejuvenate this work in another way.

“One bid (Bid no 8: Newcastle University) was flagged by one of the markers as being out of scope, despite receiving good marks and positive comments from the other two markers.

The original terms of the call specifically state that projects must add value to existing peer reviewed journals. Projects seeking solely to create new publications are specifically excluded. (Please review the sections Expected Outputs and Requirements of the call for more detail on these conditions.)

Bid no 8 states:

“we will identify authors within Newcastle, take their open-access publications and recast them into a form suitable for WordPress”

The bid is clearly designed to aggregate content that has been published elsewhere, largely based on content held within Newcastle’s institutional repository. No existing, peer-reviewed scholarly journal is involved in this project.

While the creation of a web-native publishing tool clearly has merit, as identified by the two markers who praised this bid, the funding call is, as stated, intended to add value to existing publications. In the absence of an existing peer-reviewed publication as a partner in this project, the bid is out of scope”

The panel agreed with this analysis, which meant that, despite the fact that the project was viewed unanimously as a very strong proposal on its own merits, we were obliged to decline to fund this project. The requirement for direct partnership with an existing peer-reviewed scholarly journal for all projects in this strand was imposed after lengthy discussion, and for a range of reasons, including sustainability, tight time-frames and so on, and it was felt that this should be upheld.

— Josh Brown

This is the latest grant that we have submitted to JISC, in this case for a new application of the knowledgeblog platform. As usual, it is a direct post from Word, so there may be a few presentational issues in it.

 

The grant is currently under review; I will post the outcome and any feedback (if possible) once I have a result.

Outline Project Description

In this project, we will generate a large body of web content, demonstrating the applicability of commodity blogging technology as a supplement to the University's existing eprints archive. Through the use of technology pioneered by the JISC-funded Knowledgeblog project, we will publish 100+ scientific articles, from a variety of different word-processing environments, in a structured-web capable form rather than as PDF. This content will then be augmented to demonstrate the advantages of building on a commodity platform, enabling novel mechanisms of publication.

1. Introduction

1The modern publishing industry has been massively affected by the development of the web. However, the impact has varied widely across different domains. Publications that address news events or encyclopedic knowledge have been very heavily affected; other areas have changed little. The web initially developed from the desire of scientists to share knowledge; in some areas, such as biology, the uptake of web technologies has been little short of extraordinary. It is ironic, therefore, that the publishing of formal academic papers has been affected relatively little by the web. Although contents-page listings may have been largely replaced by RSS or email, and papers may be available as HTML, they are still largely constrained by the requirements of print: packaged as PDFs, poorly linked, with static figures.

 

2An alternative publication mechanism has already been funded by JISC as part of the “Managing Research Data” programme. As part of the Knowledgeblog project, we have investigated a publication tool which integrates well with scientists’ existing work-practices and is based around a commodity blogging engine, namely WordPress. There are a number of tools, such as Open Journal Systems, and organizations, such as Scielo, which allow the web publication of academic articles. While these have large user bases (OJS: 6000 journals; Scielo: 600), WordPress is currently used to drive around 10% of the world’s websites, a user base orders of magnitude larger. WordPress, therefore, performs the basic tasks of publishing articles extremely well, scales to millions of page hits, enjoys tool support from many word-processing environments and benefits from many augmentations for specialist audiences. We have extended this tool with a few specialised extensions of our own and, as a result, made it more suitable for academic publishing. We have then used this tool as the basis for two journals, aimed at producing educational resources describing ontology technology (http://ontogenesis.knowledgeblog.org) and the JISC-funded Taverna workflow system (http://taverna.knowledgeblog.org).

 

3These two resources are, in effect, “gold open-access” — although not requiring author payment. They present content which has not been presented elsewhere, but was written for the purpose; articles have been (or are progressing through) a formal review process. While this has provided a useful resource, generating over 15k page views, these resources are designed to be coherent in scope; although this is generally a positive virtue, by definition it allows us to investigate the suitability of the tooling for only a small number of articles and a limited domain.

 

4Newcastle University has a strong history of supporting gold open-access publication: it was the site of the first open-access law journal in the UK (http://webjcli.ncl.ac.uk/). It also has a large and successful eprints repository (http://eprints.ncl.ac.uk), currently hosting 50k articles or bibliographic records. In this project, we will exploit the eprints archive to provide content, building a substantial knowledge resource. This will demonstrate the suitability of the Knowledgeblog tool-chain as the basis for green open-access publication, demonstrate the value of this novel form of publication, and provide the vital testing against content “from the wild”, allowing us to extend the suitability of this tool-chain to as many areas of academic discourse as possible.

2. Fit to call

5The project call notes that JISC is funding, or has funded, many projects relating to scholarly communication. These include: infrastructural support in the form of institutional repositories; support for open-access; and support for novel mechanisms of publication such as overlay journals. Specifically, theme D – campus-based publishing – is aimed at increasing the capacity of the sector to publish and disseminate research outputs directly. The call also highlights attempts such as the “Beyond the PDF” workshop to move toward more structured forms of knowledge; while, in theory, PDF is capable of supporting relatively rich structuring, in practice, most of the tools which generate files in this format produce a relatively opaque, binary artefact from which it is difficult to extract information, or to repurpose or recast it in any way.

 

6While open-access publishing has made significant strides in the last 10 years, becoming an accepted part of the academic landscape, Gold open-access – the publication of original content – still accounts for a minority of academic publications. Green open-access – author publication of content often published elsewhere – now accounts for up to 25% of the literature in some fields.

 

7Institutional repositories such as that run by Newcastle (http://eprints.ncl.ac.uk), or author archiving on their own websites (e.g. http://homepages.cs.ncl.ac.uk/phillip.lord/publications.html), are the most common routes for green open-access publication. While increasing access to academic materials is a very positive step, this form of publication is largely limited to providing access to a PDF. From neither the authors’ nor the readers’ point of view is there significant added value to the publication. For example, our experience is that authors are often equivocal about, or uninterested in, publication in institutional repositories, as it is “just-one-more-thing” to do, while maintaining a website requires significant technical expertise.

 

8For this grant, academics at Newcastle, supported by the infrastructure provided by the local librarians, will provide an alternative; we will identify authors within Newcastle, take their open-access publications and recast them into a form suitable for WordPress. We will do this with their active permission and engagement, using the tooling we have developed or documented as part of the previously-funded JISC “knowledgeblog” project. Where authors wish to, we will support them in performing this work for themselves; where they do not want “just-one-more-thing”, we will build on the existing eprints process and perform this work for them. In general, this can be performed directly using MS Word, LaTeX or other word-processing software, whichever is the authors’ preferred editing environment. In addition, we will use this process to increase the usability of the tooling, increasing both the ability of authors to publish their work directly in this fashion and the likelihood that they will do so. As this proposal is built on existing work from the University eprints archive, library support is implicit within FEC and not specifically or additionally costed.

 

9Once publications are available in this framework, authors and readers will be able to take advantage of the additional features which come either from WordPress directly, or from augmentations provided or assessed by the WebPrints team. For example, authors will be able to see rich content-access statistics, including page-views, referrer and incoming link information. Published articles will be bi-directionally linkable using trackbacks. Authors will be able to add tags, zoomable equations or automatically generated reference lists, depending on their level of technical competence. For readers, category- and tag-based RSS feeds, searching and bi-directional linking (again!) will be available. As a result of the work from the previous knowledgeblog grant, all posts will be tagged with metadata, in various forms, and will be available for formal archiving outside of the University.

 

10The publication framework is based around WordPress, which is freely available, scalable, stable and hardened by its large user base. The system is continually updated, but has a good reputation for maintaining backward compatibility. The authoring framework is based around commodity tools such as Word or LaTeX. Most of the workflow process within Newcastle is pre-existing as part of the eprints service. This project therefore provides a sustainable and novel enhancement to the existing process.

3. Workplan

3.1 WP1 Management, Systems Administration and Set up.

11This work package will fulfil the basic management and administrative tasks required for the project. This will include setup of the repository, styling and theming appropriately for the project; definition of a basic workflow for management of documents and metadata; fulfilment of standard JISC reporting requirements.

12We request additional funding of 1k as part of this work-package for virtual server upgrades (additional disk space), dropbox space to enable document management, and wordpress anti-comment spam support.

3.2 WP2 User documentation.

13Most of the operational, “how-to” documentation is already available: either at http://process.knowledgeblog.org (developed by the JISC funded knowledgeblog project); or, as the repository is based on commodity technology, from many publicly available websites.

 

14However, there will be information specific to the Webprints archive; about copyright, about document management, and about the relationship to the university. For this, we will need to generate some specific documentation.

 

15As the project progresses, we will improve and enhance this documentation, based on our experiences, including for example, statistics on how long author self-deposition takes.

3.3 WP3 Author advertising and Material identification

16We will seek active engagement with our user community, by linking into the current eprints system. Combined with the Newcastle-specific, internal “myimpact” database (which was designed to capture research outputs for the next REF), this will enable us to identify new publications as they come out. In the first instance, we will select material that has been published in open-access journals (or where embargo periods or other conditions allow). We will contact authors individually, inform them of our project, and advise them about the methods for recasting their paper (see WP4).

 

17We will not preselect on the basis of academic quality, only technical and legal (copyright) grounds. Although the eprints service displays full text as PDF only, the myimpact database in many cases also stores MS Word (or equivalent) formatted data. We will, therefore, prefer papers where this data is available. We will prefer papers which are recent over those which are older. Finally, we will prefer papers which give us a wide spread of authorship and discipline.
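The selection preferences above can be read as a filter followed by a ranking. The following is a minimal sketch of one way to express that, not the procedure the project actually used: the `Paper` fields, the `select_papers` function and the per-department cap (used here as a simple stand-in for "a wide spread of authorship and discipline") are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paper:
    # All fields are hypothetical; they only mirror the criteria in the text.
    title: str
    year: int
    department: str
    open_licence: bool  # legally reusable (open access, or embargo expired)
    has_source: bool    # Word/LaTeX source is available, not only a PDF


def select_papers(papers, limit, max_per_department):
    """Pick up to `limit` papers using the stated preferences.

    Academic quality is deliberately not a criterion.
    """
    # Hard requirement: only legally reusable papers are eligible.
    eligible = [p for p in papers if p.open_licence]
    # Prefer papers with editable source, then more recent papers.
    ranked = sorted(eligible, key=lambda p: (not p.has_source, -p.year))
    # Assumed spread rule: cap the number of papers per department.
    chosen = []
    per_department = Counter()
    for paper in ranked:
        if len(chosen) == limit:
            break
        if per_department[paper.department] < max_per_department:
            chosen.append(paper)
            per_department[paper.department] += 1
    return chosen
```

A cap per department is only one way to encourage spread; a real selection would probably also be adjusted by hand once authors have replied.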

 

18Although the focus of this proposal is on the provision of a service for publication of green open access material in a fully web-capable format, we will be happy to receive grey literature, on an author-publication basis.

3.4 WP4 Paper recasting

19This work package will take papers selected as part of WP3 and publish them to the webprints archive. In most cases, this work will be performed using tooling developed or documented by the previously funded JISC knowledgeblog project.

 

20We will publish articles in three ways:

Webprints team published. All work will be performed by members of the Webprints team. For each paper, we will write a short report, describing any issues with the publication process, and any errors seen (which we will hand-correct). We will gather statistics on the time taken to publish. Papers will be published on an “as-is” basis; that is we will not seek to enhance the content at this point. We will add metadata in a structured way, which will be accessible from the web presented version.

Author published, webprints supported. We will work directly with authors to publish papers and help them. Where possible, we will augment and add new features (latex maths support, citation). These papers will be marked as featured, and augmented. Again, we will gather statistics on the time taken to publish, broken down for additional functionality.

Author published. Authors will publish directly into Webprints, using either their pre-existing experience, or our own user documentation. We will request, but not require statistical feedback. Publication will be as the author wishes — as-is, or augmented with additional functionality.

 

21All papers will be annotated with standard metadata in a structured form; our previous work means that this metadata will be available from the web presentation of the paper.

3.5 WP5 Repository and process enhancement

22For this package, we will focus on two key aspects: tooling for publishing papers and their presentation once there.

 

23For the presentational issues, in the first instance we will focus on enhancements which do not require support from the article material. For example, we will add metadata to articles, which will allow us to generate metadata headers (COinS, standard meta tags etc.) without further analysis of the article material itself. Likewise, our experience with the knowledgeblog project means that we can support “out-of-the-box”: multiple export formats (including HTML, PDF and ePUB); site-wide indexes (by year, author, subject etc.); comments; trackbacks and page feeds (including from subsections). Through use of third-party software, we will also be able to add: related papers through textual analysis; tag clouds; twitter backs; automated multi-lingual presentation and social networking support.
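As an illustration of how a metadata header can be generated from article metadata alone, the sketch below builds a COinS element (an OpenURL ContextObject embedded in an HTML span) using only the Python standard library. It is a hedged example, not the code used by the knowledgeblog plugins: the function name `coins_span` and its parameters are assumptions, and only a handful of journal-article keys are shown.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, authors, journal, year):
    """Return a COinS <span> describing a journal article.

    Hypothetical helper: the parameters are illustrative, not taken
    from any existing plugin.
    """
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", title),
        ("rft.jtitle", journal),
        ("rft.date", str(year)),
    ]
    # One rft.au key is emitted per author.
    pairs.extend(("rft.au", name) for name in authors)
    # URL-encode the key/value pairs, then escape the result so that it
    # is safe to place inside an HTML attribute.
    query = urlencode(pairs)
    return '<span class="Z3988" title="{}"></span>'.format(
        escape(query, quote=True)
    )
```

Because the span is computed purely from stored metadata, it can be added to every post without re-reading the article text, which is the point made in the paragraph above.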

 

24We will also investigate enhancements which require modification of the original content (and therefore increased interaction with authors). From the knowledgeblog project these will include: scalable equation presentation; and client-side generated bibliographies. We will also add “custom posts” for supplementary material (spreadsheets, for instance). Finally, through the use of third-party material, we will add enhancements such as syntax highlighting, zoomable maps, slideshows and so forth. This part of the proposal is designed to be open-ended and exploratory; which forms of enhancement we pursue will depend on the types of papers selected and on interactions with the authors. There are currently over 13,000 plugins available for WordPress, which provides us with a considerable resource to build from.

3.6. Timetable

Name                           | Begin date | End date | Resources
WP1.1 – Setup Repository       | 02/05/11   | 14/05/11 | SC, AL, DS
WP1.2 – Document Workflow      | 02/05/11   | 14/05/11 | PL
WP2.1 – User Documentation     | 09/05/11   | 24/05/11 | DS, PL
WP2.2 – User Statistics        | 16/05/11   | 31/08/11 | SC, AL
WP3.1 – Author Engagement      | 16/05/11   | 31/08/11 | SC, AL, DS, PL
WP4.1 – Paper Recasting        | 01/06/11   | 30/09/11 | SC, AL, DS, PL
WP5.1 – Repository Enhancement | 01/07/11   | 30/09/11 | SC, AL, DS, PL

4. Deliverables

25The project will deliver:

  • A repository of open-access articles in a fully web-capable format. This will act as a supplement to the existing eprints archive at Newcastle. We expect to generate around 100 articles in this form, although this is likely to be an underestimate. We are currently estimating throughput from our experiences with Knowledgeblog, which involved relatively few articles. The process should benefit from high-throughput experience;
  • Further documentation, published on http://process.knowledgeblog.org, describing the process that we have used to set up this repository;
  • Enhancements to tooling, enabling others to publish more easily in this manner;
  • Additional experience and software enhancing the presentation of data held in this form.

5. Project management arrangements

26The project will be managed by Dr Lord, who will be responsible for:

  • Developing Project Management Plans;
  • Ensuring that the Project work package objectives are met;
  • Prioritising and reconciling conflicting opportunities;
  • Reporting to and collaborating with the JISC programme manager;
  • Dissemination of research results.

 

27Project progress will be evaluated through scheduled, short, “stand-up” meetings on a weekly basis, conducted face-to-face, via Skype or phone as appropriate. Primary unscheduled communication will be via public mailing list, ensuring maximum visibility and openness. We will use other readily available tooling to manage the document process pipeline – Google spreadsheets, dropbox, and likewise for software development (Google code). All staff are associated with other projects or service provision (research, teaching, training); they will be individually responsible for managing these workloads, and are highly experienced at doing so.

5.1 Risk Management

28Staff risks – the basic organisation of the project has been designed to mitigate staffing issues. All staff are in post and are highly experienced, with long track records at Newcastle. Costs have been split three ways; therefore, even in the unlikely event that one member of the team leaves during the project, it will not cause significant disruption.

29Software risks – we are using commodity technology, which is very well proven and supported. None of the software is critical (even our basic blogging engine, WordPress, is replaceable). Therefore, while changes in third-party software might degrade or slow progress, they will not halt it.

30Engagement Risks – the project requires a level of engagement from Newcastle researchers, which may not materialize. We have minimized this risk by minimizing the effort the engagement takes on behalf of the researchers. The project members are well known to many in the university (DS and SC comprise the “Bioinformatics Support Unit” and have worked for many PIs personally). We have active engagement from the library, in particular from Moira Bent (Science Faculty Liaison Librarian), and Paula Fitzpatrick (Digital Libraries).

5.2 IPR position

31The bulk of the content handled by this work will come from authors within the University. The current restrictive copyright requirements of many publishers place uncertain limits on what can or cannot be done with this content. For this reason, we will use articles that have been published with, or have become available under, a Creative Commons or other open-access licence.

 

32Project members will release written work (documentation etc.) under a Creative Commons Attribution ShareAlike 3.0 Unported License (CC BY-SA), which allows re-use and modification with attribution, provided that derivative works are shared under the same licence. This is in line with the JISC Model Licence. Software linked to WordPress will be released under GPL, as required by the WordPress license. Software which is separable will be released under LGPL. Software linked to other third-party libraries may use other licences if required; this will be limited to Free/Open source licences.

 

5.3 Sustainability

33This project is largely based around innovative, novel and leading use of existing software. As such the sustainability of the majority of the technology base is not dependent on project members but large companies with established and proven business models.

 

34The WebPrints archive will be run from the same server as knowledgeblog.org; this is being developed and maintained, and will be for the foreseeable future, and the addition of the WebPrints archive will not be a substantial additional cost. However, should this cease to happen, the content of the WebPrints archive will be under a Creative Commons or equivalent permissive licence. This will make it possible for the JISC-funded UK Web Archive to store the website for the future.

 

35Although we will not be able to sustain publication by the WebPrints team past the lifetime of this proposal without further funding, author publication will remain possible; our experience with existing tooling is that this is achievable for many authors, although it requires some level of technical skill, depending on the word-processor package and the complexity of the paper.

5.4 Staff Recruitment

36All staff are already in post. Recruitment during the project will therefore be unnecessary.

5.5 Key Beneficiaries

37Our immediate beneficiaries are Newcastle University staff, who will have their work published using a novel publication technique. Critically, we will demonstrate the value of this form of publication to both researchers and librarians within the University, who will in future be better placed to use or support this technology to publish their own or others’ work.

 

38Although presented here as a discrete project, the work fits within the background of the wider blogging community. So, our own knowledgeblog project and website will be able to take advantage of software improvements that will happen as a result of this work. Additionally, the general academic blogging community will gain a new resource. Increasingly, this community is a critical path for public engagement in the academic process.

5.6 Community Engagement

39Community engagement will take place initially by direct contact; we will email authors to ask for their engagement in the publishing process. This should have the secondary effect of advertising the presence of our project. We have active engagement from the library staff, who are well known within the University. In terms of engagement with the resource outside of Newcastle, we will make active use of various web and social networking facilities. Our experience has shown that this can generate significant amounts of engagement in a relatively short period of time. Finally, we will advertise the work through standard academic channels of conference and journal publication; although effective, this tends to be slow. This is problematic for a short project, hence we consider this to be a secondary means of communication.

 

6. Budget

 

Removed for privacy reasons.

7. Project Team

 

40Dr. Phillip Lord is a Lecturer of Computing Science at Newcastle University. He has a PhD in yeast genetics from University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as his contributions to eScience, beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects (CARMEN, ONDEX and InstantSOAP), as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI), and publishing on the fundamentals of ontology design. He is an active participant in the Scientific Blogging community, and developed the initial idea for knowledgeblogs. As well as managing the knowledgeblog project, he is the developer of tools such as “Latextowordpress”, as well as WordPress plugins such as “Mathjax-latex” and “Kcite”, all of which improve the usefulness of WordPress for academic communication.

 

41Dr. Daniel Swan has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart’s and the London Genome Centre and the Centre for Hydrology and Ecology in informatics-driven roles dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team helping biological researchers to generate, capture, store and analyse their digital data. His interdisciplinary background means he has grounding in both computer and biological sciences and is comfortable working on CS-focused projects (CARMEN, InstantSOAP, Bio-Linux) as well as acting in a research capacity analysing high-throughput data. He is currently active within the knowledgeblog project, having been responsible for adding software support for a review process, gravatars, syntax highlighting, and PDF and ePUB exports.

 

42Dr. Simon Cockell has a PhD in Genetics from Leicester University, and refocussed into Bioinformatics with a Masters degree from Leeds in 2005. From there he moved to Newcastle, and the Bioinformatics Support Unit. Since coming to Newcastle, Simon has worked on a range of projects involving large scale analyses (AptaMEMS-ID), data integration (Ondex) and health informatics (MRC Mitochondrial Disease Cohort). He is currently active within the knowledgeblog project, having been responsible for metadata support (including Coins), navigational support (for both humans and robots) and is a co-author of kcite and mathjax-latex.

 

43Allyson Lister worked for 6 years at the EBI in Cambridge, developing and producing the UniProt/TrEMBL protein database. She is currently focusing on the use of ontologies for the semantic integration of systems biology data in her current job at CISBAN in Newcastle University. Both at the EBI and at Newcastle University, she developed structured data formats including UniProt/TrEMBL and SBML. She has also been an early adopter of blog technology as a mechanism for communication of both her own and others’ primary research. Since 2006, she has co-authored a number of posts with other bloggers in the community and has been invited to be a guest author at both the ISCB news and the BioSharing blog. She has published papers highlighting the importance of social networking and live blogging to bioinformatics.

Paola Marchionni of JISC has given her permission to reproduce the feedback from the peer review of my last JISC grant, which sadly failed. I want to publish it here as part of my desire for open science, rather than as an opportunity to reply which, perhaps unfortunately, the JISC process does not otherwise allow.

I am a little surprised by some of the comments, to be honest. The main criticism was more expected though, which essentially says “it’s not crowd-sourcing if you pay people to develop content”. You have to try these things, but I did think that actually paying for content might be considered to be a little revolutionary. Ah, well, better luck next time.

Markers felt the form of this proposal was “robust”, however there wasn’t enough clarity on the deliverables and especially on how the value of what was being produced would be assessed down stream. They felt there was also some lack of information on how the currently JISC funded K-Blog project, due for completion in July 2011, related to this project and what the impact on its team would be, which seems to be the same team as the one proposed for this project.

The main concerns, however, were around whether this could really qualify as a crowdsourcing or community project – it was felt it was more about disclosing data than community engagement – also considering that the authors of the articles would be paid. There were some doubts about the sustainability of the project beyond the 7 months duration of the funding, as lack of funding would prevent more articles being created and metadata added by the team. One marker also felt that a risk analysis should have taken into account the risk of disparate communities not being aware of the content and using and engaging with it. A more clear identification of the various communities the project aimed to reach and a more targeted strategy for engaging with such communities would have been useful.

Finally, another issue that was raised was that there wasn’t sufficient information on how the partnership with Manchester University would work, either formally or informally, and the dissemination plans could have been stronger, as they relied mainly on the role of K-Blog.

— Paola Marchionni

About

This is the full text of a grant called “Knowledge in Biology” that we submitted to JISC, as a follow-up to our knowledgeblog grant. Unfortunately, this grant was not accepted. This blog post is the direct output of Word; apologies if the conversion is imperfect.

 

 

 

Outline Project Description:

Many disciplines within the sciences are knowledge-rich; of these, biology is an extreme example. In order to make advances, biologists need to be able to access knowledge from both their own and related communities in an easily digestible form. However, the publishing of this knowledge does not fit well with existing scientific communities, as it is often not regarded as “research based” – rather it is a stored body of grey literature, often not publicly available. In the Knowledge in Biology project, we will engage with disparate communities in disciplines that work with biologists, as well as the community of biologists themselves. We will generate substantial content describing how “Knowledge in Biology” is both produced and consumed in the pursuit of new discoveries, by commissioning the authorship of this content directly from the funding for this project.

We will leverage the output of the JISC-funded Knowledge Blog platform as a tool for coordination, publication and dissemination of this content. The result will be a publicly accessible, high-impact resource of short, readable and accessible articles describing how to gather, manipulate and synthesise knowledge in biology. This will be of significant value in supporting the multidisciplinary research that is necessary for advances in modern biomedicine.

 

1. Introduction

1This document describes a proposal for a project within the JISC “e-Content” programme call.

2Modern biology is a rich, complex, multi-disciplinary field. In particular, practitioners need knowledge about how to access, organise and structure knowledge itself. As a result, members of the community often need to cross the boundaries of traditional societal structures within research. By definition, this is not well supported by the more formal structures that scientists use for the publication and dissemination of knowledge. So while the information exists, it is not accessible; hidden from the community on the desks and hard-drives of individuals.

3One of the difficulties with migrating this community-based knowledge away from grey literature to a more openly-accessible archived and referenceable form is the lack of a formal reward structure. Although scientists may engage in this form of activity from a sense of public duty, this form of documentation is not critical for their career advancement, or for gaining academic credibility, and so it is rarely made a priority. While technological advances have made publication of this material straightforward, the social structure of science has not supported it. As a result, there is a large body of knowledge about how biologists conduct their work that is simply lost to the community, meaning considerable lost time and effort recreating this knowledge, only for it to be lost again.

4We plan to circumvent this societal barrier using a novel approach – we will directly commission the authoring and reviewing of articles embodying this content. As the knowledge will often be readily available to individual members of the community, and we are aiming for articles which are neither of the size nor complexity of formal research publications, it will be possible to generate a substantial body of content, at relatively low-cost.

5An ideal mechanism for publication of this knowledge has already been funded by JISC as a part of the “Managing Research Data” programme. This is the Knowledge Blog project: a light-weight publication tool, that integrates well with scientists’ existing work-practices, based around a commodity blogging engine. This ‘Knowledge in Biology’ project (KiB) will utilize the work from Knowledge Blog, to the benefit of both: this project will gain a technological underpinning at little cost – Knowledge Blog already exists and will require a small increase in resources to manage the additional content and traffic; Knowledge Blog will gain substantial content and enormously increased visibility.

6The KiB project will provide a small amount of funding for the management and commissioning of articles, but the majority of the funds will be spent in individually small amounts, crowd-sourcing the development of a novel digital content resource and engaging the community of biomedical researchers, both as authors and reviewers. The content will address key issues relating to knowledge in biology, such as data standards, linked data, knowledge in synthetic biology and statistical approaches to knowledge, as well as “softer” issues such as the use of Web 2.0, the social web, and the blogosphere as tools for the biomedical researcher.

2.1 WP1 – Knowledge Blog (k-blog) maintenance and support

 

7The primary purpose of this proposal is to generate significant quantities of digital, community-developed content. The k-blog platform already exists, supported by a previous JISC call. We are not, therefore, proposing to make significant enhancements to either the process or the software in the course of this project. However, the additional load placed on the platform will require a small amount of administrative work in terms of maintenance.

8In addition, we will need to provide support to the users of the platform; while k-blog is relatively easy-to-use, issues do arise with authoring, with formatting or with exceptional requests (for example, multi-media documents).

9For articles to be properly citable and maintainable, manual intervention is required to supplement the text with computationally accessible metadata, including DOI assignment. This enables improved archiving and discovery, which increases the value of the resource. As part of WP1, we will annotate documents with this metadata to ensure consistency and to avoid placing the burden on the main authors.

10We will install and refine a licensing plugin for the k-blog platform, which clearly displays license information for each article, based on the author’s selection.

 

2.2 WP2 – Management of publication process

 

11Articles in KiB will be produced by crowd-sourcing and by the in-house team (WP4). Our aim is to bootstrap the KiB k-blog so that it reaches a critical mass of articles that will attract both readers and more authors. We will commission articles from specified, expert authors with the attractor of a small payment. The payment will require the contributor to both submit an article and a review for another article.

12In preparation for this work, we have compiled a list of topics for KiB and put names against these topics. We have clustered the topics around themes in KiB: the role of semantics in biology; the representation of knowledge in ontologies, terminologies and vocabularies; data integration to create knowledge resources; data and knowledge standards; knowledge technologies such as RDF, Linked data, OWL, etc.; text mining; case studies and applications of knowledge in biology. These clusters, and more, will become the categories in the KiB k-blog. The letters of support indicate the significant number of authors that have promised to author an article on one of these topics. We will seek as wide a selection of authors as possible, guided by our advisory committee (see Section 2.8), to help give the KiB k-blog a balanced view on knowledge in biology. A significant part of this WP will be the commissioning of these articles and discussions with authors on this new digital content sourced from the community.

13This process will need managing: requests for particular articles (WP2.1); negotiation on topic and scope (WP2.2); managing of the author-guided review process (WP2.3); and, enabling payments to be made. This activity will help ensure that the core of the KiB k-blog will be of sufficient quality to attract readers to comment and contribute articles, as well as to simply read and learn.

2.3 WP3 – Outreach and Community Engagement

 

14Outreach and community engagement are intrinsic to this project. The presence of a high-quality, organised resource, freely available on the web will attract readers; likewise, a widely-read resource will be attractive as a publication centre for authors, particularly when supported by funding as part of WP2. The use of a rapid publication framework, available on the web, archived by the British Library and indexed for searching by Google, therefore, is our main form of outreach.

15However, this process can be augmented. All content will be available and reusable under a Creative Commons license, making it reusable with citation outside of the KiB environment. We will maintain active “Social Web” streams through Twitter. We will solicit articles relating to the use of Twitter and the blogosphere from members of the scientific blogging community; as well as generating content, this will leverage their existing readership, raising awareness of KiB, both as a resource for readers and authors. We will maintain a well-advertised mailing list allowing requests for, or offers of, new articles either commissioned or otherwise.

16Finally, we will advertise the resource through normal academic channels of paper and poster presentation. Where possible, we will also propose micro-workshops (aka Birds of Feather meetings) at suitable meetings/unconferences.

2.4 WP4 – ‘In house’ article authoring

 

17The staff on the project will contribute a significant number of articles to the KiB k-blog. Stevens will produce 20 articles; Lord 10 articles and Swan 10 articles (WP4.1). Both Lord and Stevens have already contributed articles to the Ontogenesis k-blog and will further extend on Ontogenesis in the wider KiB topics. These topics will include articles on tips for modeling in OWL; using ontologies with linked data; converting data to RDF and linked data; On-line knowledge resources; using ontologies in over-representation analysis of microarray data; integration strategies; and so on. Some of these in-house articles will act as glue that draw together many of the other articles. For example, an article on the role of knowledge in biology will draw together the need for the k-blog and act as a pathfinder. Where appropriate, we will use tools such as “Anthologize” and “Web Trails” to facilitate these aggregation activities. In house articles will be reviewed (WP4.2) by an external reviewer, potentially from the pool of contributors sourced in WP2.

2.5 WP5 – Project Management and JISC Requirements

 

18Management of the project will use regular weekly teleconferences, to ensure that all aspects are proceeding according to the project plan. In addition, we will fulfill the legal requirements for collaboration agreements and the formal reporting requirements from JISC as part of WP5.

19To ensure maximum community and public engagement in this proposal, all appropriate documents will be posted using the k-blog environment in addition to those locations specified by JISC, except where that information is withheld under normal FOI rules.

20Finally, we will gather and collate statistics on the use of these articles as measures of impact; directly in terms of page views from the underlying k-blog platform; indirectly from incoming links (both those using trackbacks, and those discovered using Web searching tools) and comments; and finally through secondary indicators such as Twitter and email communications. These statistics will also be made publicly available where appropriate.

2.6 Timetable

 

Name     Start      End        Staff    Notes
WP1      1/3/2011   30/9/2011  DS, PL   Maintenance of k-blog infrastructure
WP2      1/3/2011   31/7/2011
-WP2.1   1/3/2011   1/4/2011   ALL      Crowdsourcing of articles
-WP2.2   1/4/2011   31/7/2011  ALL      Content negotiation and creation
-WP2.3   1/4/2011   30/9/2011  ALL      Articles reviewed and published
WP3      1/3/2011   30/9/2011  ALL      Outreach and engagement
WP4      1/3/2011   30/9/2011
WP4.1    1/3/2011   31/7/2011  ALL      In-house content generation
WP4.2    1/4/2011   30/9/2011  ALL      In-house articles review and publication
WP5      1/3/2011   30/9/2011  PL       Project management and JISC compliance

 

2.7 Deliverables

 

21A high-quality body of content, consisting of a series of articles from multiple authors; describing different topics fitting within the theme of “Knowledge in Biology”. 40 of these articles will be authored in-house. A further 200 will be sourced with consultancy payment. We anticipate many others will come from crowd-sourced, enthusiastic authors, engaged with the process.

 

22A website, based on the k-blog platform, that delivers this content.

 

2.8 Project management arrangements

 

23The project will be managed from Newcastle University; the primary management will be from Dr Lord, who will be responsible for:

 

    – Developing Project Management Plans;

    – Ensuring that the Project work package objectives are met;

    – Prioritising and reconciling conflicting opportunities;

    – Reporting and collaborating with JISC programme Manager;

    – Dissemination of community content.

 

24Project progress will be evaluated through scheduled, short, “stand-up” meetings on a weekly basis, conducted face-to-face, via Skype or phone as appropriate. Primary unscheduled communication will be via public mailing list, ensuring maximum visibility and openness. User consultation will be via public mailing list. Close tracking of requests for content and payment of authors is essential, and transparent procedures will be put in place for this. All staff are associated with several other projects and duties (research, research support, teaching and training), and are responsible for managing these independent workloads. All have experience with the k-blog platform and process.

 

25We have formed a small, unpaid advisory committee from recognised experts in the field. They will be invited to give feedback on the topics covered at 2, 4 and 6 months into the project; this will help to ensure an even and representative coverage of the area, that is not overly biased by the particular interests of the staff on the project.  Mark Musen (Stanford), Chris Rawlings (BBSRC Rothamsted) and David Shotton (Oxford) have all agreed to be our advisory board.

2.9 Risks

 

26Staff Risk – as with all projects, loss of staff could negatively impact on this project; however, all staff are on permanent contracts and have long histories in research, so this is less likely. Additionally, the nature of the workload means that all staff would be able to cover duties relating to sourcing and generating community content, which limits the risk should a single person leave.

 

27Lack of community engagement – the strength of this proposal depends on contributions from many different authors, generating new, novel and, currently, unavailable content. However, there is also a risk that the community will not wish to contribute. We have limited this risk by offering to pay people consultancy rates – an unusual reward within academic research; however, we will only need to commit funds following the submission of the content, so should authors not deliver, we will reallocate these funds. Should we still find it hard to solicit contributions, we will increase the rates per article.

 

28Technology dependencies – Content will be disseminated in the form of k-blogs, and thus there is a dependency on the k-blog platform. It is already suitably developed and packaged. The k-blog platform is a publishing framework only; it is not essential for the authoring of articles. This limits the scope of the risk. Content could be published independently of the k-blog platform, with only a small loss in the feature set. Additionally, content could be relocated elsewhere at any time; it would retain its value outside of the k-blog platform. With the archival agreement under the Sustainability section, archives of the original KiB content will always be available.

 

2.10 IPR position

 

29It is essential that content is released with as few restrictions as possible on re-use and re-purposing, but authors must be allowed to maintain credit associated with the original work, or they are unlikely to contribute. Project members agree to release their work under a Creative Commons Attribution-NonCommercial ShareAlike 3.0 Unported License (CC BY-NC-SA), which allows re-use and modification for non-commercial purposes with attribution. This is in line with the JISC Model Licence. Authors invited to submit articles will be allowed to choose a Creative Commons licence of their own but will be strongly encouraged to use as permissive a licence as possible. Choice is offered to allow considerations of different institutional policies on published content. Public domain submissions will also be accepted to accommodate US government employees; these submissions will be uncommissioned.

 

2.11 Sustainability

 

30To maintain the persistence of the online resources beyond the end of the project, documents produced by project staff and KiB contributors will be publicly available and clearly licensed. The k-blog site and sub-domains are already archived by the UK Web Archive, in which JISC is an active partner. The Digital Curation Centre will be asked to provide strategies for long-term database archiving.

 

2.12 Staff Recruitment

 

31All staff are already in post.

 

3 Impact

 

32Our key beneficiaries are the community of researchers working to develop knowledge in biology. Specifically this focuses on groups involved in data standards, linked data, knowledge in synthetic biology and statistical analysis of biological data. The needs of this community are clearly demonstrated by our Ontogenesis experiment, which is currently receiving 1000 page views per month for a small number of articles. Simple question and answer websites such as http://biostar.stackexchange.com/ receive over 2k page views per week; however, there is a gap between this and more formal knowledge.

 

33We will generate statistical information, using the k-blog platform as a clear metric of impact; for freely available, reusable and web-delivered content indicators such as page views are well recognised, and the main form of impact assessment. Both natively, and through tools such as Google analytics, the k-blog platform can provide comprehensive and detailed feedback on access of individual articles. We will also exploit secondary impact measures, including Twitter through appearance of suitable hashtags; comments and trackbacks to articles on KiB; and, finally, links to KiB as provided by web search.

 

34We will seek to increase impact through a number of activities in addition to normal academic channels. First, we will invite contributions from well-known members of the scientific blogging community that should result in secondary readership. Second, we will invite contributions on relevant topics that have become of recent public interest. Third, we will monitor article popularity; for areas that prove to be of interest or are controversial, we will seek to commission additional content.

 

4 Partnership and dissemination

 

35Internal engagement of core project members, and the wider community of researchers crowd-sourced to supply content will be via the mailing list, after initial approaches are made. The plans for content generation are further outlined in WP3 and WP4. Content generation will allow further interaction with more disparate groups (content consumers), who will be encouraged to engage through the k-blog process and the project mailing lists. The advisory committee will be able to ensure that our engagement with the content-producing community is representative of the community. The nature of the k-blog process means dissemination is intrinsic to content generation.

36Project members are on the existing JISC funded Knowledge Blog grant in the “Managing Research Data” programme. We will approach individuals with funding from this and other programmes, requesting articles describing the value of these projects to biologists. We will, of course, also be pleased if JISC programme managers wish to contribute articles to this knowledge in biology resource.

6 Previous experience of the Project Team

 

37Dr Phillip Lord is a Lecturer of Computing Science at Newcastle University. He has a PhD in yeast genetics from University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as his contributions to eScience beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects: CARMEN, ONDEX and InstantSOAP, as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI), and publishing on the fundamentals of ontology design. He was an active participant in the Ontogenesis network, and is currently leading the JISC funded Knowledge Blog project. He is an active blogger and developer.

 

38Dr Robert Stevens is a reader in Bioinformatics in the Bio and Health Informatics group at the University of Manchester. His main areas of research are in the development and use of semantics within the life sciences. This is blended with the use of e-Science platforms to gather and manage the data and knowledge of the life sciences. He was PI on the Ontogenesis network that ran the meetings for the first Knowledge Blog. He is or has been a co-investigator on the myGrid and myExperiment grants that will provide both content and technical input to this project. As well as the JISC funded myExperiment project, Stevens was an investigator on the JISC funded CO-ODE project that developed Protégé 4. On the back of this, Stevens has led the OWL training activities at Manchester that have directly fed into the Ontogenesis Knowledge Blog. Stevens currently leads content development for the JISC Knowledge Blog grant.

 

39Dr Daniel Swan has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart’s and the London Genome Centre and the Centre for Hydrology and Ecology in informatics driven roles dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team helping biological researchers generate, capture, store and analyse their digital data. His interdisciplinary background means he has grounding in both computer and biological sciences and is comfortable working on CS focused projects (CARMEN, InstantSOAP, Bio-Linux). He has been most recently involved in the JISC Knowledge Blog grant, providing technical support and engagement with the microarray community.

 

 

I’m very pleased that our grant for knowledgeblog has been accepted by JISC. I shall follow the tradition that I set with my last post, of publishing all my primary scientific output on this blog. In this case, I’m using Word, which, like the LaTeX that I used last time, isn’t perfect. Still, improving this process is part of the knowledgeblog proposal, so this post is also attacking a key deliverable for the grant!

The main content for this post is also available on the knowledgeblog events blog.

 

Outline Project Description

The project extends existing blogging tools for use as a lightweight, semantically linked publication environment. This enables researchers to create a hub in the linked-data environment, that we call knowledge or k-blogs. K-blogs are convenient and straightforward for authors to use, integrating into researchers’ existing work practices and tools. They provide readers with distributed feedback and commenting mechanisms. We will support three communities (microarray, public health and workflow), providing immediate benefit, in addition to the long term benefit of the platform as a whole. Additionally, this will enable a user-centric development approach, while showcasing the platform as the basis for next generation research publishing.

1. Introduction

1This document describes a proposal for a project within the JISC “Managing Research Data” call. Data comes in many forms, from raw statistics, to highly structured databases, through to textual reports; natural language, although hard to search and manage, is still the richest form of representation; data in the form of reports and publications are the central hub around which all other data sit. This project, therefore, will provide a lightweight, yet extensible, framework for scientific publishing, incorporating a software-supported peer-review process. Bi-directional links will be maintained both between publications and to other forms of data, using semantic markup to enhance the meaning of these links. We will also customize this framework for three communities which, as well as being directly useful, will provide real-world requirements. The project will largely develop “glue” between existing, widely-used, open-source software systems, ensuring its sustainability and usefulness past the end of the funding.

2. Fit to Programme Objectives and Project Outline


2The project call identifies the complexity and hybrid nature of the UK research data environment; despite this, one central focal point remains — most researchers spend considerable amounts of time discussing their data in the form of “paper” publications. For some, more theoretical disciplines, such as parts of computer science, the paper is the sole output; in others, such as biology, datasets are associated with papers and the barriers between “publication” and “data” are breaking down; most data sources in biology are rich in annotation; text that supports and explains the raw data. It is normally the annotation, not the raw data, which defines the quality of the resource. In these cases, text is an intrinsic part of the data.

3However, the conventional publication process has changed relatively little; web technologies have largely been adopted only as a distribution mechanism. Publications are still expensive, either at subscription or publication time, depending on the business model of the publisher, and involve considerable, time-consuming interactions between author and publisher, often relating to display and presentation issues. This is in stark contrast to, for example, the biological data centres where both raw and annotated data are often made available within hours of their generation.

4This situation is unfortunate because it limits the ability of researchers to customise their publication process for the requirements of their own discipline. As demonstrated by Shotton et al, and Rousay et al, it is possible to add considerable value, both enhancing the paper for the reader, as well as providing direct and semantically enhanced links to underlying data. The cost of the existing process, however, makes this form of publication unlikely for some data; for example, few scientists publish papers about negative results, resulting in an acknowledged publication bias. As a result, it is hard for the semantically enhanced publication to take its place as the central hub for a linked data environment as envisioned by Coles and Frey, linking to and between research datasets, and the published knowledge about these datasets.

5In the last decade, the blog has become a common, web-based publication framework. There are now numerous off-the-shelf tools and platforms for managing blogs, providing a high degree of functionality. Many scientists blog about their work, about other published work (research blogging) or “live blog” about conferences and talks as they happen. In this case, the researcher is in charge of their own publication environment, can extend it to their requirements, and publication happens immediately. However, the blog has not yet become a standard means of publication for primary research output.

6Recently, as part of the EPSRC funded Ontogenesis network (ref), we trialled the Knowledge Blog process; in this case aimed at producing an educational resource describing many aspects of ontology development and usage, which might previously have been published in book form. We have shown that with this technology base, it is possible to replicate many of the features of the open peer-review, scientific book publication process; following two small meetings, we have written around 20 articles, and the website maintains around 1000 post reads per month (not simple hits!). To achieve this, we used only two features of the blog — trackbacks (bidirectional links) and categories (hierarchical keywords); although we used the WordPress blogging software, these features are supported by most other systems. We call these articles k-blogs.

7Currently, however, the k-blog process is not fully supported with blog software alone, nor does it fully support the referencing, advanced linking and provenance needed specifically for research publications. For this project, we propose to provide extensions to support data-rich publications, deeply and semantically linked to other k-blogs and to other forms of data repository. Therefore, the project addresses the objectives and aims of the call through four main workpackages.

1) A documented k-blog process (WP1.1) describing different levels of peer-review suitable for different forms of research data. An implementation (WP1.2), the k-blog platform, of these processes based around open-source, off-the-shelf software.

2) Extensions to the k-blog platform supporting linking. This includes full support for referencing including COINS metadata on posts (WP2.1), client-side and permanently linked versions (WP2.2) and bidirectional links (WP2.3) to other data sets. We will add semantics to these links using the Citation Ontology (CiTO) (WP2.4).

3) Support for three specialist environments: healthcare (WP3.1), microarray (WP3.2) and workflows (WP3.3). All are useful in their own right and showcase the extensibility of the framework.

4) Documentation and tooling to integrate the k-blog process into scientists’ existing working practice and tooling; scientists will be able to publish from Word, OpenOffice, Google Docs or LaTeX (WP4.1). We will add tooling and documentation, as WP4.2, to support the use of reference management tools such as Endnote, Mendeley or Zotero, making use of deliverables from WP2.

3. Quality of proposal and Robustness of Workplan

 

3.1 WP1: Knowledge Blog Process

8In this project, we aim to develop a light-weight publication framework, including the desirable aspects of the formal peer-review process. However, different forms of scientific publication require different levels of peer-review. For example, for http://ontogenesis.knowledgeblog.org, we require two reviews from an editorial board, assessing quality, appropriate for an educational resource. However, for http://process.knowledgeblog.org, which is intended to contain informal “how-to” and request for comment documents, a much lighter-weight, single editorial review assessing scope alone is more appropriate. Deliverable WP1.1 will consist of documentation describing both formally and informally, a number of levels for the knowledge blog process, and how these can be achieved using a blog. These documents will, themselves, be published on http://process.knowledgeblog.org.

9These processes will be implemented as Deliverable WP1.2, comprising freely available and widely used pieces of software, with additional “glue”. The basic publication framework will use WordPress 3 (WoP), an open-source, multi-site, multi-author blogging system used to provide the hosted blog service at http://www.wordpress.com. While we have found that WoP supports many aspects of this process, particularly from the readers’ perspective, a significant degree of “book-keeping” is required from authors, reviewers and editors. Readers know whether a paper has been reviewed or not, but authors have to remember for themselves who is reviewing the paper. Therefore, we will use a “ticket system”, specifically Request Tracker 3 (RT) (http://bestpractical.com/rt/). Both WoP and RT are extensible with plugins and will be extended and adapted to reflect the k-blog levels of WP1.1.

10We will use this extensibility to provide a light-weight integration. RT operates as an email response system; by extending WoP to send email on submission of new papers, this can provide both an integration point, as well as the main point of interaction for authors, reviewers and editors. To provide editorial and reviewer functionality, tickets can be moved between queues; extensions to RT will use standard blogging XML-RPC calls to feed back to WoP by, for example, re-categorising papers once accepted. OpenID (http://openid.net) will be used to integrate the user accounts between the two systems. WoP already supports this fully, while RT supports it in skeleton form.
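
As a rough illustration of the feedback step described above, the following Python sketch builds the XML-RPC request that a ticket-system extension might send to re-categorise a paper once it is accepted. It assumes the blog exposes the MetaWeblog method metaWeblog.editPost; the function name, post ID, credentials and category name are placeholders invented for this example, and this is not the project's actual integration code.

```python
import xmlrpc.client


def build_recategorise_call(post_id, username, password, categories):
    """Serialise a metaWeblog.editPost request that sets a post's categories.

    Assumption: the server accepts a content struct holding only the
    "categories" field and keeps the rest of the post unchanged.
    """
    content = {"categories": list(categories)}
    # Parameter order: post ID, username, password, content struct, publish flag.
    params = (str(post_id), username, password, content, True)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.editPost")


if __name__ == "__main__":
    # Placeholder values; print the XML that would be POSTed to the blog.
    print(build_recategorise_call(42, "editor", "secret", ["Accepted"]))
```

Building the payload separately from sending it keeps the sketch runnable without a live blog; in practice the same call could be issued through xmlrpc.client.ServerProxy pointed at the blog's XML-RPC endpoint.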

11Although we will provide an implementation of the k-blog process, it will be described sufficiently generically to support complete and independent implementation.

 

3.2 WP2: References and Metadata
12For k-blogs to become an integral part of the scientific record, they must fully support the semantic and linked data environment. Although WoP supports standard URI based linking to resources, and bidirectional “trackback” linking to other resources, it lacks complete functionality suitable for research communities. This is a rare example of functionality that is not already provided by WoP or an associated plugin. Deliverable WP2.1 will fulfil this need; we will support the insertion of at least DOIs and PubMed IDs (PMID), that will be resolved to full human-readable reference lists for display, using APIs provided by CrossRef and NCBI eUtils respectively. To fully support computational agents wishing to access the same information, references will also support COinS metadata, embedded into the display HTML.
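
To make the COinS idea concrete, here is a minimal Python sketch of how a resolved reference might be embedded in the display HTML. COinS places an OpenURL ContextObject, encoded as key/value pairs, in the title attribute of an empty span with class Z3988. The function name and the sample DOI, title and journal are placeholders, and the choice of fields is an assumption for illustration rather than the plugin's actual output.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(doi, title, journal, year):
    """Return a COinS <span> describing a journal article identified by a DOI."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft_id", "info:doi/" + doi),
        ("rft.genre", "article"),
        ("rft.atitle", title),
        ("rft.jtitle", journal),
        ("rft.date", str(year)),
    ]
    # URL-encode the key/value pairs, then escape the result for use
    # inside an HTML attribute (ampersands become &amp;).
    context_object = urlencode(fields)
    return '<span class="Z3988" title="{}"></span>'.format(
        escape(context_object, quote=True)
    )


if __name__ == "__main__":
    # Placeholder reference, not a real article.
    print(coins_span("10.1000/example", "A title", "A Journal", 2010))
```

Reference managers that understand COinS can read the span without any change to the human-readable reference list shown next to it.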

K-blog posts will also require outward-facing metadata that describe the resources they provide in a standards-compliant manner. The Open Archives Initiative (OAI) provides standards that aim to facilitate the efficient dissemination of content. Specifically, the Object Reuse and Exchange specification (OAI-ORE) is a standard for the description and exchange of compound digital objects (such as a WoP post or page). The WordPress OAI-ORE plugin provides link header elements that implement this specification.

Our initial investigations into the k-blog process showed that WoP support for versioning and provenance is lacking; the k-blog process involves updating papers after submission but before final acceptance. While WoP stores all these versions, they are currently visible only to authors or editors through the administration interface. Although existing plugins for WoP already provide some of this functionality, Deliverable WP2.2 will expose these versions to readers, along with a defined permalink scheme for access to all versions, providing full provenance.

WoP supports bi-directional links in the form of trackbacks; this is mediated by XML-RPC calls between resources when a link is made. This will support linking to data where, for example, the data is another k-blog; however, general data resources may lack support for this process. Therefore, as Deliverable WP2.3, we will provide a trackback proxy, hosted on the http://knowledgeblog.org server, storing and presenting these links for resources that cannot directly process trackbacks.
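The core of such a proxy can be sketched in a few lines of Python. This is an in-memory illustration only, under several assumptions: pings arrive as form-encoded fields (url, title, blog_name) in the style of the TrackBack specification, the class name TrackbackProxy and its methods are hypothetical, and a real service would persist the links and sit behind an HTTP or XML-RPC front end.

```python
from collections import defaultdict
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


class TrackbackProxy:
    """Store incoming links on behalf of resources that cannot accept pings."""

    def __init__(self):
        # Maps the URI of the linked-to resource to the links pointing at it.
        self._links = defaultdict(list)

    def receive_ping(self, target_uri, form_body):
        """Record one ping for target_uri and return an XML response body."""
        fields = {key: values[0] for key, values in parse_qs(form_body).items()}
        source = fields.get("url")
        if not source:
            return self._response(1, "Missing required 'url' parameter")
        self._links[target_uri].append({
            "url": source,
            "title": fields.get("title", source),
            "blog_name": fields.get("blog_name", ""),
        })
        return self._response(0)

    def links_for(self, target_uri):
        """Return the links recorded for target_uri (empty list if none)."""
        return list(self._links.get(target_uri, []))

    @staticmethod
    def _response(error, message=None):
        body = "<error>{}</error>".format(error)
        if message:
            body += "<message>{}</message>".format(escape(message))
        return '<?xml version="1.0" encoding="utf-8"?><response>{}</response>'.format(body)


# Hypothetical usage: a post links to a dataset that cannot receive pings.
proxy = TrackbackProxy()
proxy.receive_ping("http://example.org/dataset/1",
                   "url=http%3A%2F%2Fexample.org%2Fpost&title=A+post")
```

Keeping storage keyed by the target URI means the proxy can present all inbound links for a dataset from a single lookup.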

To complete this work package, we will add semantics to the links using CiTO, as Deliverable WP2.4. Therefore, as well as enabling easier data linking and provenance, we will also enable addition of meaning to these links.
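A typed link of this kind can be expressed as a single RDF statement. The sketch below, in Python, emits one N-Triples line per link; it assumes the CiTO namespace http://purl.org/spar/cito/ and checks the relation against a small, deliberately incomplete subset of CiTO properties. The function name cito_triple and the example URIs are hypothetical.

```python
# Assumption: the CiTO namespace currently published by the SPAR ontologies.
CITO_NS = "http://purl.org/spar/cito/"

# A small subset of CiTO object properties; the ontology defines many more.
CITO_RELATIONS = {
    "cites",
    "usesDataFrom",
    "usesMethodIn",
    "extends",
    "supports",
    "disagreesWith",
}


def cito_triple(citing_uri, relation, cited_uri):
    """Return one N-Triples statement typing the link between two resources."""
    if relation not in CITO_RELATIONS:
        raise ValueError("Unknown CiTO relation: {}".format(relation))
    return "<{}> <{}{}> <{}> .".format(citing_uri, CITO_NS, relation, cited_uri)


# Hypothetical usage: a k-blog post that reuses a published dataset.
triple = cito_triple("http://example.org/post/1", "usesDataFrom",
                     "http://example.org/dataset/1")
```

Restricting relations to a known vocabulary catches typing mistakes at authoring time rather than leaving malformed statements in the published metadata.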

 

3.3 WP3 – Specialist Environments

The k-blog platform and process are designed to be flexible and adaptable to the needs of specialist environments. We will use three main use cases to ensure real-world applicability of the software, as well as fulfilling the immediate needs of these communities.

For Deliverable WP3.1, we will add features for supporting the microarray community. Currently, the microarray community is well served in terms of metadata capture (MIAME) and deposition in public repositories (ArrayExpress, GEO). As part of WP2, we will support linking to these datasets through stable URIs. However, these resources deal only with data generation. Post-processing and analysis are largely captured at the publication stage, often in supplementary material.

A substantial amount of this analysis uses BioConductor: a widely used, open-source platform for statistical microarray analysis based on the R statistical programming language. We will extend k-blog with specific support for R and BioConductor. Authors will be able to embed code directly into k-blog papers, along with the figures that result; as a result, reviewers and readers will be able to see a computationally precise description of methods and replicate the generation of figures should they choose.

Finally, we will investigate the possibility of publication to a k-blog using only R code and references to public databases, in a process similar to Sweave; figures will be generated on the server, providing guarantees of correctness and precise provenance. The limited scope of this call means this part of WP3.1 will be proof-of-principle only.

For WP3.2, we will focus on the public health community (PHC): a key workforce in delivering quality and effective healthcare by providing timely and accurate public health intelligence (PHI). PHI is a varied environment involving statistical analyses and the production of figures, diagrams and reports to communicate results to the wider health community. However, the PHC operates in small groups with little knowledge networking. The main aim of the k-blog is to improve the availability of health information, data and knowledge, to inform decisions for health protection and care standards as supported by the Quality Improvement Productivity and Prevention initiative. The NWeHealth e-Lab project, hosted at The University of Manchester, provides an environment to bring together research objects into a single location. As elsewhere, textual data forms the key hub that links together all the other forms of knowledge. By linking to e-Lab research objects from a k-blog, this link will be made explicit, available, interpretable and directly valuable to the PHC; as a result WP3.2 is synergistic with the rest of the proposal. This community also brings a set of access control requirements. To support these we will use existing WoP facilities, providing a simple, easy-to-use three-level access model.

 

For WP3.3, we will generate k-blog content about Taverna workflows and methods for building them. Workflows have become a popular way of realizing computational analyses and have become an important form of data. The JISC-funded myExperiment project is widely used to disseminate the workflows themselves. Knowledge about issues surrounding workflows is, however, more difficult to produce and disseminate. A k-blog, with its ability to produce short, targeted articles as the need arises and the resources become available for writing, suits the need for Taverna workflow documentation. We will seek k-blogs on Taverna issues such as: the basics of workflow design; how to choose among a set of similar services in producing a workflow; and the testing of workflows. We will implement a light-weight mechanism, using trackbacks, to link between the k-blog and myExperiment.

 

As part of WP3, we will also hold four workshops, at 3-month intervals, each focusing on one particular k-blog and community. These workshops will be of the form previously trialled as part of the Ontogenesis network, and will serve several purposes: requirements gathering and feedback for us, education for the community, and development of content that demonstrates the process to the general readership.

 

3.4 WP4 – Integration with Existing Working Practices

For the k-blog process to be acceptable to communities such as those described in WP3, it must fit with existing working practices. Researchers mostly write documents using a word-processor. Fortunately, as the k-blog platform is based on the widely used WoP, which in turn offers a widely supported API, this style of working can be readily integrated. It is already possible to author using Word (2007 onward), OpenOffice, Google Docs and LaTeX using integrated or existing technologies, as demonstrated by our previous work at http://ontogenesis.knowledgeblog.org. For Deliverable WP4.1, user-oriented documentation describing these tools will be developed. This documentation will also describe clearly how to present and organise papers in a way that is optimized for the k-blog process. While we expect this documentation to take a significant time to produce, as we refine it in response to user feedback, it is important to note that a k-blog is already useful and possible.

To take maximal advantage of the linking technologies developed in WP2, we will need to integrate with existing technologies for referencing. As Deliverable WP4.2, we will add tooling to enable the use of bibliographic tools such as EndNote, Mendeley, Zotero or BibTeX to insert references that k-blog can directly translate. Largely, this should consist of “styles” that modify the in-text citation, as the reference plugin of WP2.1 will generate reference lists. As with other deliverables, this tooling will include substantial documentation, developed using the k-blog process.

4. Project Timeline

 

Name     Start        End          Staff   Notes
WP 1     02/08/2010   30/10/2010
WP 1.1   02/08/2010   31/08/2010   All     A documented k-blog process
WP 1.2   01/09/2010   30/10/2010   DS,SC   Implementation with off-the-shelf software
WP 2     01/11/2010   30/04/2011
WP 2.1   01/11/2010   26/02/2011   SC      COinS metadata on posts
WP 2.2   01/11/2010   29/01/2011   SC      Client-side, permanently linked versions
WP 2.3   03/01/2011   26/02/2011   DS      Bi-directional links to other datasets
WP 2.4   01/03/2011   30/04/2011   PL      Semantic linking with CiTO
WP 3     01/11/2010   30/07/2011
WP 3.1   01/11/2010   30/07/2011   GM      Specialist environment – Healthcare
WP 3.2   01/11/2010   30/07/2011   DS      Specialist environment – Microarrays
WP 3.3   01/11/2010   30/07/2011   RS      Specialist environment – Workflows
WP 4     02/08/2010   30/06/2011
WP 4.1   02/08/2010   30/04/2011   GM,DS   Authoring documentation and tools
WP 4.2   02/05/2011   30/06/2011   GM,SC   Referencing documentation and tools

 

5. Project Management Arrangements

The project will be managed from Newcastle University; primary management will be by Dr Lord, who will be responsible for:

  • Developing Project Management Plans;
  • Ensuring that the Project technical objectives are met;
  • Prioritising and reconciling conflicting opportunities;
  • Reporting to and collaborating with the JISC Programme Manager;
  • Dissemination of the k-blog platform.

Project progress will be evaluated through scheduled, short, “stand-up” meetings on a weekly basis, conducted face-to-face, via Skype or phone as appropriate. Although most project staff are co-located, primary unscheduled communication will be via public mailing list, ensuring maximum visibility and openness. User consultation will be via public mailing list, as well as through a “dogfooding” k-blog. All project staff have been handpicked; they are highly experienced and self-directed, as outlined elsewhere. All are associated with several other projects and duties (research, research support, teaching and training), and are responsible for managing these independent workloads.

 

5.1 Risks

Staff Risk – as with all projects, loss of staff could negatively affect this project; however, all staff are on permanent contracts and have long histories in research, so this is less likely. Additionally, by dividing the work between five individuals, we limit the risk should a single person leave.

WoP3 and other dependencies – the project depends on other software, most notably WoP for which a new version (3.0) is now in beta; however the software is widely supported. Other software is replaceable.

Standards Shifting – the project depends on a number of standards and these may change. In this project, we will NOT support standards, but rather use those that support us. Where standards change rapidly, their implementation will be delayed (until they stabilize) or dropped. None of the standards described here is critical to the success of the project.

 

5.2 IPR Position

All code will be developed under open source licences. WoP and RT are licensed under GPL, so code linking to these will be likewise licensed. Code that is separable will be released under LGPL. Code will remain copyright of the respective institutions or authors. Any documentation produced by project staff relating to the project will be licensed under a Creative Commons Attribution licence. Licensing of individual k-blogs will be delegated, but permissive licences will be encouraged.

 

5.3 Sustainability

This project is largely based around innovative, novel and leading use of existing software. As such, the sustainability of the majority of the technology base is not dependent on project members but on large companies with established and proven business models. The k-blog process will be cleanly separated from its implementation, ensuring only weak dependencies on the underlying software. Where we produce software “glue”, public and widely supported APIs will be used where possible. This will ensure that components are replaceable. All code, including historical versions, will be publicly available. Documents produced by project staff will be publicly available and clearly licensed, so will be archived through internet “cloud” resources; we are also seeking explicit support for archiving from the British Library.

 

5.4 Staff Recruitment

All staff are already in post.

 

5.5 Key Beneficiaries

Our key beneficiaries are the public health, microarray and workflow communities; as the k-blog process is based around commodity software, these groups can use the basic environment from the first day of the project to generate and share content. As the project progresses, so will the process, the software to support it and the documentation to explain it; at all stages, the k-blog process fulfils a clear and immediate need. While we are specifically targeting these communities, the k-blog process and platform are sufficiently generic that they can support a wide range of research activities.

Although presented here as a single platform, the process and components are separable and can benefit communities independently. In particular, the tools and documentation from WP2 and WP4 will find use within the research blogging community, who find the lack of tooling for referencing particularly difficult. Finally, the statement of a peer-review process, and its implementation within RT, will be applicable to any peer-review environment regardless of the form of publication. This includes publications produced using wikis or other Content Management Systems.

 

5.6 Engagement with Community

We consider the mechanisms for engagement with four kinds of community. Engagement with our core content-generating community is an intrinsic part of this proposal, as described in WP3. Further interaction with more disparate groups will be maintained through personal contacts; each of the five individuals named in this proposal is experienced and embedded in different communities (health care, microarray, ontology, proteomics). Engagement with our core content-consuming community is, again, an intrinsic part of the proposal; all project communications will be via open mailing list or k-blog. Project members are active users of Web 2.0 social technologies; our initial trials as part of Ontogenesis showed this approach to be a highly effective form of dissemination, with minimal effort. Engagement with software users will be via website and direct interaction. All software will be released or advertised via normal channels (website, versioning, and mailing list), including a (Debian) package repository for those wishing to set up their own server. Finally, developer communities will not be specifically targeted, but our open source, continually integrated development plan will be attractive, and we will accept suitably licensed contributions.

All communities will benefit from the open and agile development methodology we will adopt; changes to the environment will be integrated and released rapidly, ensuring continual improvement and facilitating rapid feedback cycles.

 

6. Previous Experience and Project Team

 

Dr. Phillip Lord is a Lecturer of Computing Science at Newcastle University. He has a PhD in yeast genetics from the University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as his contributions to eScience beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects: CARMEN, ONDEX and InstantSOAP, as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI), and publishing on the fundamentals of ontology design. He was an active participant in the Ontogenesis network, and developed the initial idea for knowledge blogs as part of this. He is an active blogger and developer.

 

Dr. Georgina Moulton is an Education and Development Fellow at The University of Manchester. Since 2005 her main roles have been to co-ordinate the development and delivery of multi-disciplinary bio/health informatics education programmes, and to facilitate the engagement of biological and health communities in a variety of bio and health informatics research projects (e.g., ONDEX, Obesity e-Lab). For 3 years, Georgina was the EPSRC-funded Ontogenesis Network Manager, in which role she co-ordinated the activities of the network, expanded it by facilitating the development of new activities, and was involved in the trial k-blog process. More recently her work has included the development and delivery, in conjunction with NHS partners, of an education and development programme tailored to match the needs of North West public health analysts and the wider healthcare workforce.

 

Dr. Daniel Swan has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart’s and the London Genome Centre and the Centre for Hydrology and Ecology in informatics-driven roles dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team helping biological researchers generate, capture, store and analyse their digital data. His interdisciplinary background means he has grounding in both computer and biological sciences and is comfortable working on CS-focused projects (CARMEN, InstantSOAP, Bio-Linux) as well as acting in a research capacity analysing high-throughput data.

 

Dr. Simon Cockell has a PhD in Genetics from Leicester University, and refocussed into Bioinformatics with a Masters degree from Leeds in 2005. From there he moved to Newcastle, and the Bioinformatics Support Unit. Since coming to Newcastle, Simon has worked on a range of projects involving large scale analyses (AptaMEMS-ID), data integration (Ondex) and health informatics (MRC Mitochondrial Disease Cohort).

 

Dr Robert Stevens is a senior lecturer in Bioinformatics in the Bio and Health Informatics group at the University of Manchester. His main areas of research are in the development and use of semantics within the life sciences. This is blended with the use of e-Science platforms to gather and manage the data and knowledge of the life sciences. He was PI on the Ontogenesis network that ran the meetings for the first k-blog. He is or has been a co-investigator on the myGrid and myExperiment grants that will provide both content and technical input to this project. As well as the JISC-funded myExperiment project, Stevens was an investigator on the JISC-funded CO-ODE project that developed Protégé 4. On the back of this, Stevens has led the OWL training activities at Manchester, which have fed directly into the Ontogenesis k-blog. This range of experience makes Stevens an ideal partner to lead the development of content within this project.