Archive for the ‘LiveConference’ Category

This is a live blog from Neuroinformatics 2009.

Data management: the view from 50,000 feet — the dimensions are the amount of structure and the number of data sources. More structure tends to mean fewer data sources.

Distinguishes between parallelisation and heterogeneity. You can distribute data across tables in an organised way — this is parallelisation; or you can have lots of data spread across resources, with multiple entities and no common plan — this is heterogeneity.

Outline — the problems of data integration, with dataspaces suggested as a solution.

Databases are so successful because they provide a level of abstraction over the data. Data integration is a higher level of abstraction still, because you don't have to worry about how the data is stored or structured.

Mediated schema: uses a mediation language, a mapping tool, and then a set of wrappers over the data sources, which map them to a common syntax (a relational database, for example).
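To make the wrapper idea concrete, here is a minimal sketch (my own, not the speaker's system; the field names and sources are invented). Each wrapper translates a source's native format into the shared schema, so the mediator only ever sees one syntax.

```python
import csv
import json

# Hypothetical common schema that the mediator queries against.
COMMON_FIELDS = ["id", "name", "region"]

def wrap_csv_source(path):
    """Wrapper for a CSV source: maps its native columns onto the common schema."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"id": row["ID"], "name": row["Label"], "region": row["BrainRegion"]}

def wrap_json_source(path):
    """Wrapper for a JSON source with a different native structure."""
    with open(path) as f:
        for record in json.load(f)["records"]:
            yield {"id": record["uid"], "name": record["title"], "region": record["area"]}

def mediated_query(sources, region):
    """The mediator unions sources without caring about their native formats."""
    for source in sources:
        for row in source:
            if row["region"] == region:
                yield row
```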

So, we know how to do it, but the cost of building data integration systems is really high. Creating the mediated schema or ontology is hard; sometimes it's impossible. Mapping sources to the mediated schema can be a nightmare, because you need many people from both sides of the mediation. There are some automated systems, but a human is always needed. Then there are data-level mappings (changing IDs, synonyms and so on), and the social costs.

One of the problems with data integration is that it costs a lot early on, but yields very little until quite a long way in, when it's all done. What we really want is pay-as-you-go data management: useful data out early and constantly.

Every time a human does something with data, they are telling you something about that data. If you can capture this information, then you can do useful stuff with it.

Structured data on the web: the deep web, which is data behind forms; and two others. So, the deep web — knowledge which is not accessible through general-purpose search engines; cars, houses and so on are examples. Uses dataspaces as a way of getting at this; they learned 5000 different data sources in two months.

One possible way to access the deep web is to put queries against web forms. You have to guess what to put in; one way is to just use the words found on the form's page in the first place. Currently Google surfaces a lot of knowledge from the deep web this way; it has the biggest impact on the deep web.
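A toy version of the form-probing idea, with a made-up endpoint and a crude results heuristic; the real crawler is far more sophisticated.

```python
import urllib.parse
import urllib.request

# Hypothetical search form endpoint; a real crawler would discover these.
FORM_URL = "http://example.org/search"

def probe_form(candidate_terms):
    """Submit each candidate term to the form; keep terms that yield content."""
    productive = []
    for term in candidate_terms:
        url = FORM_URL + "?" + urllib.parse.urlencode({"q": term})
        with urllib.request.urlopen(url) as response:
            body = response.read()
        if len(body) > 1024:  # crude heuristic: a non-trivial results page
            productive.append(term)
    return productive

# Candidate terms can simply be words scraped from the page hosting the form.
print(probe_form(["cars", "houses", "jobs"]))
```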

Web tables: can we exploit the knowledge in tables better? There are 14 billion tables on the web, of which about 154 million are interesting — the rest are formatting or whatever. The first problem is to identify schema elements; these are expressible in HTML, but in practice no one uses that markup, so you have to guess. They got 2.6 million schemas. It would be good to put these into autocomplete (although not sure where).
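Since the header markup is almost never used, schema elements have to be guessed. A minimal sketch of that guess, assuming BeautifulSoup is available and treating the first row of each table as its header:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def guess_table_schemas(html):
    """Guess a schema for each table by taking its first row as the header,
    since almost nobody uses <th> markup in practice."""
    schemas = []
    for table in BeautifulSoup(html, "html.parser").find_all("table"):
        first_row = table.find("tr")
        if first_row is None:
            continue
        header = [cell.get_text(strip=True) for cell in first_row.find_all(["th", "td"])]
        if any(header):  # skip purely presentational tables
            schemas.append(header)
    return schemas

html = "<table><tr><td>make</td><td>model</td></tr><tr><td>Ford</td><td>Ka</td></tr></table>"
print(guess_table_schemas(html))  # [['make', 'model']]
```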

Fusion Tables lets you upload data and collaborate on visualising it; the visualisation options change depending on the data types.

Conclusions — bottom-up data integration, which is more realistic than top-down. Dataspaces are one approach. Fusion Tables is good.

Guiding principles of NIF: it builds heavily on existing technologies, and information resources come in all sorts of sizes and shapes.

At the highest level is the NIF registry: a web index of resources relevant to the neurosciences.

NIF resource diversity — three different levels of data, with increasing amounts of structure.

Is GRM1 in the cerebral cortex? The NIF system allows searching over multiple different resources to answer this. But there are problems: inconsistent and sparse annotation of scientific data, many different names for the same thing, and so on. Added to this, there are over 2000 databases in the registry.

Uses mixed searching, so both ontological information and string-based matching are used; the latter is important where there is no annotation. Can also do query expansion with the ontology to get better querying.
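A cut-down sketch of what ontology-based query expansion looks like (the synonym and subclass lists here are invented): expand the user's term with related names before hitting the string-based indexes.

```python
# Invented fragments of an ontology: synonyms and subclass relations.
SYNONYMS = {"cerebral cortex": ["neocortex", "cortex cerebri"]}
SUBCLASSES = {"cerebral cortex": ["frontal cortex", "visual cortex"]}

def expand_query(term):
    """Expand a query term with its synonyms and subclasses, so string-based
    search still finds records annotated with any of the related names."""
    terms = [term]
    terms += SYNONYMS.get(term, [])
    terms += SUBCLASSES.get(term, [])
    return terms

print(expand_query("cerebral cortex"))
```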

Building ontologies is difficult even for limited domains, never mind all of the neurosciences. They are trying to do this with multiple levels: NeuroLex — a single-inheritance lexicon; NIFSTD — standardised modules under the same upper ontology; NIFPlus — intra-domain and more useful hierarchies created using properties and restrictions.

That is, using logical classification driven by the properties of the entities.
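The kind of property-driven classification this enables might look as follows in owlready2; the classes are toy examples of mine, not actual NIFSTD content.

```python
from owlready2 import *  # third-party: pip install owlready2

onto = get_ontology("http://example.org/toy-nif.owl")

with onto:
    class Neurotransmitter(Thing): pass
    class GABA(Neurotransmitter): pass
    class Neuron(Thing): pass
    class releases(Neuron >> Neurotransmitter): pass

    # A defined class: the reasoner classifies any neuron that releases
    # GABA as GABAergic, instead of the hierarchy being asserted by hand.
    class GABAergicNeuron(Neuron):
        equivalent_to = [Neuron & releases.some(GABA)]

# Running sync_reasoner() would then place individual neurons under
# GABAergicNeuron automatically, purely from their properties.
```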

Question — how to get the community involved? You need to provide an easy-to-use platform for community collaboration. They have a semantic wiki for contributing to NeuroLex. This really lowers the barriers to entry for domain experts who wish to use (and extend!) these terms.

Lots of people are starting to use the resources (they find this out because people complain when the systems are broken!).

Contributing to NeuroLex: you don't need an account, but it's better if you have one, and everything is online. The main thing they are looking for is content, content, content; the more the better. Finally, getting people to value ontologists is really important.

This is a live blog from Neuroinformatics 2009.

Motivation: what is the common feature of a set of disorders? They are all complex disorders which we don't really understand.

Alzforum, the Alzheimer's forum, is a nice example of an early web community. It works as an ongoing journal club, with curated discussions, and started off during the early days of the web.

They developed StemBook, an online book launched about a year ago, with discussion of what is happening in the field. PD Online Research, a Parkinson's disease website, uses a toolkit that they have developed. Linking across these forums can be a problem; you need some form of shared terminology server. The Science Collaboration Framework, based around Drupal, provides common collaborative tools for biomedicine, shared ontologies/vocabularies and so on.

How do you link between these communities? There are issues of semantic annotation: how does it happen? There are systems which help guess which ontology terms apply; they are building a system which should work across lots of different content management facilities. This can bring lots of benefits, as the additional semantics allow you to work around synonyms and the like.

Discourse ontologies. SALT — Semantically Annotated LaTeX.

Need to support a spectrum of different knowledge structures, thesauri and so on. Less complex means more tractable to biologists; complex and formalised means tractable to computers.

They are now integrating the discourse ontology into myExperiment and others.

Using existing work on entity recognition, they are trying to produce a provenance-aware representation of the results.

This is a live blog from Neuroinformatics 2009.

Creative Commons is based around issues with data and copyright, trying to change the idea that not sharing is the default. Science Commons looks at the issues specific to science.

The semantic web in a nutshell: it adds to web standards and practices, encouraging common naming, ontology development, and expression in a knowledge representation language; it allows easy integration over multiple sources, and works both inside and outside organisational boundaries.

Why should you want this? Network effects: people can apply their own skills and combine knowledge from many different sources. It provides efficiencies at the global scale.

Copy and paste for the semantic web: a mashup combining knowledge from the Allen Brain Institute with the Google API. They had to screen-scrape the Allen Brain data for this.

Trying to look for druggable targets in pyramidal neurons. Google provides too many results; so does PubMed. Shows a complex SPARQL query over knowledge from the web, crossing from MeSH to gene to GO. This may not be the best query, but it's nonetheless useful and will make biologists happy.
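The query was roughly of this shape. This is my reconstruction with placeholder predicates (the ex: vocabulary is invented), not the actual query shown.

```python
from rdflib import Graph  # third-party: pip install rdflib

g = Graph()
# g.parse("integrated_neuro_data.rdf")  # knowledge aggregated from the web

# The ex: predicates below are placeholders invented for this sketch.
QUERY = """
PREFIX ex: <http://example.org/terms/>
SELECT DISTINCT ?gene
WHERE {
    ?paper ex:meshHeading "Pyramidal Cells" .
    ?paper ex:mentionsGene ?gene .
    ?gene  ex:goAnnotation ?go .
    FILTER regex(str(?go), "receptor activity")
}
"""
for row in g.query(QUERY):
    print(row.gene)
```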

A brief jump into ontology making. Some terms mix up the material entity and the neurotransmitter role. Take, for example, peptide, neurotransmitter, hormone and ligand; all of the latter could be peptides, although not necessarily. We need to untangle these. In many cases this has already been done (ChEBI). The move is from English to OWL.
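One way to untangle them is the role-versus-chemical split used by ChEBI-style ontologies; a toy sketch of my own in owlready2, not the real axioms: a peptide is a material kind of molecule, while neurotransmitter, hormone and ligand are roles a molecule can bear.

```python
from owlready2 import *  # third-party: pip install owlready2

onto = get_ontology("http://example.org/toy-chebi.owl")

with onto:
    class Molecule(Thing): pass
    class Peptide(Molecule): pass            # a material kind of thing
    class Role(Thing): pass
    class NeurotransmitterRole(Role): pass   # things a molecule can *do*
    class HormoneRole(Role): pass
    class LigandRole(Role): pass
    class has_role(Molecule >> Role): pass

    # Insulin is materially a peptide; bearing the hormone role does not
    # change what it is, and it could bear other roles as well.
    insulin = Peptide("insulin")
    insulin.has_role.append(HormoneRole("insulin_hormone_role"))
```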

How to build consensus in ontology building — somewhat related to the OBO Foundry rules. Another programme is the INCF program for an ontology of neural structures.

Challenges — building bigger ontologies is hard, and barriers to sharing are a major difficulty.

This is a live blog from Neuroinformatics 2009.

All of our observations about the brain are in some sense reductionist. We are looking at only one thing at a time, and hope to infer knowledge from this. The knowledge is multi-technique — no single experiment is going to give the entire answer, so we need to combine and integrate. Most of our data is descriptive — MRI is not that different from phrenology in one sense.

The process of dissemination — the web and its equivalents — has been transformative for the neurosciences. Large-scale consortia are also important; he has been involved in lots of these — sometimes painful, but useful. It's good to learn the lessons from them.

The biggest lesson from multi-site brain mapping projects: the data needs to be open. If the data is open, people will come, so long as it's described.

New techniques are coming along all the time; every year there is a new way of looking at stuff. We need to combine these forms of data with knowledge from the past. There is a cost to this — digitising and representing histology, for instance, creates a lot of data. Whole-brain imaging at 10 micrometre resolution currently runs to terabytes of data.

One of the big issues is that lots of the data is under patient confidentiality; often they can only store and check de-identified data. There are also problems with metadata — some places have sent "phantom" images, which are used to calibrate the equipment, with a patient name attached. This sort of thing reduces the value, so you need to check the data constantly.

Data sharing and access control is a spectrum. Data can be released the instant it's produced, six months after deposition, after publication, or never. They have a system to support this, with the acquirer having control over the choice.
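The acquirer-controlled release rule implies policy logic along these lines (a made-up sketch, not their actual system):

```python
from datetime import date, timedelta

# Possible release policies, chosen by the data acquirer at deposition time.
POLICIES = {
    "immediate": lambda dep, pub: dep,
    "six_months": lambda dep, pub: dep + timedelta(days=182),
    "on_publication": lambda dep, pub: pub,
    "never": lambda dep, pub: date.max,
}

def is_released(policy, deposited, published=None, today=None):
    """True if the data should be visible under the acquirer's chosen policy."""
    today = today or date.today()
    release_date = POLICIES[policy](deposited, published or date.max)
    return today >= release_date

print(is_released("six_months", deposited=date(2009, 1, 1), today=date(2009, 9, 1)))  # True
```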

Hardware — spend lots of money and eventually it will work. They have a 4PB system now, using a robotised tape system because spinning disks are too expensive.

My computer crashed at this point and I had to reboot, but he talked about Alzheimer's, giving a nice hypothesis that multi-image databases could potentially answer.

With BIRN, data does not necessarily need to be centralised — it is possible to support distributed but federated databases. They have managed to aggregate and bring together information from many different resources. Databases need a suite of ancillary tools which we can use to look at the data.

Last example, ADNI — the Alzheimer's Disease Neuroimaging Initiative, a naturalistic study of AD progression. About 800 individuals, studied with a variety of different techniques. Data is released immediately (often the same day). There are about 90,000 images in the database; downloads are highly periodic (not sure why!).

Data needs to be sufficiently well described, with integration across different datasets.

What works and what doesn't? First, data — the data must be described well enough that it can be understood. Second, the experiments need to be coordinated, or they are hard to integrate. Tools must be good, and there needs to be a good focus. Size: the data needs to be big enough to have statistical power. Duration: databases must last, so they must have enough funding. Mission: is it well enough defined? People: a common purpose and leadership to carry things forward. Sociology: do people agree on what should be shared, and when? Expertise: you need it. Funding: you need sustainability.

This is a live blog from Neuroinformatics 2009.

Neuronal systems are incredibly stable over time. They are looking at a number of systems, including the pyloric pattern generator — stomachs in crabs (?). It is a pacemaker system, and it's very stable between individuals and over time. Despite this, the maximal conductances in the neurons vary pretty widely. How come this variability doesn't affect stability?

They have generated a single-compartment model, looking at an 8-dimensional parameter space and making a big database of models, trying to replicate the variance that they see in the biological systems. With their model, they tend to get similar output across these different conditions. The conclusion is that neuronal systems have a large solution space within which they maintain their function.
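The model-database approach can be caricatured as a grid sweep over conductance space; a toy sketch with 3 parameters instead of 8 and a stand-in for the actual simulation:

```python
import itertools
import numpy as np

def model_output(params):
    """Stand-in for simulating a single-compartment neuron model;
    the real version would integrate conductance-based equations."""
    return np.tanh(sum(params))  # toy scalar measure of behaviour

TARGET, TOLERANCE = 0.9, 0.05

# Sweep a coarse grid over each maximal conductance (8 dimensions in the
# talk, 3 here to keep the toy small) and keep every parameter set whose
# output matches the target behaviour.
grid = np.linspace(0.0, 1.0, 5)
solutions = [p for p in itertools.product(grid, repeat=3)
             if abs(model_output(p) - TARGET) < TOLERANCE]
print(len(solutions), "working models out of", 5 ** 3)
```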

Question: how are these solution spaces distributed within the total parameter space? It could be a single unique solution, it could be islands, etc. They have been collaborating on high-dimensional visualisation using dimensional stacking; I think it's slices through the dimensional space, stacked out side-by-side.
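Dimensional stacking, as I understand it: nest parameter dimensions inside each other so the whole grid becomes one 2-D image of side-by-side slices. A toy numpy sketch with 4 dimensions rather than 8:

```python
import numpy as np

# Toy "goodness" scores over a 4-D parameter grid (a stand-in for the
# 8-D conductance space in the talk).
shape = (4, 5, 4, 5)  # (p1, p2, p3, p4)
scores = np.random.default_rng(0).random(shape)

# Dimensional stacking: nest p3 inside p1 along the rows and p4 inside p2
# along the columns, giving a single 2-D image whose blocks are slices
# through the higher-dimensional space, laid out side by side.
img = scores.transpose(0, 2, 1, 3).reshape(shape[0] * shape[2],
                                           shape[1] * shape[3])
print(img.shape)  # (16, 25): one plottable image
```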

Talks about cell-type-specific co-regulation of ion channels; there are lots of correlations between different currents. Interestingly, most of the relationships are linear and, so far, all of them are positive; it's not clear that there are any negative correlations.

They have classified their model space, and found that the correlation between channels is always in the same direction — a correlation which is positive in one cell type will never be negative in another. However, their models show a negative relationship in some circumstances where the experimental results show a positive relationship, which is an open question.
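A quick illustration of that sign check, on invented data (channel names are placeholders): compute pairwise correlations of conductances across the database of working models and look at the signs.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.random(200)  # shared regulatory signal (invented)
conductances = np.column_stack([
    base + 0.1 * rng.standard_normal(200),  # "g_Na"  (co-regulated)
    base + 0.1 * rng.standard_normal(200),  # "g_CaT" (co-regulated with g_Na)
    rng.random(200),                         # "g_Kd"  (independent)
])

# Pairwise Pearson correlations between channels, across all working models.
corr = np.corrcoef(conductances, rowvar=False)
# The co-regulated pair comes out strongly positive; the independent
# channel's correlations hover near zero with arbitrary sign.
print(np.round(corr, 2))
```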