Category Archives: How to use Research Analysis & FAQ

Nano-publications – Review and Comparison to the Research Analysis model

Apart from the scientific knowledge management models of micropublications (Clark 2014) and the Biological Expression Language (http://www.openbel.org), the concept of nano-publications (Mons 2009, Groth 2010) is most aligned with the goals of Research Analysis and has provided valuable insights. In this article we provide an overview of the nano-publication model and discuss how it relates to the Research Analysis (RA) model.

The authors propose five steps required to create and adopt nano-publications (Mons 2009, Groth 2010):

  1. Terms to Concepts: This step requires that all terms in a research article are mapped to non-ambiguous identifiers. In nano-publications this is referred to as a Concept, where a Concept is the smallest, unambiguous unit of thought. A concept is uniquely identifiable (Groth 2010). This is similar to, but more ambitious than, the Medical Subject Headings (MeSH) database. Using MeSH as an example, MeSH Headings are the equivalent of Concepts and the Entry Terms for each MeSH Heading are equivalent to the Terms or synonyms for each Concept. We agree with this general goal and strongly promote the use of standard language. We promote the use of MeSH terms in Research Analysis. In Research Analysis users can use the MeSH Entry Term (synonym) they are most comfortable with and the system then ensures that this is mapped to the main concept or MeSH Heading. This allows the user to work with the terminology that is most comfortable for them and their peers, rather than being forced to use an ideal concept. In a separate article, we discuss the challenges associated with defining an unambiguous unit of thought.
  2. Concepts to Statements: Here they propose that each smallest insight in exact sciences is a ‘triple’ of three concepts, though conditions are required to put the insight in context. The triple is in the form subject > predicate > object. For example, cholesterol > increases > atherosclerosis. A Statement is a uniquely identifiable triple, which can be achieved through the assignment of a unique identifier to the triple by annotation. RA initially implemented a cause-effect model for statements, which is a special case of the triple where the predicate must be a cause-effect predicate. We currently only offer the predicates increases, decreases and not significant. We chose to initially restrict the options for triples to allow for the collection of a consistent database that would allow for analysis.
  3. Annotation of Statements with Context and Provenance: It is not enough to store statements just in the form of their basic components, three concepts in a specific sequence. A statement only ‘makes sense’ in a given context and taking a statement out of a research publication strips it of this context. The context in a nano-publication is defined by another set of concepts. The annotation is achieved technically through a triple such that the subject of the triple is a statement. For the example above, the species should be specified. Mice do not get atherosclerosis, even on a high cholesterol diet, but humans do. Provenance, eg. author and source, is also associated with Statements by annotation. Claims in RA by default require that the user provide the organ/cell model, genetic model and species annotations that are relevant for each specific claim. RA automatically assigns unique identifiers to claims and requires that at least one supporting quotation is provided from a publication. Each supporting quotation requires that the PubMed ID (PMID) is provided. Users can provide additional conditions and context by appending tags to the claim.
  4. Treating Richly Annotated Statements as Nano-Publications: Statements with conditional annotation are treated as nano-publications via proper attribution, so they can be cited and the authors can be credited. A nano-publication is a set of annotations that refer to the same statement and contains a minimum set of (community) agreed upon annotations (Groth 2010). This concept is similar to the claim model in RA. Claims can be cited using their unique identifier, and viewing a claim provides details of all of the quoted statements and associated PMIDs that support the claim, along with any other context provided via tags.
  5. Removing Redundancy, Meta-analyzing Web-Statements: Where statements are identical they would be removed to simplify the database. The goal is to reduce “undue repetition” and to help improve the identification of new statements. Groth et al. define S-Evidence as all the nano-publications that refer to the same statement (Groth 2010) and, as implied by the name, provide evidence for the statement. The original model for nano-publications focused more on the removal of redundancy, but the concept of S-Evidence provides more respect for the importance of replication and the potential for meta-analysis. In complex sciences like biology, the likelihood of a statement being true based on the evidence of one publication is surprisingly low. An example is the uncertain reproducibility and re-usability of results in therapeutic development in the cancer field (Begley 2012). No single experiment, or for that matter any number of experiments, can fully demonstrate the truth of a statement. However, the collection of results that support a statement can, in a Popperian sense, provide some guidance as to how thoroughly a scientific statement has had its mettle tested. It can also allow for the bridging of knowledge between subfields where different terms for the same concepts are regularly used.
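To make the five steps more concrete, here is a minimal sketch in Python of how concepts, triples, annotations and S-Evidence could fit together. It is only an illustration of the ideas described above, not the schema published by the nano-publication authors. The class names, the `demo:` identifiers and the "paper A"/"paper B" sources are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Concept:
    """Smallest unambiguous unit of thought, with a unique identifier."""
    concept_id: str  # hypothetical identifier scheme, for illustration only
    label: str


@dataclass(frozen=True)
class Triple:
    """subject > predicate > object. An annotation is a Triple whose subject is a Triple."""
    subject: Union[Concept, "Triple"]
    predicate: Concept
    obj: Concept


@dataclass(frozen=True)
class NanoPublication:
    """A statement together with its context and provenance annotations."""
    statement: Triple
    annotations: Tuple[Triple, ...]


def s_evidence(nanopubs):
    """Group nano-publications that refer to the same statement."""
    groups = defaultdict(list)
    for nanopub in nanopubs:
        groups[nanopub.statement].append(nanopub)
    return dict(groups)


def concept(label):
    """Helper that derives a made-up identifier from the label."""
    return Concept(concept_id="demo:" + label.replace(" ", "_"), label=label)


# Step 2: the statement from the text, cholesterol > increases > atherosclerosis.
statement = Triple(concept("cholesterol"), concept("increases"), concept("atherosclerosis"))


def annotated(source_label):
    """Step 3: attach context (species) and provenance (source) to the statement."""
    context = Triple(statement, concept("has species"), concept("human"))
    provenance = Triple(statement, concept("has source"), concept(source_label))
    return NanoPublication(statement, (context, provenance))


# Step 5: two nano-publications with the same statement form one S-Evidence group.
nanopubs = [annotated("paper A"), annotated("paper B")]
evidence = s_evidence(nanopubs)
print(len(evidence), len(evidence[statement]))
```

Because the classes are frozen, identical triples compare as equal, which is what lets the grouping step collect replicated statements rather than discard them.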

Nano-publication Model

Figure 0: The Nano-publication Model taken from Groth 2010.

The goal of nano-publications

The nano-publications authors propose the goal of having scientific authors structure their data in such a way that computers understand them, and we support this goal. However, we feel that the formalisation of scientific knowledge is likely to become a specialist task, like the coding of software design specifications into software code. It is not clear how the nano-publications authors see the knowledgebase of nano-publications being used. We strongly believe that, while knowledge coding may be a specialist activity, most researchers in the biological fields will use tools based on such knowledgebases to help direct their research by identifying gaps, conflicts and opportunities in the current research.

Main differences between the nano-publications model and the Research Analysis models

We have discussed some similarities and differences between the nano-publication and RA models above, but here we go into a little more detail.

At the Concept level:

  • The nano-publication model requests that authors use their Concept Wiki unique identifiers, which aggregates concept names from many databases.
  • RA feels that the task of aggregating names is too much and not necessary given the already available databases. Also, we prefer that scientists use the names rather than IDs for concepts as this is more intuitive and in line with our mission.
  • RA currently requests that all claims use NLM’s MeSH Terms or IDs for the names of the elements or concepts in a claim. Where terms are not in MeSH then the NCBI Protein and PubChem databases can be utilised. In rare cases new concept names can be added into RA.
  • RA uses these external name databases to identify synonyms and ensure that if scientists search for a particular MeSH Term that they will receive results for all of the MeSH terms under the MeSH Heading.
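The synonym handling described above can be sketched as a simple lookup from Entry Terms to their MeSH Heading. This is a minimal illustration and not RA's actual implementation. The `MESH_EXCERPT` dictionary is a tiny hand-written excerpt based on the Humans example, and the function names and claim records are hypothetical.

```python
# Hand-written excerpt for illustration; a real system would load the full MeSH vocabulary.
MESH_EXCERPT = {
    "Humans": ["Human", "Homo sapiens", "Man"],
}


def build_term_index(headings):
    """Map every heading and entry term (case-insensitively) to its MeSH Heading."""
    index = {}
    for heading, entry_terms in headings.items():
        for term in [heading, *entry_terms]:
            index[term.casefold()] = heading
    return index


def resolve(term, index):
    """Return the MeSH Heading for a term, or None if the term is unknown."""
    return index.get(term.strip().casefold())


def search_claims(claims, term, index):
    """Find claims whose species resolves to the same heading as the search term."""
    heading = resolve(term, index)
    if heading is None:
        return []
    return [c for c in claims if resolve(c["species"], index) == heading]


TERM_INDEX = build_term_index(MESH_EXCERPT)

# Hypothetical claims entered by different users with different synonyms.
claims = [
    {"id": 1, "species": "Human"},
    {"id": 2, "species": "Homo sapiens"},
    {"id": 3, "species": "Mice"},
]

matches = search_claims(claims, "Man", TERM_INDEX)
print([c["id"] for c in matches])
```

Each user keeps the term they typed, while searches are compared at the heading level, so a search for "Man" also finds claims entered as "Human" or "Homo sapiens".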

At the Statement level (Claims in RA):

  • RA uses the concept of claims in place of the concept of statements in nano-publications. Claims are simply a type of statement that we feel is more intuitive.
  • With regard to our natural language claims, we ask that scientists use the form of a declarative sentence that is as logically clear as possible. These statements will not be as logically clear as the nano-publication model and are likely to be more complex.
  • Our standard cause-effect claim model is quite similar to the nano-publication model. The primary cause-effect part of the claim is simply a special case version of the nano-publication triple eg. Treatment/Cause [subject] > Effect [predicate] > Disease/Molecule [object].
  • The other components of our cause-effect claim model are equivalent to attribution in the nano-publication model. We have 3 as standard:
    • (cause-effect claim, where it occurs in, Organ/Cell Model) AND
    • (cause-effect claim, where it occurs in, Genetic Model) AND
    • (cause-effect claim, where it occurs in, Species);
  • Note that the three conditions should be considered as conjunctions with the cause-effect claim. Conjunction is required as the conditions are related. The Genetic Model affects the Organ/Cell Model and both are specific to the Species. A simple annotation of each to the cause-effect claim would be ambiguous. It is only when all of the conditions are true that the cause-effect claim is valid.
  • In RA, statements must be supported by at least one quoted statement from a publication. These supporting quotes could be treated as annotations in the nano-publications model, but a difference between the two models is that a specific quote could be used as support for several claims. Further, for each PMID there are often several quoted statements. Technically these components could just be repeated as annotations for each statement, but in the RA model a map or graph concept is used where the statements are supported by quotes and quotes are supported by PMIDs. Claims, quotes and PMIDs are all considered to be elements in a directed acyclic graph. The S-Evidence concept in the nano-publication model goes some way to bridging this difference by grouping together nano-publications with the same statement; however, S-Evidence creates trees rather than a graph. This level of the RA model is much more like the micropublication model than the simpler nano-publication model.
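The graph idea can be illustrated with a small sketch in which claims point to quotes and quotes point to PMIDs. This is a simplified illustration under our own assumptions, not the RA database schema. The class, the method names and the identifiers (`c1`, `q1`, `PMID_A` and so on) are placeholders. Because edges only ever run from claim to quote to PMID, the structure is acyclic by construction.

```python
from collections import defaultdict


class EvidenceGraph:
    """Claims point to quotes; quotes point to the PMID they were taken from."""

    def __init__(self):
        self.quotes_for_claim = defaultdict(set)  # claim id -> quote ids
        self.pmid_for_quote = {}                  # quote id -> PMID

    def add_quote(self, quote_id, pmid):
        """Register a quoted statement and the paper it came from."""
        self.pmid_for_quote[quote_id] = pmid

    def support(self, claim_id, quote_id):
        """Record that a known quote supports a claim."""
        if quote_id not in self.pmid_for_quote:
            raise KeyError(f"unknown quote: {quote_id}")
        self.quotes_for_claim[claim_id].add(quote_id)

    def claims_for_quote(self, quote_id):
        """All claims that share this quote (the reason a tree is not enough)."""
        return sorted(c for c, quotes in self.quotes_for_claim.items() if quote_id in quotes)

    def pmids_for_claim(self, claim_id):
        """All papers that support a claim, via its quotes."""
        quotes = self.quotes_for_claim.get(claim_id, set())
        return sorted({self.pmid_for_quote[q] for q in quotes})


graph = EvidenceGraph()
graph.add_quote("q1", "PMID_A")  # two quotes from the same paper
graph.add_quote("q2", "PMID_A")
graph.add_quote("q3", "PMID_B")

graph.support("c1", "q1")
graph.support("c1", "q3")
graph.support("c2", "q1")        # q1 is reused as support for a second claim

print(graph.claims_for_quote("q1"))
print(graph.pmids_for_claim("c1"))
```

Storing each quote and PMID once, and linking to them, avoids repeating the same annotation on every claim that relies on it.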

Research Analysis Knowledge Management Model Diagram

Figure 1. Research Analysis Knowledge Management Model

  • The RA model does have some simple annotation-like elements that associate information with the statement/claim, such as a unique statement identifier, the author of the statement/claim (who may or may not be the author of the publication that the quoted sentence came from), time and date of creation, and tags that in effect provide flexible and unlimited simple annotations.

Both the nano-publication and micro-publication models have a substantial focus on the technical aspects of encoding the data in semantic web schemas. The reason for this focus is the importance of making the data available in an open and semantically rich format. We respect this goal, but have chosen to hide as much of this detail from our users as possible. While the biological research community has become computer savvy over the past decades, the majority of the community are not trained in any computer programming languages and would find these schemas very unpleasant. I have a computer science degree from before web technology was popular and I still find it very unpleasant. This is a little unfair, as these are articles in informatics journals and certainly we acknowledge that the micropublication authors have built user oriented applications.

One of the key goals discussed in the article on the Research Analysis Mission is that our focus is on making knowledge management tools available to normal biological and medical scientists in an easy to use and powerful way – not making the knowledge available to a few hard core geeks and their supercomputers. For this reason, none of the bare bones schemas are visible to the users and the frontend terminology is focused on usability rather than theoretical correctness. This is also the reason why we provide a number of standard models for capturing scientific claims in RA. These models will not be flexible enough for some, but for the rest they will be much easier and straightforward to use. Adoption and the actual acceleration of discovery by normal scientists is our top priority.

References

Mons, B., & Velterop, J. (2009, October). Nano-Publication in the e-science era. In Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009).

Groth, P., Gibson, A., & Velterop, J. (2010). The anatomy of a nano-publication. Inf. Services and Use, 30(1-2), 51-56.

Clark, T., Ciccarese, P., & Goble, C. (2014). Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. Journal of Biomedical Semantics, 5(1), 28.

Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531-533.

 

Mission of Research Analysis

The overarching mission of Research Analysis is to accelerate the rate of scientific discovery through the provision of knowledge management and discovery tools to scientists.

Our knowledge management platform:

  • captures scientific claims from the literature in a formalised structure
  • provides rapid and powerful search tools to find existing scientific claims
  • provides analysis tools that assess:
    • the level of support for a claim in the literature
    • conflicting claims
    • knowledge gaps in the literature
  • helps scientists identify unique and promising hypotheses to test experimentally
  • is intuitive and easy to use

We have found that past efforts to build similar tools tend to focus on the computational aspects of knowledge management: How can we get all of the knowledge out of the literature and scientists’ heads and into a database that can then be mined by the geeks and AIs? This tends to imply that if only the computers could get all of the information, then they would do a better job than the humans. We disagree.

We don’t believe in a Robot Scientist future! We believe that humans continue to have a far superior ability to make the intuitive leaps that lead to scientific discovery. On the other hand, we believe that computers beat us hands down at managing vast stores of information and processing this information rapidly using logical analysis. We believe that by providing powerful and intuitive knowledge management tools to leading scientists, we will enable them to make higher quality intuitive leaps at a faster rate. The goal of RA is not to extract the knowledge and feed it to the robots, but instead to feed it back to the scientists using powerful tools that make it easy and fast to interrogate.

Some examples of benefits that can come from using Research Analysis include:

  • We have all had the experience of remembering a finding from a paper, but not the specific details. Even once you find the paper, where is the specific support in the paper? RA puts these key findings at your fingertips.
  • Even the greatest scientists of all time regularly thought they had a new discovery, but later (sometimes much later) found that some obscure scientist had already come up with it 10 years prior. RA allows you to rapidly identify if there is existing support for a scientific claim.
  • On the other hand, you want to know whether a scientific claim is valid. Use RA to find all of the papers that support and challenge the claim to assess its merit.
  • Use the analysis tools to visualise matrices of scientific claims to identify conflicts, trends or gaps in the literature. This is a powerful source of new research topics.

Research Analysis is constantly looking for opportunities to improve and extend our platform to support scientists in their work. The platform was originally designed to meet the challenges of our collaborators in their research and we feel that the best way to improve the platform is through helping scientists solve hard problems. We would appreciate feedback, suggestions and the opportunity to work with scientists to help solve their problems.

Getting Started with RA: What are Claims? How do you work with them?

The Research Analysis platform is centred around the concept of the scientific claim. The goal of a claim is to take a hypothesis that has been argued in the literature and state it in a formalised language that allows for easier comparison, searching and analysis.

While there are several types of scientific claim, Research Analysis is currently focused on Cause-Effect type scientific claims eg. Statins Decrease Coronary Artery Disease in Humans. We have broken up the components of Cause-Effect claims and formalised their language and structure to make them easier to compare, search and analyse. The easiest way to see how they work is to look through some examples in the View Claims table, but here we provide an overview of the model.

Research Analysis Knowledge Management Model Diagram

Figure 1. Research Analysis Knowledge Management Model

  • Claim Elements:
    1. Treatment/Cause: This is the drug, environmental, genetic or other cause that is applied in the model system. eg. Statin treatment
    2. Effect: This is the type of effect that results from the Treatment/Cause. Currently we only offer three options: increases, decreases or not significant.
    3. Molecule/Disease: This describes the molecule or disease that is affected by the cause. eg. Cholesterol
    4. Organ Model: This describes the organ of the animal or the in vitro system where the cause-effect relationship was observed. eg. Blood
    5. Genetic Model: This describes the genetic model in which the cause-effect relationship was observed. eg. ApoE -/-, Wild Type.
    6. Species: This describes the species in which the cause-effect relationship was observed. eg. Human.
  • Standard Terms: Wherever possible, claims should use standardised language for the elements of the claims. Research Analysis currently requires that all terms should be sourced from one of the following, in order:
    • Medical Subject Headings (MeSH) standard – www.nlm.nih.gov/mesh. Elements of claims should either exist in MeSH as Subject Headings or any of the Entry Terms for a heading or the Tree Number/Unique ID. For example, Humans is a Subject Heading in MeSH, but you can also use the Entry Terms which include Human, Homo sapiens, Man.
    • NCBI Protein Database – www.ncbi.nlm.nih.gov/protein. The name including synonyms or the unique code eg. Accession for the relevant database can be used in Research Analysis.
    • NCBI PubChem Database – pubchem.ncbi.nlm.nih.gov. The chemical name including synonyms or the PubChem CID code can be used in Research Analysis.

We do allow for new terms to be added where there is no satisfactory term in the above databases. However, the use of standardised terms allows for much more powerful search and analysis. For example, a scientist in one field may use a different term to those in another field, but using standard terms we can link these two claims together and provide cross-field visibility. In future we also hope to provide searching and analysis across different levels of the term tree eg. Hominids or Mammals would consolidate claims for species further out on the tree.

  • Supporting Quotes: Research Analysis requires that there be a quoted statement taken from the literature that supports each scientific claim in the database. While the whole research paper is required to fully support a claim, the quoted statements provide some context to the claim in the words of the scientist. It also makes it easier for other scientists to identify the section of the paper that was used to create the formalised claim.
  • Referencing: Along with the Supporting Quote, Research Analysis requires that the PMID is provided for the paper that the Supporting Quote was taken from – www.ncbi.nlm.nih.gov/pubmed. This allows users to quickly dive into the paper for more information or context by clicking on the PMID link in Research Analysis.
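The claim model described above can be summarised as a simple record. The following is a minimal sketch and not the RA implementation. The class and field names are our own for this example, the quote and PMID values are placeholders, and building the link by appending the PMID to the PubMed address is an assumption.

```python
from dataclasses import dataclass, fields

# The three effect options currently offered.
ALLOWED_EFFECTS = ("increases", "decreases", "not significant")


@dataclass(frozen=True)
class Claim:
    """A Cause-Effect claim with its context, supporting quote and reference."""
    treatment_cause: str
    effect: str
    molecule_disease: str
    organ_model: str
    genetic_model: str
    species: str
    supporting_quote: str
    pmid: str

    def __post_init__(self):
        # Every element is required, and the effect must be one of the allowed options.
        for f in fields(self):
            if not str(getattr(self, f.name)).strip():
                raise ValueError(f"missing value for {f.name}")
        if self.effect not in ALLOWED_EFFECTS:
            raise ValueError(f"effect must be one of {ALLOWED_EFFECTS}")

    def sentence(self):
        """Render the claim as a readable sentence."""
        return (f"{self.treatment_cause} {self.effect} {self.molecule_disease} "
                f"in {self.organ_model} ({self.genetic_model}, {self.species})")

    def pubmed_url(self):
        """Assumed link format: the PubMed address followed by the PMID."""
        return f"https://www.ncbi.nlm.nih.gov/pubmed/{self.pmid}"


example = Claim(
    treatment_cause="Statin treatment",
    effect="decreases",
    molecule_disease="Cholesterol",
    organ_model="Blood",
    genetic_model="Wild Type",
    species="Human",
    supporting_quote="<quoted sentence from the paper>",  # placeholder text
    pmid="00000000",                                      # placeholder, not a real PMID
)
print(example.sentence())
```

Restricting the effect to a fixed set of values is what keeps claims consistent enough to compare, search and analyse.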

How to capture claims using Research Analysis?

  • Claims to Capture: A research paper may introduce one new claim, but generally presents a collection of claims that support a central claim of the paper. The user can choose to add just the highest level claim of the paper or drill down and capture the supporting claims too. It is preferred that claims are entered only for papers that provide new evidence or verification of the claim and not for papers that simply refer to the findings of other papers. Ideally you would drill down into the paper referenced and enter the claim with a quote from this paper. For this reason we prefer that claims are entered from original research papers or papers that verify previous work, rather than from review papers.
  • Adding Claims: You can add claims to the database via two means:
    1. Add Claim on View Claims page: To enter a claim on this page you simply need to enter the values into each of the boxes at the top of the table and then press the “Add Claim” button. The nice thing about this method is that it simultaneously searches the database to see if the claim already exists and you’ll see related claims as you enter each of the terms.
    2. Upload a Spreadsheet of Claims: To load a batch of claims you can go to the Upload Claims menu option and select and upload a spreadsheet containing claims. The spreadsheet will need to be in the right format and you can learn more in this article.
  • Edit/Delete Detailed Claim View: To edit, delete or see more details relating to a claim you can click the View link in the View Claims table and go to the claim page. Where the claim is supported by several Quoted Statements and papers you can see a listing of these on the claim page. You can also get a unique link to the claim page that makes it easy to share the claim with other users of Research Analysis.

We hope that this helps you get started with Research Analysis, but please feel free to contact us if you have further questions.

How to Add a Claim?

You can add claims to the database via two means:

  1. Add Claim on View Claims page: To enter a claim on this page you simply need to enter the values into each of the boxes at the top of the table and then press the “Add Claim” button. The nice thing about this method is that it simultaneously searches the database to see if the claim already exists and you’ll see related claims as you enter each of the terms.
  2. Upload a Spreadsheet of Claims: To load a batch of claims you can go to the Upload Claims menu option and select and upload a spreadsheet containing claims. The spreadsheet will need to be in the right format and you can learn more in this article.

How to reference Claims?

  • Supporting Quotes: Research Analysis requires that there be a quoted statement taken from the literature that supports each scientific claim in the database. While the whole research paper is required to fully support a claim, the quoted statements provide some context to the claim in the words of the scientist. It also makes it easier for other scientists to identify the section of the paper that was used to create the formalised claim.
  • Referencing: Along with the Supporting Quote, Research Analysis requires that the PMID is provided for the paper that the Supporting Quote was taken from. This allows users to quickly dive into the paper for more information or context by clicking on the PMID link in Research Analysis. We also provide the option to provide a detailed citation.

Standard Terms in Research Analysis

Wherever possible, claims should use standardised language for the elements of the claims. Research Analysis currently requires that all terms should follow the Medical Subject Headings (MeSH) standard. Elements of claims should either exist in MeSH as Subject Headings or any of the Entry Terms for a heading. For example, Humans is a Subject Heading in MeSH, but you can also use the Entry Terms which include Human, Homo sapiens, Man. We do allow for new terms to be added where there is no satisfactory MeSH term. However, the use of MeSH terms allows for much more powerful search and analysis. For example, a scientist in one field may use a different term to those in another field, but using MeSH terms we can link these two claims together and provide cross-field visibility. In future we also hope to provide searching and analysis across different levels of the MeSH tree eg. Hominids or Mammals would consolidate claims for species further out on the tree.

What are Research or Scientific Claims?

The Research Analysis platform is centred around the concept of the scientific claim. The goal of a claim is to take a hypothesis that has been argued in the literature and state it in a formalised language that allows for easier comparison, searching and analysis.

While there are several types of scientific claim, Research Analysis is currently focused on Cause-Effect type scientific claims eg. Statins decrease Cholesterol in the Blood of Humans. We have broken up the components of Cause-Effect claims and formalised their language and structure to make them easier to compare, search and analyse. The easiest way to see how they work is to look through some examples in the View Claims table, but here are some details:

  • Claim Elements:
    1. Treatment/Cause: This is the drug, environmental, genetic or other cause that is applied in the model system. eg. Statin treatment
    2. Effect: This is the type of effect that results from the Treatment/Cause. Currently we only offer three options: increases, decreases or not significant.
    3. Molecule/Disease: This describes the molecule or disease that is affected by the cause. eg. Cholesterol
    4. Organ Model: This describes the organ of the animal or the in vitro system where the cause-effect relationship was observed. eg. Blood
    5. Genetic Model: This describes the genetic model in which the cause-effect relationship was observed. eg. ApoE -/-, Wild Type.
    6. Species: This describes the species in which the cause-effect relationship was observed. eg. Human.
  • Standard Terms: Wherever possible, claims should use standardised language for the elements of the claims. Research Analysis currently requires that all terms should follow the Medical Subject Headings (MeSH) standard. Elements of claims should either exist in MeSH as Subject Headings or any of the Entry Terms for a heading. For example, Humans is a Subject Heading in MeSH, but you can also use the Entry Terms which include Human, Homo sapiens, Man. We do allow for new terms to be added where there is no satisfactory MeSH term. However, the use of MeSH terms allows for much more powerful search and analysis. For example, a scientist in one field may use a different term to those in another field, but using MeSH terms we can link these two claims together and provide cross-field visibility. In future we also hope to provide searching and analysis across different levels of the MeSH tree eg. Hominids or Mammals would consolidate claims for species further out on the tree.
  • Supporting Quotes: Research Analysis requires that there be a quoted statement taken from the literature that supports each scientific claim in the database. While the whole research paper is required to fully support a claim, the quoted statements provide some context to the claim in the words of the scientist. It also makes it easier for other scientists to identify the section of the paper that was used to create the formalised claim.
  • Referencing: Along with the Supporting Quote, Research Analysis requires that the PMID is provided for the paper that the Supporting Quote was taken from. This allows users to quickly dive into the paper for more information or context by clicking on the PMID link in Research Analysis. We also provide the option to provide a detailed citation.

How to upload claims – spreadsheet upload

Research Analysis allows researchers to upload claims in bulk using a spreadsheet via the following steps:

  1. Select the “Upload Claims” menu option.
  2. Click the “Select files…” button
  3. Select the spreadsheet in the correct format (refer below) that you wish to upload.
  4. Check if there were errors with the upload. If there are errors, correct these and reload the file (refer below). You can reload the whole file as duplicates will be ignored.

Formatting the spreadsheet

  1. The spreadsheet must be in the format required for Research Analysis. The easiest way to do this is to use the template spreadsheet provided via the link below. Don’t change any of the column or tab headings or the upload will fail.

Template Causal Claims – 20150909

Alternatively you can create an example spreadsheet by making a selection of claims on the View Claims page and then Export as Excel. The downloaded Excel file will be in the correct format for upload to Research Analysis.

Correcting Errors in the Spreadsheet

The following provides some descriptions of common errors:

  1. “Claim [Claim row number] was REJECTED due to a missing [Column Name] value” : The columns Treatment or GENE-KO, Increases Molecule/Disease, In Organ/Cell, Genetic Model, Species, Statement (Quote from paper) and PMID are required fields. If for example there was no Genetic Model specified for a claim then you should put “Not Specified” in this column or if “Wild Type” then enter that value in the column.
  2. “Sheet: [Sheet Name]. Col: [Column] Incorrect columns found”: This error means that the sheet/tab specified has the incorrect column names. If this is a sheet without claims then this is not a problem and can be ignored. If it does have claims then the columns must be fixed.
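If you would like to check a spreadsheet before uploading it, the required-field rule above can be sketched in a few lines of Python. This is an illustrative sketch only and not the code that runs on the Research Analysis server. It assumes that each row has already been read into a dictionary keyed by column heading, that the headings are spelled exactly as listed above, that claims are numbered from 1 and that a duplicate is a row with identical values in all required columns. The example values are placeholders.

```python
# Required columns, spelled as in the error description above (assumed to match the template).
REQUIRED_COLUMNS = [
    "Treatment or GENE-KO",
    "Increases Molecule/Disease",
    "In Organ/Cell",
    "Genetic Model",
    "Species",
    "Statement (Quote from paper)",
    "PMID",
]


def cell_text(row, column):
    """Return the cell as stripped text, treating absent cells as empty."""
    value = row.get(column)
    return "" if value is None else str(value).strip()


def validate_rows(rows, seen):
    """Return (accepted rows, error messages). `seen` remembers claims loaded earlier."""
    accepted, errors = [], []
    for row_number, row in enumerate(rows, start=1):
        missing = [c for c in REQUIRED_COLUMNS if not cell_text(row, c)]
        if missing:
            errors.append(
                f"Claim {row_number} was REJECTED due to a missing {missing[0]} value"
            )
            continue
        key = tuple(cell_text(row, c) for c in REQUIRED_COLUMNS)
        if key in seen:
            continue  # duplicates are ignored, so the whole file can be reloaded
        seen.add(key)
        accepted.append(row)
    return accepted, errors


complete = {
    "Treatment or GENE-KO": "Statin treatment",
    "Increases Molecule/Disease": "Cholesterol",
    "In Organ/Cell": "Blood",
    "Genetic Model": "Wild Type",
    "Species": "Human",
    "Statement (Quote from paper)": "<quoted sentence from the paper>",
    "PMID": "00000000",  # placeholder, not a real PMID
}
incomplete = dict(complete)
incomplete["Treatment or GENE-KO"] = "Other treatment"
incomplete["Genetic Model"] = ""  # missing required value

rows = [complete, dict(complete), incomplete]
seen = set()
accepted, errors = validate_rows(rows, seen)
print(errors)

# Correct the error and reload the whole file; earlier claims are skipped as duplicates.
incomplete["Genetic Model"] = "Not Specified"
accepted_again, errors_again = validate_rows(rows, seen)
print(len(accepted_again), errors_again)
```

The second call mirrors the advice in the upload steps: after fixing the rejected row you can reload the entire file and only the corrected claim is added.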