This is the first in a series of blog posts introducing the concepts behind the TOPSAN Protein Syntax and the TOPSAN semantic notation system. This first article covers the basics of working in the notation environment, with some simple examples of how to use the notation system. Next, we will describe how to obtain and use the semantic information that has been embedded in TOPSAN, compose queries, and analyze the available information. Finally, we will describe the more advanced concepts behind the controlled ontology of predicates that the TOPSAN Protein Syntax describes.
To read more about the technologies involved, you can find additional information at:
What is the semantic web:
Biohackathon semantic web series:
How the data is being stored:
How ontologies can control the language used to describe the relationships between different pieces of data:
Technologies that can be used to query and examine the data:
Introduction to TOPSAN Protein Syntax
One of the goals of the TOPSAN protein annotation system is to make sure that human annotations of protein structures are available to the public. This includes ensuring that annotations are available in a machine-readable format. When an annotator adds a link or a value to a page, it is important that the intent of that link is expressed: we need to know whether a link was added because it is an example of a homologue or because it points to another protein in the same pathway. The concepts and standards behind the semantic web provide a framework for expressing this information.
The TOPSAN Protein Syntax (TPS) is designed to cover the set of predicates used to describe the relationships between proteins and the databases and values they can be linked to. These predicates follow a formalized ontology that begins from three different roots, each representing one of the basic concepts used to describe proteins. These include ‘links’ and ‘values’. ‘Link’ statements describe connections from a protein to another database element, while ‘value’ statements assign direct values and data to a protein.
All calls embedded in the text of TOPSAN documents begin and end with double braces, ‘{{’ and ‘}}’. Inside the braces you make a call to ‘note.link’. There are two ways to call the function: with sequential arguments or with named arguments. For sequential arguments, wrap the arguments in ‘(’ and ‘)’ and type in the values. This form is usually only used for the two-argument call, which passes the predicate and the object value. If you want to set additional arguments, the named argument format is preferred. In this form, the arguments are wrapped in ‘{’ and ‘}’, and the name of each argument is followed by a ‘:’ and then the value. With the named argument format you don’t have to remember a specific order of arguments.
note.link Arguments:
- rel : Type of relationship
- value : The database entry to link to. If it is not an identifiable link, it is assigned as a literal value
- visible : If false, the call does not produce text that is visible on the page
- about : The subject of the statement. Defaults to the current page
- rev : If true, the relationship is reversed, so that the destination is the subject and the current page is the object
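As a rough illustration of the two call forms, the Python sketch below assembles `note.link` call text from its arguments. The helper names (`note_link_sequential`, `note_link_named`) are hypothetical and not part of TOPSAN; the sketch also assumes that values are wrapped in plain straight single quotes and that booleans are written as bare `true` / `false`.

```python
# Hypothetical helpers for generating note.link call text; not part of TOPSAN.
ALLOWED_ARGS = ("rel", "value", "visible", "about", "rev")


def note_link_sequential(rel, value):
    """Two-argument sequential form: predicate first, then the object value."""
    return "{{ note.link( '" + rel + "', '" + value + "' ) }}"


def note_link_named(**args):
    """Named-argument form: name:value pairs, in whatever order they were given."""
    parts = []
    for name, val in args.items():
        if name not in ALLOWED_ARGS:
            raise ValueError("unknown note.link argument: " + name)
        if isinstance(val, bool):
            # Assumption: booleans are written bare, as in rev:true
            parts.append(name + ":" + str(val).lower())
        else:
            parts.append(name + ":'" + str(val) + "'")
    return "{{ note.link{ " + ", ".join(parts) + " } }}"


print(note_link_sequential("memberOf", "PFAM:PF07980"))
print(note_link_named(rel="similar", value="UNIPROT:Q8A1G2", rev=True))
```

A generator like this could be handy when preparing many annotations at once, but the resulting text should still be checked in the TOPSAN editor.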
Relationships can go in both directions. By default the subject is the current page and the passed value is the object. To reverse this relationship, so that the relationship statement is about the external database pointing to the current page, set ‘rev:true’.
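One way to picture the direction of a statement is as a subject, predicate, object triple. The sketch below is only an illustration in Python, not TOPSAN's implementation: `to_triple` and `Triple` are hypothetical names, the page identifier `TOPSAN:2aam` stands in for the current page, and it assumes that when both `about` and `rev` are given, the `about` page becomes the object.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_triple(current_page: str, rel: str, value: str,
              about: Optional[str] = None, rev: bool = False) -> Triple:
    """Map note.link arguments onto a (subject, predicate, object) statement."""
    # 'about' defaults to the current page
    page = about if about is not None else current_page
    if rev:
        # Reversed: the passed value is the subject, the page is the object
        return Triple(value, rel, page)
    return Triple(page, rel, value)


# Default direction: the current page is the subject
print(to_triple("TOPSAN:2aam", "similar", "UNIPROT:Q8A1G2"))
# Reversed direction: the external entry is the subject
print(to_triple("TOPSAN:2aam", "similar", "UNIPROT:Q8A1G2", rev=True))
```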
Examples:
Embed a link to PFAM:
{{ note.link( 'memberOf', 'PFAM:PF07980' ) }}
Cite a PubMed Reference:
{{ note.link( 'citation', 'PMID:19191477' ) }}
Reverse a relationship:
{{ note.link{ rel:'similar', value:'UNIPROT:Q8A1G2', rev:true } }}
Define a relationship about something other than the current page:
{{ note.link{ about:'TOPSAN:2aam', rel:'similar', value:'UNIPROT:Q8A1G2' } }}
In the editor this would look like:
When displayed on the page it would be:
Predicates
We will describe the set of TOPSAN Protein Syntax predicates in greater detail later. For now, there are only a handful of predicates that you need to know in order to get started.
| Predicate | Definition |
|---|---|
| similar | Represents a connection between two proteins that are homologous or structurally similar |
| classifiedWith | Connects a protein to an assigned function type |
| memberOf | Connects a single element to a group of which it is a part |
| citation | A connection to a literature citation |
Available Databases
When describing a link to another database, you can use prefix codes that will be recognized by TOPSAN and translated accordingly. We currently have a set of 10 linked databases, but this will grow as needed. To use a prefix code, give the database code, followed by a colon and the identifier from that database, e.g. “PFAM:PF07980”, “UNIPROT:Q8A1G2”, or “PDB:1AAC”.
| Prefix | Database |
|---|---|
| GO | The Gene Ontology database |
| PFAM | The Pfam protein family database |
| UNIPROT | The UniProt protein database |
| EC | The Enzyme Catalogue |
| TOPSAN | The TOPSAN protein annotation system |
| TAXON | The NCBI taxonomic codes |
| PDB | The Protein Data Bank |
| PMID | PubMed |
| SCOP | SCOP domain IDs, e.g. d1wy7a1 |
| SUNID | SCOP ID, e.g. 51349 -> Alpha and beta proteins (a/b) |
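To make the prefix convention concrete, here is a minimal Python sketch of how a `value` string could be split into a database prefix and an identifier, falling back to a literal value when the prefix is not one of the ten listed above. This is not TOPSAN's own code: the function name `parse_value` is hypothetical, and the sketch assumes prefixes are matched exactly in upper case.

```python
# The ten prefixes from the table above
KNOWN_PREFIXES = {
    "GO", "PFAM", "UNIPROT", "EC", "TOPSAN",
    "TAXON", "PDB", "PMID", "SCOP", "SUNID",
}


def parse_value(value):
    """Return ('link', prefix, identifier) or ('literal', None, value)."""
    # Split on the first colon only, so identifiers may contain colons
    prefix, sep, identifier = value.partition(":")
    if sep and identifier and prefix in KNOWN_PREFIXES:
        return ("link", prefix, identifier)
    # Anything that is not an identifiable link is treated as a literal value
    return ("literal", None, value)


print(parse_value("PFAM:PF07980"))
print(parse_value("UNIPROT:Q8A1G2"))
print(parse_value("hypothetical protein"))
```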
Data Mining on TOPSAN