TOPSAN and the Semantic Web (Part I)

This is the first in a series of blog posts in which we will introduce the concepts behind the TOPSAN Protein Syntax and the TOPSAN semantic notation system. This first article covers the basics of working with the notation environment, along with some simple examples of how to use the notation system. Next, we will describe how to obtain and use the semantic information that has been embedded in TOPSAN, compose queries and analyze the available information. Finally, we will describe the more advanced concepts involved in the controlled ontology of predicates that the TOPSAN Protein Syntax describes.

To read more about the technologies involved, you can find additional information at:

What is the semantic web:

Biohackathon semantic web series:

How the data is being stored:

How ontologies can control the language used to describe the relationships between different pieces of data:

Technologies that can be used to query and examine the data:

Introduction to TOPSAN Protein Syntax

One of the goals of the TOPSAN protein annotation system is to make human annotations of protein structures available to the public, including in a machine-readable format. When an annotator adds a link or a value to a page, it is important that the intent of that link is expressed: was the link added because the target is a homologue, or because it is another protein in the same pathway? The concepts and standards behind the semantic web provide a framework for expressing this information.

The TOPSAN Protein Syntax (TPS) is designed to cover the set of predicates used to describe the relationships between proteins and the databases and values they can be linked to. These predicates follow a formalized ontology that begins from three different roots. These branches represent the basic concepts used to describe proteins, and include ‘links’ and ‘values’. ‘Link’ statements describe connections from a protein to another database element, while ‘value’ statements assign direct values and data to a protein.

All calls embedded in the text of TOPSAN documents begin and end with double braces, ‘{{’ and ‘}}’. Inside them, you make a call to ‘note.link’. There are two ways to call the function: with sequential arguments or with named arguments. For sequential arguments, wrap the arguments in ‘(’ and ‘)’ and type in the values. This form is usually only used for the two-argument call, which passes the predicate and the object value. If you want to set additional arguments, the named-argument format is preferred. In this format, the arguments are wrapped in ‘{’ and ‘}’, and each argument is given by its name, followed by a ‘:’ and then the value. When using the named-argument format, you don’t have to remember a specific argument order.
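To make the two argument styles concrete, here is a minimal Python sketch of how a script might pull note.link calls out of page text and normalize both forms into a dictionary of named arguments. It is not TOPSAN's own parser: the pattern and function names are hypothetical, and it assumes the sequential form passes (rel, value) in that order, that arguments use straight quotes, and that values never contain ‘)’ or ‘}’.

```python
import re

# Hypothetical patterns for this sketch, not TOPSAN's own grammar.
# A call is {{ note.link(...) }} or {{ note.link{...} }}; values containing
# ')' or '}' would end the match early and are not supported here.
CALL_RE = re.compile(r"\{\{\s*note\.link\s*(\(.*?\)|\{.*?\})\s*\}\}")
QUOTED_RE = re.compile(r"'[^']*'|\"[^\"]*\"")
NAMED_RE = re.compile(r"(\w+)\s*:\s*('[^']*'|\"[^\"]*\"|\w+)")


def _convert(raw):
    """Strip quotes from a string argument; map bare true/false to booleans."""
    if raw[0] in "'\"":
        return raw[1:-1]
    if raw in ("true", "false"):
        return raw == "true"
    return raw


def parse_note_links(text):
    """Return one dict of named arguments per note.link call found in text."""
    calls = []
    for match in CALL_RE.finditer(text):
        body = match.group(1)
        if body.startswith("("):
            # Sequential form: positional values are read as (rel, value).
            values = [_convert(s) for s in QUOTED_RE.findall(body)]
            calls.append(dict(zip(("rel", "value"), values)))
        else:
            # Named form: order does not matter, so collect key:value pairs.
            calls.append({key: _convert(raw) for key, raw in NAMED_RE.findall(body)})
    return calls
```

For example, parsing `{{ note.link( 'memberOf', 'PFAM:PF07980' ) }}` would give `[{'rel': 'memberOf', 'value': 'PFAM:PF07980'}]`, and the named form yields the same kind of dictionary regardless of argument order.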

note.link Arguments:

  • rel : The type of relationship (the predicate)
  • value : The database element to link to; if the value is not an identifiable link, it is treated as a literal value
  • visible : If false, the call does not produce any visible text on the page
  • about : The subject of the statement; defaults to the current page
  • rev : If true, the relationship is reversed, so that the linked value is the subject and the current page is the object

Relationships can go in both directions. By default, the subject is the current page and the passed value is the object. To reverse the relationship, so that the statement is about the external database element pointing to the current page, set ‘rev:true’.
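The direction rules above can be sketched in Python as a function that turns one set of note.link arguments into a single (subject, predicate, object) statement. This is an illustrative assumption about the semantics, not TOPSAN code: the names Triple and to_triple are hypothetical, page identifiers are assumed to look like ‘TOPSAN:2aam’, and ‘visible’ is left out on the assumption that it only affects what is displayed.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_triple(current_page: str, rel: str, value: str,
              about: Optional[str] = None, rev: bool = False) -> Triple:
    """Build one statement from note.link arguments (hypothetical helper)."""
    # 'about' overrides the default subject, which is the current page.
    subject = about if about is not None else current_page
    if rev:
        # Reversed: the linked value becomes the subject, and the page
        # (or the 'about' target) becomes the object.
        return Triple(value, rel, subject)
    return Triple(subject, rel, value)
```

With `rev=True`, `to_triple('TOPSAN:2aam', 'similar', 'UNIPROT:Q8A1G2', rev=True)` yields the statement “UNIPROT:Q8A1G2 similar TOPSAN:2aam” instead of the default direction.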

Examples:

Embed a link to PFAM:

{{ note.link( 'memberOf', 'PFAM:PF07980' ) }}

Cite a PubMed reference:

{{ note.link( 'citation', 'PMID:19191477' ) }}

Reverse a relationship:

{{ note.link{ rel:'similar', value:'UNIPROT:Q8A1G2', rev:true } }}

Define a relationship about something other than the current page:

{{ note.link{ about:'TOPSAN:2aam', rel:'similar', value:'UNIPROT:Q8A1G2' } }}

[Figure: how this call looks in the editor]

[Figure: how this call is displayed on the page]

Predicates

We will describe the set of TOPSAN Protein Syntax predicates in greater detail later. For now, there are only a handful of predicates you need to know to get started.

Predicate        Definition
similar          Represents a connection between two proteins that are homologous or structurally similar
classifiedWith   Connects a protein to an assigned function type
memberOf         Connects a single element to a group of which it is a part
citation         A connection to a literature citation
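Because each link carries a predicate, a consumer can ask for, say, only the homologues of a protein without picking up its citations. A minimal Python sketch, using a hypothetical list of statements assembled from the examples above (not real TOPSAN data) and a hypothetical helper name:

```python
# Illustrative (subject, predicate, object) statements built from the
# examples in this post; this is not data retrieved from TOPSAN.
statements = [
    ("TOPSAN:2aam", "memberOf", "PFAM:PF07980"),
    ("TOPSAN:2aam", "citation", "PMID:19191477"),
    ("TOPSAN:2aam", "similar", "UNIPROT:Q8A1G2"),
]


def objects_of(statements, subject, predicate):
    """Return every object linked from subject by the given predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


homologues = objects_of(statements, "TOPSAN:2aam", "similar")
print(homologues)  # ['UNIPROT:Q8A1G2']
```

A plain hyperlink would not allow this filter; the predicate is what separates “similar” links from “citation” links.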


Available Databases

When describing a link to another database, you can use prefix codes that will be recognized by TOPSAN and translated accordingly. We currently have a set of 10 linked databases, but this will grow as needed. To use a prefix code, give the database code, followed by a colon and the identifier from that database, e.g. “PFAM:PF07980”, “UNIPROT:Q8A1G2”, or “PDB:1AAC”.

Prefix    Database
GO        The Gene Ontology database
PFAM      The Pfam protein family database
UNIPROT   The UniProt protein database
EC        Enzyme Commission (EC) numbers
TOPSAN    The TOPSAN protein annotation system
TAXON     NCBI taxonomy identifiers
PDB       The Protein Data Bank
PMID      PubMed
SCOP      SCOP domain identifiers, e.g. d1wy7a1
SUNID     SCOP unique identifiers (sunid), e.g. 51349 for “Alpha and beta proteins (a/b)”
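The prefix rule, together with the note.link behavior of treating an unidentifiable value as a literal, can be sketched as follows. This assumes prefixes are matched exactly as written in the table and that everything after the first colon is the identifier; classify_value is a hypothetical name, and TOPSAN's actual translation of prefixes into database links is not shown.

```python
# The ten prefix codes listed in the table above.
KNOWN_PREFIXES = {"GO", "PFAM", "UNIPROT", "EC", "TOPSAN",
                  "TAXON", "PDB", "PMID", "SCOP", "SUNID"}


def classify_value(value):
    """Split PREFIX:identifier into a link, or fall back to a literal value.

    Hypothetical helper: prefixes are matched exactly as listed, and the
    identifier is everything after the first colon.
    """
    prefix, sep, identifier = value.partition(":")
    if sep and identifier and prefix in KNOWN_PREFIXES:
        return ("link", prefix, identifier)
    return ("literal", None, value)
```

Under these assumptions, “PDB:1AAC” comes back as a link to the PDB entry 1AAC, while free text such as “pH 7.5” is kept as a literal value.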

Next in this series, Part II: Data Mining on TOPSAN
