TOPSAN and the Semantic Web (Part I)

This is the first in a series of blog posts introducing the concepts behind the TOPSAN Protein Syntax and the TOPSAN semantic notation system. This first article covers the basics of working in the notation environment, with some simple examples of how to use the notation system. Next, we will describe how to obtain and use the semantic information that has been embedded in TOPSAN, compose queries, and analyze the available information. Finally, we will describe the more advanced concepts behind the controlled ontology of predicates that the TOPSAN Protein Syntax describes.

To read more about the technologies involved, you can find additional information at:

What is the semantic web:

Biohackathon semantic web series:

How the data is being stored:

RDFa Basics

How ontologies can control the language used to describe the relationships between different pieces of data:

Rules and Semantic Web

Technologies that can be used to query and examine the data:

Introduction to TOPSAN Protein Syntax

One of the goals of the TOPSAN protein annotation system is to make sure that human annotations of protein structures are available to the public. This includes ensuring that annotations are available in a machine-readable format. When an annotator adds a link or a value to a page, it is important that the intent behind it is expressed: was the link added because it is an example of a homologue, or because it points to another protein in the same pathway? The concepts and standards behind the semantic web provide a framework for expressing this information.

The TOPSAN Protein Syntax (TPS) is designed to cover the set of predicates used to describe the relationships between proteins and the databases and values they can be linked to. These predicates follow a formalized ontology that begins from three different roots. These branches represent the basic concepts used to describe proteins, and include ‘links’ and ‘values’. ‘Link’ statements describe connections from a protein to another database element, while ‘value’ statements assign direct values and data to a protein.

All calls embedded in the text of TOPSAN documents begin and end with double braces, ‘{{’ and ‘}}’. Inside the braces you make a call to ‘note.link’. There are two ways to call the function: with sequential arguments or with named arguments. For sequential arguments, wrap the arguments in ‘(’ and ‘)’ and type in the values in order. This form is usually used only for the two-argument call, which passes the predicate and the object value. If you want to set additional arguments, the named argument format is preferred. In this form, the arguments are wrapped in ‘{’ and ‘}’, and each argument is given as its name, followed by a ‘:’ and then the value. With named arguments you don’t have to remember a specific order.
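The two call forms described above can be sketched as a pair of small string builders. This is only an illustration in Python, not part of TOPSAN: the helper names `note_link_sequential` and `note_link_named` are hypothetical, straight single quotes are assumed around string values, and writing booleans bare (as in `rev:true`) is assumed to apply to `visible` as well.

```python
def _quote(text):
    # Wrap a value in straight single quotes (quoting style is an assumption).
    return "'" + str(text) + "'"


def _render(value):
    # Booleans are written bare, as in rev:true; everything else is quoted.
    if isinstance(value, bool):
        return "true" if value else "false"
    return _quote(value)


def note_link_sequential(rel, value):
    # Hypothetical helper: two-argument form, predicate first, then object.
    return "{{ note.link( " + _quote(rel) + ", " + _quote(value) + " ) }}"


def note_link_named(**args):
    # Hypothetical helper: named form, each argument written as name:value.
    # Keyword order is preserved, but the order does not matter to the call.
    body = ", ".join(name + ":" + _render(val) for name, val in args.items())
    return "{{ note.link{ " + body + " } }}"


if __name__ == "__main__":
    print(note_link_sequential("memberOf", "PFAM:PF07980"))
    print(note_link_named(rel="similar", value="UNIPROT:Q8A1G2", rev=True))
```

The named form is the one to reach for as soon as anything beyond the predicate and object is needed, since the sequential form only carries those two values.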

note.link Arguments:

rel : Type of relationship (the predicate)
value : The database entry to link to; if it is not an identifiable link, it is assigned as a literal value
visible : If false, the call does not produce text that is visible on the page
about : The subject of the statement; defaults to the current page
rev : If true, the relationship is reversed, so that the destination is the subject and the current page is the object

Relationships can go in both directions. By default the subject is the current page and the passed value is the object. To reverse this relationship, so that the relationship statement is about the external database pointing to the current page, set ‘rev:true’.
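To make the direction of a statement concrete, here is a minimal Python sketch of how the arguments could map onto a subject, predicate, object triple. The `Triple` type and the `to_triple` function are hypothetical names used only for illustration, and the way `about` combines with `rev` (the `about` page takes the place of the current page) is an assumption rather than documented TOPSAN behavior.

```python
from collections import namedtuple

# Hypothetical container for one semantic statement.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])


def to_triple(rel, value, current_page, about=None, rev=False):
    # 'about' overrides the current page when it is given (assumed behavior).
    page = about if about is not None else current_page
    if rev:
        # Reversed: the passed value is the subject, the page is the object.
        return Triple(value, rel, page)
    # Default: the page is the subject, the passed value is the object.
    return Triple(page, rel, value)


if __name__ == "__main__":
    print(to_triple("similar", "UNIPROT:Q8A1G2", "TOPSAN:2aam"))
    print(to_triple("similar", "UNIPROT:Q8A1G2", "TOPSAN:2aam", rev=True))
```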

Examples:

Embed a link to PFAM:

{{ note.link( 'memberOf', 'PFAM:PF07980' ) }}

Cite a PubMed Reference:

{{ note.link( 'citation', 'PMID:19191477' ) }}

Reverse a relationship:

{{ note.link{ rel:'similar', value:'UNIPROT:Q8A1G2', rev:true } }}

Define a relationship about something other than the current page:

{{ note.link{ about:'TOPSAN:2aam', rel:'similar', value:'UNIPROT:Q8A1G2' } }}

In the editor this would look like:

When displayed on the page it would be:

Predicates

We will describe the set of TOPSAN Protein Syntax predicates in greater detail later. For now, there are only a handful of predicates that you need to know in order to get started.

Predicate	Definition
similar	Represents a connection between two proteins that are homologous or structurally similar
classifiedWith	Connects a protein to an assigned function type
memberOf	Connects a single element to a group of which it is a part
citation	A connection to a literature citation

Available Databases

When describing a link to another database, you can use prefix codes that will be recognized by TOPSAN and translated accordingly. We currently have a set of 10 linked databases, but this will grow as needed. To use a prefix code, give the database code followed by a colon and the identifier from that database, e.g. “PFAM:PF07980”, “UNIPROT:Q8A1G2”, or “PDB:1AAC”.

Prefix	Database
GO	The Gene Ontology database
PFAM	The Pfam protein family database
UNIPROT	The UniProt protein database
EC	The Enzyme Commission (EC) enzyme classification
TOPSAN	The TOPSAN protein annotation system
TAXON	The NCBI taxonomic codes
PDB	The Protein Data Bank
PMID	PubMed
SCOP	SCOP domain IDs, e.g. d1wy7a1
SUNID	SCOP unique IDs (sunid), e.g. 51349 -> Alpha and beta proteins (a/b)
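As a rough illustration of the prefix convention, the following Python sketch splits a value into its database prefix and identifier, and treats anything without a recognized prefix as a literal, in line with the description of the value argument. The function name `split_reference` is hypothetical, and matching prefixes case-sensitively is an assumption; TOPSAN's own resolution logic may differ.

```python
# Prefix codes taken from the table above.
KNOWN_PREFIXES = {
    "GO", "PFAM", "UNIPROT", "EC", "TOPSAN",
    "TAXON", "PDB", "PMID", "SCOP", "SUNID",
}


def split_reference(value):
    # Split at the first colon only, so identifiers may contain colons.
    prefix, sep, identifier = value.partition(":")
    if sep and identifier and prefix in KNOWN_PREFIXES:
        return prefix, identifier
    # Not an identifiable link: keep the whole value as a literal.
    return None, value


if __name__ == "__main__":
    for text in ("PFAM:PF07980", "PDB:1AAC", "an unlinked note"):
        print(text, "->", split_reference(text))
```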

Data Mining on TOPSAN

Part II

This entry was posted on Tuesday, June 1st, 2010 at 6:42 pm and is filed under Uncategorized.

TOPSAN – The Network News