New TOPSAN Paper and Download Details

A paper on TOPSAN, entitled “TOPSAN: a dynamic web database for structural genomics”, will be featured in the upcoming database issue of Nucleic Acids Research.
One of the main points of this paper is our effort to make TOPSAN data more accessible, which includes providing TOPSAN articles for bulk download in machine-readable formats. The download options now include RDFa, RDF and N3 files, all of which can be found at http://topsan.org/Downloads.

As a brief introduction, the three formats we provide all carry semantic web data, which makes the data easier to organize and easier for machines to parse. The three formats are:

  • RDFa: The text extract of a page, with semantic web microtags embedded in the XHTML. Use this download if you want the full text of the articles.
  • RDF: A pure XML description of relationship triples that can be used to create a searchable database. Use this download if you want to build a semantic database with graph context.
  • N3: A simpler format for describing semantic web triples. Use this download if you want the easiest-to-read version of TOPSAN triples.
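To give a feel for what a "triple" is and why a set of triples can act as a searchable database, here is a minimal Python sketch. It is an illustration only, not the TOPSAN schema: the `topsan:`, `pdb:` and `ex:` prefixes and both predicate names are made up for this example, and a real project would load the RDF or N3 files with a proper RDF library rather than a hand-written list. The TPS1300 and 2ASH identifiers are the ones used as the example record in this post.

```python
from typing import List, Optional, Tuple

# A triple is simply (subject, predicate, object).
Triple = Tuple[str, str, str]

# Hypothetical triples: the predicates below are placeholders,
# NOT the vocabulary actually used in the TOPSAN downloads.
TRIPLES: List[Triple] = [
    ("topsan:TPS1300", "ex:describesStructure", "pdb:2ASH"),
    ("topsan:TPS1300", "ex:hasFormat", "RDFa"),
    ("topsan:TPS1300", "ex:hasFormat", "RDF"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every triple that agrees with the given pattern.

    A value of None acts as a wildcard for that position.
    """
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


if __name__ == "__main__":
    # Ask: which structure does the TPS1300 record describe?
    for s, p, o in match(TRIPLES, subject="topsan:TPS1300",
                         predicate="ex:describesStructure"):
        print(s, p, o)
```

Pattern matching with wildcards like this is the basic idea behind querying a triple store; a real store adds indexing and a query language on top.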

Every article on TOPSAN is identified by a unique alias. For example, the record for the PDB protein 2ASH, found at http://topsan.org/Proteins/JCSG/2ash, is identified as TPS1300. The alias ID for TOPSAN records begins with the prefix TPS, followed by a unique number. You can find this ID in the ‘Alias Ids’ field of the page header. This identifier can be used to download the ‘light’ versions of TOPSAN pages. For the TPS1300 example, the RDFa record can be found at http://topsan.org/rdfa/TPS1300, while the RDF record can be found at http://topsan.org/rdf/TPS1300.
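If you want to build these ‘light’ record URLs in a script, the pattern above can be sketched in a few lines of Python. The function name `light_record_url` is our own invention for this example, and the check that an alias is "TPS followed by digits" is an assumption based on the description above, not a published specification.

```python
import re

BASE_URL = "http://topsan.org"

# Assumption: an alias is the prefix "TPS" followed by one or more digits.
ALIAS_PATTERN = re.compile(r"TPS\d+")

# Only the two per-record paths mentioned in this post.
SUPPORTED_FORMATS = ("rdfa", "rdf")


def light_record_url(alias_id: str, fmt: str = "rdf") -> str:
    """Build the URL of the 'light' record for a TOPSAN alias ID."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if not ALIAS_PATTERN.fullmatch(alias_id):
        raise ValueError(f"not a TOPSAN alias ID: {alias_id!r}")
    return f"{BASE_URL}/{fmt}/{alias_id}"


if __name__ == "__main__":
    print(light_record_url("TPS1300", "rdfa"))
    print(light_record_url("TPS1300", "rdf"))
```

The resulting URL can then be handed to any HTTP client, for example `urllib.request.urlopen` from the Python standard library.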

The bulk download of all RDFa files in a single XML file can be found at http://files.topsan.org/topsan.xml.gz. The full RDF extract of TOPSAN can be found at http://files.topsan.org/topsan_rdf.tar.gz. This tarball contains every TOPSAN page entry as a separate RDF file, as well as a file called ‘graphMap’ that can be used to map the context of the graphs on a quad store server.
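Once the RDF tarball has been downloaded, a first step is usually to see what is inside it. The sketch below uses only the Python standard library to separate the per-page RDF files from the ‘graphMap’ file. The function name `split_tarball_members` is hypothetical, and the code assumes nothing about the directory layout inside the archive beyond a member whose base name is `graphMap`; the format of that file is not described here, so the sketch only locates it.

```python
import tarfile
from typing import List, Optional, Tuple


def split_tarball_members(tar_path: str) -> Tuple[List[str], Optional[str]]:
    """Return (names of per-page files, name of the graphMap file or None)."""
    page_files: List[str] = []
    graph_map: Optional[str] = None
    with tarfile.open(tar_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and other non-file entries
            base_name = member.name.rsplit("/", 1)[-1]
            if base_name == "graphMap":
                graph_map = member.name
            else:
                page_files.append(member.name)
    return page_files, graph_map


if __name__ == "__main__":
    # Assumes topsan_rdf.tar.gz was already downloaded to the current directory.
    pages, graph_map = split_tarball_members("topsan_rdf.tar.gz")
    print(f"{len(pages)} page files; graphMap member: {graph_map}")
```

Individual members can then be read without unpacking the whole archive by calling `extractfile` on the open `tarfile` object.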

We hope that by providing TOPSAN articles and knowledge in these formats we will enable better collaboration with other protein annotation efforts.
