Create mirrors of bioinformatics sources

From CompBio

Introduction

Having local copies of bioinformatics databases in a published, conventional location will save time and effort for researchers who need access to this data. It should also shorten build times, since clients won't have to reach out to the web.

Tasks

Outstanding

Other sources that need to be mirrored:

Astral

Culled

Notes from Jeremy Horst

These are the profile databases that I see as structural in nature, or
otherwise highly relevant to anything we would do. 

SCOP
ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/scop70_1.75.hhm.tar.gz

PDB70
ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/pdb70_18Nov10.hhm.tar.gz


These are required for building the profile HMMs in HHsearch:
ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/nr70.tar.gz
ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/nr90.tar.gz

HHsearch can only be run on 64-bit machines, so there is definitely no need to load these on any of fp1-160.
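A rough sketch of how the four HHsearch archives listed above could be pulled into a local mirror, skipping hosts that are not 64-bit. The MIRROR_DIR default, the function names, the `--fetch` switch, and the list of machine names treated as 64-bit are all assumptions made for illustration; it also assumes wget is installed.

```shell
#!/bin/sh
# Sketch only: fetch the HHsearch databases named in the notes above.
# MIRROR_DIR is a hypothetical location; point it at the real mirror area.
MIRROR_DIR="${MIRROR_DIR:-/tmp/hhsearch-mirror}"
BASE_URL="ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases"

# File names taken from the URLs in the notes above.
hh_database_files() {
    cat <<'EOF'
scop70_1.75.hhm.tar.gz
pdb70_18Nov10.hhm.tar.gz
nr70.tar.gz
nr90.tar.gz
EOF
}

# Succeeds if the given `uname -m` string names a 64-bit machine.
# The list of names is an assumption and is not exhaustive.
is_64bit() {
    case "$1" in
        x86_64|amd64|aarch64|ppc64|ppc64le|ia64|s390x) return 0 ;;
        *) return 1 ;;
    esac
}

fetch_all() {
    if ! is_64bit "$(uname -m)"; then
        echo "skipping: $(uname -m) is not a 64-bit host" >&2
        return 1
    fi
    mkdir -p "$MIRROR_DIR" || return 1
    hh_database_files | while read -r f; do
        # -c resumes a partial download; -P sets the target directory.
        wget -c -P "$MIRROR_DIR" "$BASE_URL/$f"
    done
}

# Download only when explicitly asked, so the file can be sourced safely.
if [ "${1:-}" = "--fetch" ]; then
    fetch_all
fi
```

Saved as a script, it would be run with `--fetch` as the only argument. Resuming with `-c` seems worthwhile here because the nr archives are large and an interrupted transfer would otherwise start over.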

Others for Ram, Hong, and others to consider are:
PFAM
SUPFAM
TIGRFAM
PANTHER
COG
KOG
CATH

Completed

pdb

  • 2010-12-22 Migrated home directory to cosmos instead of nas4; created cronjob
  • Created download area
  • Modified rsyncPDB.sh script
  • Started initial download
  • 2010-12-22 The rsync errors listed below seem to be related to poor NFS behavior on nas4; moving the home directory got rid of the problems.
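The cronjob itself is not recorded here. As a sketch only, a wrapper along these lines would keep a slow sync from overlapping with the next scheduled run. Every path, the schedule in the comment, the `--run` switch, and the exit code 75 used for "already running" are assumptions, not the actual setup.

```shell
#!/bin/sh
# Sketch of a cron wrapper around rsyncPDB.sh. All paths are hypothetical.
#
# Example crontab entry (nightly at 02:00; the schedule is an assumption):
#   0 2 * * * /home/pdb/bin/pdb-mirror-cron.sh --run
PDB_HOME="${PDB_HOME:-/home/pdb}"
LOCK_DIR="${LOCK_DIR:-$PDB_HOME/rsync.lock}"
LOG_FILE="${LOG_FILE:-$PDB_HOME/rsync.log}"

# run_locked LOCKDIR command [args...]
# Uses mkdir as the lock: it fails if the directory already exists,
# so a second caller backs off instead of starting a parallel sync.
run_locked() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still holds $lock; skipping" >&2
        return 75
    fi
    if "$@"; then
        status=0
    else
        status=$?
    fi
    rmdir "$lock"
    return "$status"
}

# Start the sync only when explicitly asked.
if [ "${1:-}" = "--run" ]; then
    run_locked "$LOCK_DIR" "$PDB_HOME/rsyncPDB.sh" >>"$LOG_FILE" 2>&1
fi
```

Given the NFS trouble described on this page, it may be safer to keep the lock directory on local disk rather than in the NFS-mounted home directory.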


The source data is downloaded from http://www.pdb.org/pdb/download/download.do
The script to sync the data was provided by pdb.org as ftp://ftp.wwpdb.org/pub/pdb/software/rsyncPDB.sh

Need to investigate these rsync errors:

# grep rsync rsync.log 
rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(231)
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(515)
rsync: connection unexpectedly closed (42276265 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
rsync: writefd_unbuffered failed to write 4092 bytes: phase "unknown" [generator]: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(909)
rsync error: received SIGUSR1 or SIGINT (code 20) at main.c(965)
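To help with that investigation, here is a sketch that tallies the exit codes in a log like the one above and retries a command when it fails with a code that looks network-related. The function names are made up for illustration, and the choice of which codes count as transient (10, 12, 30, 35, per the exit values documented in the rsync man page) is an assumption. Code 20 is deliberately left out because it can also mean someone interrupted the run on purpose.

```shell
#!/bin/sh
# Sketch only: summarize rsync exit codes and retry transient failures.

# Read a log on stdin and print "count code" for each
# "rsync error: ... (code N)" line, lowest code first.
summarize_rsync_errors() {
    sed -n 's/^rsync error: .*(code \([0-9][0-9]*\)).*/\1/p' |
        sort -n | uniq -c | awk '{print $1, $2}'
}

# Codes treated as transient here (an assumption):
# 10 socket I/O, 12 protocol data stream, 30 and 35 timeouts.
is_transient() {
    case "$1" in
        10|12|30|35) return 0 ;;
        *) return 1 ;;
    esac
}

# retry MAX_TRIES DELAY_SECONDS command [args...]
# Reruns the command until it succeeds, fails with a non-transient
# code, or has been tried MAX_TRIES times.
retry() {
    max=$1
    delay=$2
    shift 2
    try=1
    while :; do
        if "$@"; then
            return 0
        else
            rc=$?
        fi
        if ! is_transient "$rc" || [ "$try" -ge "$max" ]; then
            return "$rc"
        fi
        echo "attempt $try failed with code $rc; retrying in ${delay}s" >&2
        sleep "$delay"
        try=$((try + 1))
    done
}
```

With these functions sourced, `summarize_rsync_errors < rsync.log` gives the tally, and something like `retry 5 60 ./rsyncPDB.sh` would rerun the sync up to five times with a minute between attempts, provided the script passes rsync's exit status through.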


  • 2010-12-20

This project, for being so seemingly simple, has been amazingly difficult to wrap up because of continued issues with nas4. nas4 is also being used by cluster jobs, so its performance is miserable.

I've moved the PDB home directory to a different NFS server and will start the sync process again.
