Create mirrors of bioinformatics sources
Having local copies of bioinformatics databases in a published, conventional location will save time and effort for researchers who need access to this data. It should also shorten build times, since clients won't have to reach out to the web.
Other sources that need to be mirrored:
Notes from Jeremy Horst
These are the profile databases that I see as structural in nature, or otherwise highly relevant to anything we would do.
- SCOP: ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/scop70_1.75.hhm.tar.gz
- PDB70: ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/pdb70_18Nov10.hhm.tar.gz
These are required for building the profile HMMs in HHsearch:
- ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/nr70.tar.gz
- ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/nr90.tar.gz
HHsearch can only be run on 64-bit machines, so there is definitely no need to load these on any fp1-160.
Others for Ram & Hong & others to consider are: PFAM, SUPFAM, TIGRFAM, PANTHER, COG, KOG, CATH
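One way to keep the mirror layout predictable is to derive each local path from its source URL. The sketch below only plans where each HHsearch archive would land; it does not download anything. The mirror root `/data/mirrors` and the function names are assumptions made for illustration, since the notes do not say where the mirror actually lives.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# URLs copied from the notes above.
HHSEARCH_URLS = [
    "ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/scop70_1.75.hhm.tar.gz",
    "ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/pdb70_18Nov10.hhm.tar.gz",
    "ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/nr70.tar.gz",
    "ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/nr90.tar.gz",
]

# Hypothetical mirror root; the real location is not given in the notes.
MIRROR_ROOT = "/data/mirrors"


def local_path(url, root=MIRROR_ROOT):
    """Map a source URL to <root>/<host>/<remote path>."""
    parts = urlparse(url)
    return str(PurePosixPath(root) / parts.netloc / parts.path.lstrip("/"))


def plan_mirror(urls, root=MIRROR_ROOT):
    """Return (url, destination) pairs without touching the network."""
    return [(url, local_path(url, root)) for url in urls]
```

Calling `plan_mirror(HHSEARCH_URLS)` gives the list of source and destination pairs that a download step could then work through. Keeping the host name in the path means sources added later (PFAM, CATH, and so on) would not collide with these.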
- 2010-12-22 Migrated home directory to cosmos instead of nas4; created cronjob
- Created download area
- Modified rsyncPDB.sh script
- Started initial download
- 2010-12-22 The following errors seem to be related to poor NFS behavior on nas4 - moving the home directory got rid of the problems.
Need to investigate these rsync errors:
# grep rsync rsync.log
rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(231)
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(515)
rsync: connection unexpectedly closed (42276265 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
rsync: writefd_unbuffered failed to write 4092 bytes: phase "unknown" [generator]: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(909)
rsync error: received SIGUSR1 or SIGINT (code 20) at main.c(965)
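To see at a glance which failures dominate, the `rsync error:` lines can be tallied by exit code. The following is a minimal sketch, not part of rsyncPDB.sh; the function name `summarize` and the regular expression are illustrative assumptions, and the sample lines are the ones from the grep output above.

```python
import re
from collections import Counter

# Matches lines such as:
#   rsync error: error in rsync protocol data stream (code 12) at io.c(515)
ERROR_RE = re.compile(
    r"^rsync error: (?P<msg>.+?) \(code (?P<code>\d+)\) at (?P<where>\S+)$"
)

# Sample lines taken from the grep output above.
SAMPLE_LOG = """\
rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(231)
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(515)
rsync: connection unexpectedly closed (42276265 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
rsync error: error in rsync protocol data stream (code 12) at io.c(909)
rsync error: received SIGUSR1 or SIGINT (code 20) at main.c(965)
"""


def summarize(lines):
    """Count 'rsync error:' lines, keyed by (exit code, message)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            counts[(int(match.group("code")), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    for (code, msg), n in sorted(summarize(SAMPLE_LOG.splitlines()).items()):
        print(f"code {code}: {msg} (x{n})")
```

Against the real log, the same function could be called as `summarize(open("rsync.log"))`. Lines that do not start with `rsync error:` (for example the "Connection reset by peer" lines) are ignored here, although they are what point to the network or NFS side of the problem.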
This project, for seeming so simple, has been amazingly difficult to wrap up because of continued issues with nas4. nas4 is also being used in cluster jobs, so performance is miserable.
I've moved the PDB home directory to a different NFS server and will start the sync process again.