|
We can provide comprehensive support for your advanced IT needs. To see how,
consider typical research scenarios in health care and related sciences.
We then show, step by step, how we can add value to accelerate and
improve your workflow.
Life Sciences IT Needs
Life Sciences computing activities can be broadly classified into two basic
areas:
- Analysis of data (from medical sensors, gene expression microarrays, etc.)
- Simulations (molecular dynamics, protein folding, etc.)
For this discussion, let us treat data storage, movement, archival,
publishing, and the like as important but subsidiary activities.
Let us consider typical scenarios that you likely encounter in your work.
Analysis of data
A research project that uses clinical data might be
described roughly as follows:
- You create many large images (each measured in megabytes) using a medical
sensor (say, an X-ray machine). (You can substitute almost any other kind
of sensor or data for the images.)
- You currently store and analyze them in house, using your own storage and
computing systems and applications that you have written yourself or have
purchased.
- You go through multiple, time-consuming analysis steps that use
various internal and external inputs to iterate to an acceptable outcome.
- You compile the results and archive raw/processed data created in
intermediate analysis steps. You want your data to be available
for the next 10 years (for regulatory or other reasons).
- You publish the results in a paper and make presentations at conferences.
- You make the data or results available via the web to the community
at large (because the NIH grant that supports your project includes
an explicit data dissemination component).
- You want the time between data acquisition and results to be as small as
possible but do not have the resources in house to make this happen.
- You will consider using external resources to speed up the process *IF*
you have assurance of data security and management.
Your entire workflow looks something like this:
a) Do preliminary investigation
b) Write grant proposal
c) Get grant proposal funded
d) Develop analysis application(s) & data dictionaries
e) Obtain data from sensor(s)
f) Move data to working (hard disk) storage
g) Tag it for easy retrieval down the road
h) Analyze it locally
i) Visualize data during or at end of workflow
j) Use intermediate data for further analysis
k) Compile results
l) Publish results in paper(s)
m) Tag and archive raw and/or derived data
n) Place data in a database
o) Publish data or results to the world at large
p) Present data at conferences & outreach events (e.g., for local kids)
q) Write the next grant proposal
r) Possibly retrieve data 10 years hence for reanalysis
Computing steps in this workflow can be described generally in terms of
a "data lifecycle" as follows:
Data creation -> Data storage -> Data analysis -> Data publishing ->
Data archival/disposal
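As a rough illustration, the lifecycle above can be modeled as an ordered
set of stages. The following is a minimal Python sketch; the names Stage,
next_stage, and lifecycle_path are hypothetical and chosen only for this
example.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Lifecycle stages, in the order listed in the text."""
    CREATION = 1
    STORAGE = 2
    ANALYSIS = 3
    PUBLISHING = 4
    ARCHIVAL_OR_DISPOSAL = 5


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    idx = stages.index(stage)
    return stages[idx + 1] if idx + 1 < len(stages) else None


def lifecycle_path(start: Stage) -> List[Stage]:
    """List every stage a dataset passes through from `start` onward."""
    path = [start]
    following = next_stage(start)
    while following is not None:
        path.append(following)
        following = next_stage(following)
    return path


# Example: what still lies ahead for data that is currently being analyzed.
remaining = lifecycle_path(Stage.ANALYSIS)
```

Treating the stages as an ordered type makes it easy to record where each
dataset currently sits and which steps are still to come.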
Simulations
Say you have a set of simulations that run in series as follows:
- Molecular dynamics to identify likely candidates that have properties that
make them useful in treating some disease.
- Molecular docking calculations with the disease-causing agent to
determine if the candidate molecule will bind effectively.
- A massive search through national and local databases to identify
the right compound.
The workflow here is generally similar to the one for analyzing data,
with a few differences:
a) Do preliminary investigation
b) Write grant proposal
c) Get grant proposal funded
d) Develop/modify application(s)
e) Prepare input(s)
f) Move data to working (hard disk) storage
g) Run simulation(s)
h) Visualize results during or at the end of workflow
i) Feed the output of simulation to the next simulation
j) Go back to e) and repeat as needed
k) Compile results
l) Publish results in paper(s)
m) Tag and archive resulting data
n) Place data in a database
o) Publish data or results to the world at large
p) Present data at conferences & outreach events (e.g., for local kids)
q) Write the next grant proposal
r) Possibly retrieve data 10 years hence for reanalysis
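Steps g) through i) amount to running simulations in series, with each
one consuming the output of the one before it. The following is a minimal
Python sketch of that pattern only. The function names and the toy
filtering rules are assumptions made for illustration; real stages would
launch the actual simulation codes.

```python
from typing import Callable, Iterable, List

# A stage takes a list of candidate names and returns the survivors.
SimStage = Callable[[List[str]], List[str]]


def run_in_series(stages: Iterable[SimStage], inputs: List[str]) -> List[str]:
    """Run each stage in order, feeding one stage's output to the next."""
    data = inputs
    for stage in stages:
        data = stage(data)
    return data


def dynamics_screen(candidates: List[str]) -> List[str]:
    """Placeholder for molecular dynamics: drop names marked 'reject'."""
    return [c for c in candidates if not c.startswith("reject")]


def docking(candidates: List[str]) -> List[str]:
    """Placeholder for docking: keep names that end in 'binds'."""
    return [c for c in candidates if c.endswith("binds")]


# Made-up candidate names, used only to exercise the chain.
survivors = run_in_series(
    [dynamics_screen, docking],
    ["a-binds", "reject-b-binds", "c-loose"],
)
```

Going back to step e) then simply means preparing new inputs and calling
run_in_series again.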
How we can help with each, step-by-step
- We can support your computing needs during your preliminary investigation
for no or very low cost (when you do not have the financial resources
for in-house computing).
- We can design and develop applications.
- We can help develop algorithms.
- We can store working data in a central location and make it visible
from computers across the lab, campus, or continent.
- We can plan how best to tag and verify data & provide consulting
on metadata & provenance.
- We can help determine if it's possible and appropriate to migrate your
analysis application(s) to our supercomputers. This might a) result in
significant speedups (say 10X) in obtaining results, and b) help offset costs
that you would normally incur in acquiring and supporting your local
computing infrastructure (people, equipment, software, maintenance).
- We can migrate your application/workflow to supercomputers.
- We can determine if your application or your workflow lends itself
to using national grids (of linked compute, storage, and visualization
resources).
- We can determine whether our infrastructure can speed up your workflow
enough that on-the-fly analysis can be used to make decisions in real
time about how to proceed.
- We can speed up your application further by parallelizing it
(coding it such that multiple analysis steps can run concurrently on
hundreds to thousands of processors that commonly constitute a modern
supercomputer).
- We can supply optimized (parallel) versions, where available, of
commonly used tools such as BLAST.
- We can replicate publicly available, remote data locally to make it
easier/faster for you to query/use it.
- We can visualize data on our powerful visualization systems
to assist your analysis and decision making. We can also help
with local and national outreach activities by providing attractive visuals.
- We can store your data in our massive Oracle databases and
publish it via the web.
- We can provide you with a robust, 24x7 application hosting environment
on our servers.
- We can archive data on our massive data storage system
for posterity. Once it is stored, it is our responsibility to keep your data
available at all times, including through migrations to future technologies.
Retrieval in 10, 20, or 30 years is thus not an issue.
- We can protect your data in case of a local disaster by replicating
the data between Indianapolis and Bloomington (in near real time).
- We can archive ALL your data in a SINGLE location. You will no longer
need to store disparate archival media (several generations of
hard-to-read or unreadable tapes, CDs/DVDs, hard disks, etc.) in your lab.
- We can plan for future growth of your data storage, analysis,
and dissemination needs.
- We are the only group on campus that can do all of these for projects
with modest needs all the way up to those operating at extremely large scales
(Terabytes of data and database storage, TeraFLOPS of computing capacity,
etc.).
- We can help you with your advanced IT needs when you write a new grant
proposal through active partnering as a co-investigator.
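To make the tagging and verification mentioned in the list above more
concrete, here is a minimal Python sketch of a metadata record with a
checksum. The field names and the functions make_tag and verify are
hypothetical, chosen for illustration only; they do not describe any
particular archive system.

```python
import hashlib
from datetime import datetime, timezone


def make_tag(name: str, payload: bytes, source: str) -> dict:
    """Build a small metadata record (a 'tag') for one data object."""
    return {
        "name": name,
        "source": source,  # e.g. which sensor or simulation produced it
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "tagged_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(payload: bytes, tag: dict) -> bool:
    """Return True if the payload still matches the checksum in its tag."""
    return hashlib.sha256(payload).hexdigest() == tag["sha256"]


# Stand-in bytes; a real workflow would read the image file from storage.
image = b"raw image bytes would go here"
tag = make_tag("scan-0001", image, source="x-ray")
intact = verify(image, tag)
```

Storing the checksum alongside the data lets you confirm, years later, that
a retrieved copy is identical to what was archived.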
|