
BIOE 598SZ Computation in Bioengineering and Systems Biology
Course objective:
Introduce/review mathematical and computational methodologies frequently used in
the analysis of biological data. Introduce/review computational environments and
software tools that are prominent in Bioengineering/Bioinformatics applications
and especially in analyzing large-scale genomics data and modeling biological
systems. Practice programming and computation on biological objects.
Textbook:
Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005.
Logistics:
Meeting time: Fall 2006, 11:00am – 12:50pm, Mon and Wed
Meeting place: 1245 Digital Computer Lab
Credits: 4 graduate hours. Required for all bioengineering PhD students.
Course reference number (CRN): 47185
Instructor: Sheng Zhong (szhong AT uiuc DOT edu)
Enrollment: 20
Prerequisites: BIOE 598MI, Analytical Methods for Biological System Modeling (can be taken in parallel); or consent of instructor.
URL: http://bioinformatics.bioen.uiuc.edu/bioe598
Evaluation:
The course grade is based on homework (25%), an in-class presentation (25%), a review essay (25%), and a final project (25%).
An example paper is provided as a model for the project paper.
Datasets for project:
1. Colon cancer: Colon adenocarcinoma vs. matched normal tissue.
2. Early development: from the oocyte to a small embryo. Supporting materials: gene ID transformation file, Paper 1, Paper 2.
Contents:
1. Computing environments
a. Review of operating systems, networking basics
b. Computational tools (R/Matlab/C++). Installation, getting help, first program.
2. Commonly encountered data in bioengineering
a. Image data
b. Genome sequence, microarray, proteomics data
c. Continuous signals
d. Graphs and networks
3. Data preprocessing
a. Input and output; graphical display
b. An error model
c. Linking from/to online resources
4. Linear models
a. Model assumptions, biological and technical replication
b. Group comparison
c. Factorial design
d. Error structure, model checking
e. Application: Detecting hyperactive genes in breast cancer cells
5. Classification
a. Distance measures, dissimilarity matrix, objectives
b. Probability theory
c. Linear methods
d. Bootstrap
e. Evaluation
f. Application: Classification of two types of acute leukemia patients: acute lymphocytic leukemia (ALL) and acute myelogenous leukemia (AML)
6. Clustering
a. Hierarchical clustering
b. Visualizing clustering results
c. Statistical issues: number of clusters, quality assessment
d. Model-based clustering
7. Computational issues and treatments
a. Parameter estimation, sampling from a given distribution
b. Expectation-Maximization (EM) algorithm. Link between K-means and model-based clustering
c. Monte Carlo methods and Gibbs sampling
d. Application: Detection of novel gene expression patterns after estrogen receptor stimulation
8. Networking
a. HTML, XML, Simple Object Access Protocol (SOAP), Common Gateway Interface (CGI)
b. Networking with National Center for Biotechnology Information (NCBI)
c. Networking with PubMed
d. Networking with a signaling pathway database