BINF G4000 Acculturation to Programming and Statistics
Course Description: This course is intended for biomedical scientists seeking a working knowledge of programming and statistics. It is a fast-paced, hands-on course covering programming basics in Python, probability, and elements of linear algebra, calculus, and data analytics. Students are expected to learn lecture material outside the classroom and to focus on labs during class. All labs revolve around real-world biomedical and health datasets. Open only to students enrolled in the DBMI MA or PhD program. BINF G4000 must be taken in the fall term of entry. The instructor administers a placement exam on the first day of class, and students may test out of the course based on the results.
Instructor

Karthik Natarajan, PhD
Class Schedule
This class meets twice weekly on Tuesdays and Thursdays.
Class Structure
I. Programming Basics I (2.5 weeks)
• Computing environment for biomedical sciences (Linux operating system, shell commands)
• IDEs (Integrated Development Environments) and text editors for Python (Emacs, Eclipse)
• Python variables, functions, basic data structures, libraries, numpy, files
Class | Topic | Objectives and Competencies | Themes |
---|---|---|---|
00 | Introduction to G4000 | Overview of course | |
01 | Assessment Test | Assessment Test | |
02 | Introduction to Linux | VirtualBox setup, deploying Lubuntu, basic commands | 1. ls 2. chmod 3. sudo |
03 | Introduction to Regex | grep, sed | 1. backreference 2. commands 3. UMLS |
04 | Development environments, programming primitives | Emacs, Python, variables, conditionals, loops, lists | 1. keybindings 2. filtering and lists 3. early termination |
05 | Abstraction of code and data, code reuse | libraries, functions, data structures, files | 1. OS standard library 2. lists and tuples 3. recursion |
06 | Vectorized code and visualization | Vectorized operations and efficiency considerations, reading and writing files, line plots, histograms | |
II. Probabilities (2 weeks)
• Axioms, PMFs, PDFs, CDFs
• Distribution families (e.g., Binomial, Multinomial, Normal)
• Sampling
• Estimation
• Plotting
• http://research.cs.tamu.edu/prism/lectures/sp/l10.pdf
• http://www.math.uiuc.edu/~kkirkpat/SampleSpace.pdf
Class | Topic | Objectives and competencies | Themes |
---|---|---|---|
07 | Introduction to probability theory | Axioms, conditional probability, law of total probability, sample spaces | 1. Sample spaces and events 2. Probability axioms 3. Chain rule of probability |
08 | Probability distributions | Probability density functions, Probability mass functions, Cumulative distribution functions, mean, variance | |
09 | Random sampling and estimation | Sampling, expected value, MLE, bootstrapping | |
10 | Bayesian probability | Cox’s theorem, Bayes’ theorem, interpretations of probability, MAP | 1. Cox’s theorem 2. Bayes’ theorem 3. MAP |
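
As an illustration of the kind of calculation classes 07 and 10 lead up to, the sketch below applies Bayes' theorem and the law of total probability to a diagnostic test. It is not course material: the function name `posterior` and the prevalence, sensitivity, and specificity values are hypothetical, chosen only to make the arithmetic concrete.

```python
def posterior(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Law of total probability:
    # P(+) = P(+ | disease) P(disease) + P(+ | no disease) P(no disease)
    p_positive = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    # Bayes' theorem: P(disease | +) = P(+ | disease) P(disease) / P(+)
    return sensitivity * prior / p_positive


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 90% specificity.
p = posterior(prior=0.01, sensitivity=0.95, specificity=0.90)
print(round(p, 4))  # 0.0876
```

Even with a fairly accurate test, the posterior stays below 9% here because the prior (prevalence) is low.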
III. Programming Basics II (1 week)
• Data structures (dictionaries/hash-maps, sets)
• Persistence (reading and writing delimited and JSON files)
Class | Topic | Objectives and Competencies |
---|---|---|
11 | Data structures: dictionaries/hash-maps and sets | Data structure performance characteristics and choice |
12 | Midterm Review | |
IV. Elements of Linear Algebra (1 week)
• Scalars, vectors, matrices
• Dot product, matrix multiplication
• Plotting
Class | Topic | Objectives and Competencies |
---|---|---|
13 | Concepts from Linear Algebra | Vectors, matrices, inner product |
14 | Multidimensional randomness | Random vectors, covariance, multivariate normal distribution |
V. Programming Basics III (2 weeks)
• Persistence (relational database rationale and basic operations)
Class | Topic | Objectives and Competencies | Themes |
---|---|---|---|
15 | Relational databases and basic operations: Create, Read, Update, and Delete (CRUD) | schema, primary keys, group by | 1. LIKE 2. CONCATENATE 3. Functions |
16 | Database modeling with multiple tables | join, indexes | 1. subqueries 2. outer join 3. multi-column index |
17 | OHDSI | | |
18 | Git | version control, Git | |
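
The database topics in classes 15 and 16 (CRUD, primary keys, group by, outer join) can be sketched with Python's built-in `sqlite3` module. This is a minimal example under stated assumptions, not the course's lab code: the `patient` and `visit` tables, their columns, and the rows are all hypothetical, and the syllabus does not say which database engine the course uses.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: two hypothetical tables linked by patient_id.
cur.execute(
    "CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
cur.execute(
    "CREATE TABLE visit ("
    " visit_id INTEGER PRIMARY KEY,"
    " patient_id INTEGER NOT NULL REFERENCES patient(patient_id),"
    " diagnosis TEXT)"
)
cur.executemany(
    "INSERT INTO patient (patient_id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
cur.executemany(
    "INSERT INTO visit (patient_id, diagnosis) VALUES (?, ?)",
    [(1, "asthma"), (1, "influenza"), (2, "asthma")],
)

# Update and Delete, using placeholders rather than string formatting.
cur.execute(
    "UPDATE visit SET diagnosis = ? WHERE diagnosis = ?", ("flu", "influenza")
)
cur.execute(
    "DELETE FROM visit WHERE patient_id = ? AND diagnosis = ?", (2, "asthma")
)

# Read: the outer join keeps patients with no visits; GROUP BY counts per patient.
cur.execute(
    "SELECT p.name, COUNT(v.visit_id)"
    " FROM patient AS p"
    " LEFT OUTER JOIN visit AS v ON v.patient_id = p.patient_id"
    " GROUP BY p.patient_id, p.name"
    " ORDER BY p.patient_id"
)
rows = cur.fetchall()
print(rows)  # [('Ada', 2), ('Grace', 0)]
conn.close()
```

An inner join would drop Grace from the result once her only visit is deleted, which is the usual reason to reach for an outer join.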
VI. Programming Basics IV (1 week)
• Object-oriented programming
• Handling large datasets (data frames, pandas)
Class | Topic | Objectives and Competencies | Themes |
---|---|---|---|
19 | Object-oriented programming (OOP) | Classes, objects, and inheritance | |
20 | Data, Persistence | Data frames, null values, filtering, strengths and weaknesses of file formats | 1. JSON 2. XML 3. CSV 4. Serialization |
VII. Elements of Data Analytics (1.5 weeks)
• Hypothesis testing (scipy library, chi-square, t-test, one-way ANOVA, correlation)
• Predicting from large datasets (logistic regression, convex optimization)
Class | Topic | Objectives and Competencies |
---|---|---|
21 | Hypothesis testing theory | null hypothesis, p-value, confidence interval, credible interval |
22 | Hypothesis testing practice | chi-squared, t-test, ANOVA, Pearson correlation, nonparametric tests |
23 | Prediction | regression, least squares, ML |
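
To make the least-squares topic in class 23 concrete, here is a minimal sketch of simple linear regression using the closed-form ordinary least squares solution. It uses only the standard library, although the course itself lists numpy and scipy. The function name `fit_line` and the dose/response data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, noise-free data lying exactly on y = 2x + 1.
doses = [1.0, 2.0, 3.0, 4.0]
responses = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(doses, responses)
print(slope, intercept)
```

With real, noisy data the fitted line minimizes the sum of squared residuals rather than passing through every point.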
VIII. Review and Final Exam