CS 251: Syllabus

Syllabus for Spring 2010

Textbooks

Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, 2005.


Grading

Weekly Assignments 45%
Exams (2) 30%
Final Exam 20%
Class Participation 5%

This course covers the analysis and visualization of scientific data. Topics will include data management, basic statistical analysis, data mining techniques, and the fundamental concepts of machine learning. Students will also learn how to visualize data using 2-D and 3-D graphics, focusing on techniques that highlight patterns and relationships. Course projects will use data from active research projects at Colby.


Late Policy

The weekly assignments build upon each other. Each week, a solution for the prior week's assignment will be posted so that everyone begins the week with a working code base. Assignments turned in after the solutions have been posted will receive no credit, so it is better to hand in a partially working assignment on time than nothing at all.


Daily Topics and Readings

Week / Topics / Reading
1:
  • Introduction, course concept
  • What is data?
  • GUI development in Python
Tkinter tutorials
2:
  • Interactive visualization
  • Coordinate systems and transformations
  • NumPy: working with matrices
NumPy tutorials
3:
  • 2D and 3D viewing pipeline
  • Interactive camera control
  • Visualizing higher dimensions
Lecture notes
4:
  • Charts and histograms
  • Range selection
  • Data characterization
Lecture notes
5:
  • Data distributions
  • Data transformations
  • Linear regression
Handouts, Witten and Frank, chapter 7
6:
  • Principal component analysis (PCA)
  • Noise and variance
  • Exam 1
Handouts
7:
  • Fundamentals of pattern recognition
  • Concepts of learning and training
  • The No Free Lunch theorem
Witten and Frank, chapters 1-3

Spring Break
8:
  • Overview of machine learning methods
  • Distance metrics: what is similar?
  • Naive Bayes
Witten and Frank, chapter 4
9:
  • 1-R and full decision trees
  • Issues in building trees
  • Using Weka
Witten and Frank, chapters 4, 6
10:
  • Artificial neural networks
  • Neural network variations
  • SNNS
Handouts
11:
  • Linear regression
  • Linear regression variations
  • Exam 2
Witten and Frank, chapters 4, 6
12:
  • Robust regression
  • Clustering
  • Meta-learning: bagging, boosting, randomization
Witten and Frank, chapter 8, handouts
13:
  • Cascade classifiers
  • Bioinformatics
  • BLAST algorithm
Handouts