CS 251: Syllabus

Syllabus for Spring 2009

Topics and Reading Assignments

Textbooks

Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005.


Grading

Weekly Assignments 45%
Exams (2) 30%
Final Exam 20%
Class Participation 5%

This course covers the analysis and visualization of scientific data. Topics will include data management, basic statistical analysis, data mining techniques, and the fundamental concepts of machine learning. Students will also learn how to visualize data using 2-D and 3-D graphics, focusing on techniques that highlight patterns and relationships. Course projects will use data from active research projects at Colby.


Late Policy

The weekly assignments build on each other. Each week, a solution to the prior week's assignment will be posted so that everyone begins the week with a working code base. Assignments turned in after the solutions have been posted will receive no credit, so it is better to hand in a partially working assignment on time than nothing at all.


Daily Topics and Readings

Week | Topics | Reading
1:
  • Introduction, course concept
  • What is visualization?
  • What is data?
2:
  • Coordinate systems
  • Plots, graphs, charts
  • Data transformations
3:
  • Patterns in data
  • Basic statistics
  • Linear Regression
4:
  • 3D Visualization
  • Viewing pipeline
  • Camera control
5:
  • Visualizing higher dimensions
  • Visualizing time series
  • Exam 1
6:
  • Principal Component Analysis (PCA)
  • Complex objects as data
  • Visualizing images, music, files
7:
  • What is Data Mining?
  • Fundamentals of pattern recognition
  • Process of learning from data

Spring Break
8:
  • Overview of Machine Learning methods
  • Naive Bayes
  • 1R Decision Trees
9:
  • Decision Trees
  • Issues in Building Trees
  • Using Weka
10:
  • Artificial Neural Networks
  • Neural Network Variations
  • SNNS
11:
  • Clustering and EM algorithm
  • Typology development
  • Exam 2
12:
  • Bioinformatics: string matching
  • Genetic Algorithms
13:
  • Meta-learning
  • Boosting, bagging, and forests
  • Logistic regression