Due Thursday, 31 March, 2011
The goal of this week's lab is to create visualizations and run analyses on two data sets from active research projects.
As with the last project, you may work with a partner, if you wish. Note that, if you work with a partner, the standards for extensions will be higher.
We have three data sets for visualization. You need to demonstrate your program on all three. You are free to substitute a data set from an active research project at Colby for one of them, if you wish. All three data sets are available on the Academics server in csv format in the CS251/Data folder.
Bird Arrivals in Maine: this is the data set we used for one of
the tasks on Project 3. The data tracks the first sighting by
volunteer observers of bird species in fifteen spatial regions in
Maine. The data has been collected by Prof. Herb Wilson in the
Dept. of Biology. It contains five columns.
- Year (numeric): year of the observation, 4 digits
- Species (string): the bird species
- BioPhyReg (enum): the region in Maine for which the sighting was reported
- Date (date): the date of the first sighting of the bird that year by one of the observers in the given region using m/d/y format with 2 digits for the year
- DOY (numeric): the day of the year (0 to 365) of the first sighting of the bird that year by one of the observers in the given region. Note that each region may have multiple observers, so there may be multiple first sightings within a region in any given year.
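For reference, the DOY column can be recomputed from the Date column. A minimal sketch, assuming the m/d/y format described above and a two-digit-year pivot (the pivot value of 50 is an assumption, not part of the data specification):

```python
import datetime

def doy_from_date(date_str):
    """Convert an m/d/y date string (2-digit year) to a day-of-year value."""
    month, day, year = (int(p) for p in date_str.split("/"))
    # Assumption: 2-digit years below 50 are 2000s, the rest are 1900s.
    year += 2000 if year < 50 else 1900
    return datetime.date(year, month, day).timetuple().tm_yday

print(doy_from_date("3/31/11"))  # 90 (March 31 is day 90 of 2011)
```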
Eye Tracking Data: this is new data from Prof. Martha Arterberry
in the Department of Psychology. She is analyzing the visual
perception of objects and their surrounding context by examining eye
movements in infants and adults. The goal is to observe the
relationship between objects, their surrounding visual context, and
focus of attention.
In the experiment, participants looked at images of 18 objects--10 animals and 8 vehicles--in 3 different contexts. In all three contexts the objects were cut out of their original image and pasted onto a different background.
- In the first case, the background was incongruent with the object; vehicles were placed in nature settings and animals were placed in city or street settings.
- In the second case, the background was congruent with the object; vehicles were placed in city or street settings and animals were placed in nature settings.
- In the third case, the background was plain white.
Participants in the study viewed each image while an eye tracking system followed their gaze. Prof. Arterberry and her students then used software to identify fixation locations and manually coded the locations as being in the object (F), in the background (G), outside the image (O), on the object side of the border (B/F), or on the background side of the border (B/G). Each line of the file corresponds to one fixation.
Overall, the data contains nine columns.
- SubjectID (enum): a randomly assigned 8-digit number identifying the subject.
- Image (string): one of 54 images. The 18 objects are: whitecoup, bear, bird, bug, cow, elk, hatchback, horse, monkey, motorcycle, pickuptruck, sciencevan, sheep, sportscar, squirrel, SUV, tiger, and tractor. Each image name has one of three endings that indicates the context: incong, mancong, or nobkgd.
- ContextID (enum): one of 0 (incong background), 1 (mancong background), or 2 (nobkgd).
- Slide (enum): the coded location of the fixation, as specified above.
- Number (numeric): the index of the fixation in the sequence of fixations for that image by that participant.
- x (numeric): the x location (in pixels) of the location of the fixation.
- y (numeric): the y location (in pixels) of the location of the fixation.
- StartTime (numeric): the starting time (in s) of the fixation.
- EndTime (numeric): the ending time (in s) of the fixation.
- Duration (numeric): the duration of the fixation (in s).
Australia Coast Climate Data: this data set is from an
international project (LOICZ) that researches anthropogenic effects
and their impact on the environment and climate. While the data is
somewhat old, from 2000, it provides an excellent testbed for various
visualization and analysis techniques. Each data point represents 17
different variables measured on a 0.5 degree grid covering the
coastlines of Australia and New Zealand.
Each data point has six identifying variables: id, longitude, latitude, continent, basin ID, and cell location (sea, coast, land). The longitude and latitude are numeric variables and can be used for plotting information in a map-like format.
Each data point also has 17 modeled or actual measurements that represent averages or min/max values over some time span for the spatial cell:
- premin, premax: minimum and maximum monthly precipitation (in)
- salmin, salmax: minimum and maximum measured monthly salinity
- minairtemp, maxairtemp: minimum and maximum monthly air temperature (K)
- minsst, maxsst: minimum and maximum monthly sea-surface temperature (mK)
- minsoilmoist, maxsoilmoist: minimum and maximum monthly soil moisture
- stdev_elevdepth: standard deviation of elevations within the cell (m)
- Min_czcs, Max_czcs: minimum and maximum monthly CZCS readings (a measure of vegetative growth in water)
- runoffnew: estimated yearly water runoff
- Wave_heigh: average wave height (m)
- Tideformproxy: a proxy variable incorporating coastal winds and tidal range
The main goal this week is to explore how to visualize these data sets. A secondary goal is to enable simple analysis, such as providing means, standard deviations, and ranges in easy-to-use forms.
Read through all of the tasks and plan your design before you start writing code.
For both data sets, it will be necessary to use one column to filter
which data to display. In the case of bird arrivals, users may wish to
filter data by year, species, observation region, or some
combination. Users may also wish to view which species have arrival
observations within a certain window of dates.
In the case of the eye-tracking data, a user may wish to view durations filtered by subject, image, coded location, or some combination of those variables.
Implement the capability for the user to specify which columns to view, which columns to use as a filter, and what the parameters of the filter should be. You may want to begin by working with your DataSet class to create a filteredSelect method and then build the interface to provide the required inputs to the function. Note that the column being viewed may also be the column being used as a filter. For example, a user may want to view the bird arrival DOY distribution for values of DOY between 100 and 150.
You can assume the user is reasonably friendly and intelligent for this task. To let the user specify the range of values in a column to use as a filter you can use simple text boxes. The filtering capability should, ideally, apply to 1D, 2D and 3D plots.
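As a starting point, a filteredSelect method could look something like the sketch below. The class layout, method signature, and column names here are assumptions for illustration, not a required design:

```python
class DataSet:
    def __init__(self, headers, rows):
        # headers: list of column names; rows: list of equal-length value lists
        self.headers = headers
        self.rows = rows

    def filtered_select(self, view_cols, filter_col, lo, hi):
        """Return view_cols values for rows whose filter_col value is in [lo, hi].

        Note the filter column may also be one of the viewed columns.
        """
        fidx = self.headers.index(filter_col)
        vidx = [self.headers.index(c) for c in view_cols]
        return [[row[i] for i in vidx]
                for row in self.rows if lo <= row[fidx] <= hi]

# toy example: keep only rows with DOY between 100 and 150
data = DataSet(["Species", "DOY"],
               [["robin", 95], ["loon", 130], ["robin", 160]])
print(data.filtered_select(["Species"], "DOY", 100, 150))  # [['loon']]
```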
For both data sets, the user probably wants to see multiple 1-D plots
overlaid on one another or in the same visual figure (like a
multi-plot). For example, the user may want to see multiple bird
species plotted as histograms of DOY, or for the eye-tracking data,
the user may want to see a histogram of durations for one set of
images compared to a second set of images.
Implement the capability for the user to view multiple 1-D plots within the same figure (the kinds of plots you are making for task 1). This will probably be a check box to indicate whether to expand or overlay a new plot on the current figure or whether to create a new figure. Restrict this task to plots you are building with matplotlib, such as histograms.
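One way to overlay multiple 1-D histograms in a single matplotlib figure is to call hist repeatedly on the same axes with partial transparency. The species names and DOY samples below are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import random

random.seed(0)
# fake DOY samples for two species (illustration only)
robin = [random.gauss(100, 10) for _ in range(200)]
loon = [random.gauss(130, 15) for _ in range(200)]

fig, ax = plt.subplots()
# alpha < 1 keeps overlapping bars visible
ax.hist(robin, bins=20, alpha=0.5, label="robin")
ax.hist(loon, bins=20, alpha=0.5, label="loon")
ax.set_xlabel("DOY")
ax.set_ylabel("count")
ax.legend()
fig.savefig("doy_histograms.png")
```

In your GUI, a check box can decide between reusing the current axes (as above) and creating a fresh figure.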
For the last project you implemented 2D or 3D viewing with color to
provide a 4th dimension to the visualization.
This week, add size or shape to the visualization to enable interactive viewing of a 4th (2 spatial + color + size) or a 5th (3 spatial + color + size) dimension. Demonstrate this on the AustraliaCoast data set.
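As a sketch, matplotlib's scatter can carry two extra dimensions through its c (color) and s (marker size) arguments. The column choices and random values below are arbitrary stand-ins for the AustraliaCoast data, not the required mapping:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import random

random.seed(1)
n = 100
lon = [random.uniform(110, 155) for _ in range(n)]   # x spatial axis
lat = [random.uniform(-45, -10) for _ in range(n)]   # y spatial axis
temp = [random.uniform(280, 310) for _ in range(n)]  # 3rd dimension -> color
precip = [random.uniform(0, 10) for _ in range(n)]   # 4th dimension -> size

fig, ax = plt.subplots()
# scale the size variable into a visible marker-size range
sizes = [10 + 20 * p for p in precip]
sc = ax.scatter(lon, lat, c=temp, s=sizes, cmap="viridis")
fig.colorbar(sc, label="max air temp (K)")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
fig.savefig("coast_scatter.png")
```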
- Pick one of the following visualizations and add it to your system. Talk with Prof. Maxwell about the visualization before you start.
- For the bird arrivals data, calculate the mean and standard deviation of the DOY for each bird across all years and regions. Then create a new data set where each data point is the name of the bird, its mean DOY, and the standard deviation of its DOY. Then create a 2D plot of all the bird species using the mean and standard deviation values as the axes. See if you can attach text with the bird's name to the data point or have it pop up when the user clicks or moves their mouse over a data point (neither is required, though). Expand the derived data set, and the range of possible visualizations, by calculating DOY mean and standard deviation values by region or by year.
- For the bird arrivals data, enable plotting the mean arrival time in one region versus the mean arrival time in a second region across all birds. For example, calculate the mean arrival time in region one for all birds, then calculate the mean arrival time in region two for all birds. Plot the results in a 2D graph. Calculate the mean arrival times for a 3rd region and make it a 3D graph.
- For the Eye-Tracking data, create a visualization that plots the fixations on top of the stimulus image (for one subject or multiple subjects). The images are available in the Data directory on the Academics server.
- For the Eye-Tracking data, create a heat map that overlays the density of fixations over the stimulus image.
- Create a visualization that, over all subjects, shows the location of the first fixation on the foreground object (coded category F). Use color to indicate the position of the fixation in the ordering (e.g. bright green if it's number 1, bright red if it's the last fixation).
- Create a visualization that, over all subjects, explores where subjects looked first. See if you can create a plot that captures when they made the first fixation (start time), how long it lasted (duration), and what they were viewing (coded category).
- Be creative and develop something you think would be useful for exploring the Bird Arrivals or Eye-Tracking data or demonstrating some aspect of the relationships within it.
- Do more than one option from the last task.
- Enable the user to click on a data point in the visualization and have it generate a dialog box showing the complete feature vector for that data point. This feature is guaranteed to be used if you provide it. It's almost guaranteed to be requested if you don't.
- In the 1-D visualization, generate an additional dialog window that includes basic analysis of the data such as the range, mean, median, mode, or standard deviation. See if you can indicate the mean, median, and mode visually in the plot.
- In addition to color, enable your visualization to use size or other feature to represent an axis. For example, use icons for representing an enumerated variable.
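For the first bird-arrivals option above, the derived data set is just a per-species aggregation. A rough sketch, with made-up species and DOY values standing in for the real data (the same grouping could be done by region or year):

```python
import statistics
from collections import defaultdict

# (species, DOY) pairs -- stand-ins for the real bird-arrival rows
rows = [("robin", 95), ("robin", 102), ("robin", 99),
        ("loon", 130), ("loon", 141), ("loon", 128)]

by_species = defaultdict(list)
for species, doy in rows:
    by_species[species].append(doy)

# derived data set: one point per species (name, mean DOY, std of DOY)
derived = [(name, statistics.mean(vals), statistics.stdev(vals))
           for name, vals in by_species.items()]
for name, mean_doy, std_doy in derived:
    print(f"{name}: mean={mean_doy:.1f} std={std_doy:.1f}")
```

Each (mean, std) pair then becomes one point in the 2D plot, with the species name available for annotation.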
For this week's writeup, create a wiki page that shows your visualizations. Explain how you made them, what they mean, and what they show. Your audience is the people who are working on the research.
Once you have written up your assignment, give the page the label:
Put your code in the COMP/CS251 folder on fileserver1/Academics. Please make sure you are organizing your code by project.