The goal of this lab is to start the process of adding a new data analysis capability to your GUI. In particular, you will be implementing simple linear regression of two variables and plotting the results.
In the lab, your goal should be to execute a linear regression on two variables and then plot the result on the screen.
- Give yourself a new working directory and copy your display, viewing, analysis and data Python files into it. It's best to start with copies and modify them from there.
- In your display class, add fields for holding (a) the objects associated with a linear regression (a tk Line object) and (b) the endpoints of the line in normalized data space. The first should be initialized to an empty list; the endpoints can be initialized to None.
- In your display class, add a menu item to the command menu that calls the handleLinearRegression function, which you will create in the next step.
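The two setup steps above might look roughly like the sketch below. The class name `DisplayApp`, the field names, and the `buildCommandMenu` method are hypothetical placeholders for whatever your display class already uses; `menu` is assumed to be the `tk.Menu` behind your command menu.

```python
class DisplayApp:
    """Hypothetical stand-in for your display class; only the new pieces are shown."""

    def __init__(self):
        # (a) canvas item ids for the regression fit (the line created later)
        self.linreg_objects = []
        # (b) endpoints of the fit line in normalized data space; None until a fit exists
        self.linreg_endpoints = None

    def buildCommandMenu(self, menu):
        # `menu` is assumed to be the tk.Menu that backs your command menu.
        menu.add_command(label="Linear Regression",
                         command=self.handleLinearRegression)

    def handleLinearRegression(self, event=None):
        # Placeholder; the real body is written in the next step.
        pass
```

The optional `event` argument lets the same method be used both as a menu command and as a key binding, if you decide to add one.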
In your display class, create a method (e.g. handleLinearRegression) that lets the user select the variables to fit and then displays them on the main screen. It should carry out the following steps, each of which you may want to implement separately:
- Create a dialog class that lets the user select an independent (x) variable and a dependent (y) variable. If you want to let the user also pick variables for color and size, that is up to you. The dialog window needs to at least return two headers from your numeric data.
- Clear the existing points from the window.
- Clear any existing data fits or models from the window. This will delete any objects in your linear regression objects list.
- Reset the view to the default position.
- Update the axes.
- Call a buildLinearRegression function that creates the canvas line object to show the linear regression fit graphically.
Start by creating the dialog window and make sure it returns two headers. Then write the buildLinearRegression function (see the next step for details), then go back and deal with clearing the points of any existing plots, clearing any prior linear regression fit, and resetting the view.
Create the buildLinearRegression function. This function should do the following:
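As a rough outline, the handler can simply run the steps above in order. Everything except `handleLinearRegression` and `buildLinearRegression` in this sketch is an assumed name (`chooseVariables`, `clearItems`, `resetView`, `updateAxes`, `self.objects`); substitute the methods and fields your own display class already has.

```python
class DisplayApp:
    """Hypothetical skeleton; helper names are assumptions, not part of the lab."""

    def __init__(self, canvas):
        self.canvas = canvas
        self.objects = []          # canvas ids of the plotted data points
        self.linreg_objects = []   # canvas ids of the regression fit
        self.linreg_endpoints = None

    def handleLinearRegression(self, event=None):
        headers = self.chooseVariables()   # dialog: (x_header, y_header) or None
        if headers is None:                # user cancelled the dialog
            return
        self.clearItems(self.objects)          # clear existing points
        self.clearItems(self.linreg_objects)   # clear any prior fit
        self.linreg_endpoints = None
        self.resetView()
        self.updateAxes()
        self.buildLinearRegression(headers)

    def clearItems(self, items):
        for item in items:
            self.canvas.delete(item)   # tk.Canvas.delete removes an item by id
        del items[:]                   # empty the list in place

    # Stubs standing in for code you already have or will write.
    def chooseVariables(self):
        return None

    def resetView(self):
        pass

    def updateAxes(self):
        pass

    def buildLinearRegression(self, headers):
        pass
```

Emptying the lists in place, rather than rebinding them, keeps any other references to the same list valid.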
- Extract the two columns selected by the user from the data set. Make the independent variable the X column and the dependent variable the Y column. Normalize the columns separately.
- Add a third column of zeros to the matrix.
- Add a fourth column of ones to the matrix. You need to store this matrix in your self.datapts field, or whatever field you used to store the data in your buildPoints function from last week.
- Build the vtm, multiply it by the data points, and then create the ovals to plot the data on the screen. This should make a 2-D plot of the two variables, with the independent variable along the x-axis. At this point, you should be able to test your function to see if it makes the 2D data plot. If you did it right, the translations, rotations, and scales should all still work as expected.
- Use the scipy.stats.linregress function (import scipy.stats) to calculate the linear regression of the independent and dependent variables. Note that you will need to get the unnormalized data from your Data object for this to work properly, because the linear regression must occur in data space. Store all of the outputs of the linregress function. See the linregress documentation.
- Get the range of the independent (x) and dependent (y) variables (use your analysis.data_range function).
- Compute the endpoints of the fitted regression line. Note that these endpoints need to end up in normalized data space, while the linear regression model is in unnormalized data space. In normalized space, the x values of the endpoints will be 0.0 and 1.0. If the slope is m and the y-intercept is b, then the y values in normalized data space will be:
  ((xmin * m + b) - ymin)/(ymax - ymin)
  ((xmax * m + b) - ymin)/(ymax - ymin)
- Multiply the points by the vtm and then make a line out of the two endpoints. Make it a color that will stand out.
- Your program should somehow communicate the linear regression coefficients to the user. You could do this by making a tk.Label object and putting text into it giving the slope, intercept, and R-value for the fit.
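The numerical part of these steps can be sketched independently of the GUI. The function names below are hypothetical, and the sketch assumes one point per row with homogeneous coordinates (x, y, 0, 1); if your buildPoints code stores points differently, adapt the layout to match. It uses NumPy's min and max in place of your analysis.data_range function.

```python
import numpy as np
import scipy.stats


def normalized_points(x, y):
    """Return an N x 4 matrix [x_norm, y_norm, 0, 1], each data column scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xn = (x - x.min()) / (x.max() - x.min())   # columns are normalized separately
    yn = (y - y.min()) / (y.max() - y.min())
    return np.column_stack([xn, yn, np.zeros(len(x)), np.ones(len(x))])


def regression_endpoints(x, y):
    """Fit y = m*x + b in data space; return the fit and endpoints in normalized space."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = scipy.stats.linregress(x, y)         # regression on the unnormalized data
    m, b = fit.slope, fit.intercept
    xmin, xmax = x.min(), x.max()
    ymin, ymax = y.min(), y.max()
    # Evaluate the line at xmin and xmax, then map those y values into [0, 1].
    y0 = ((xmin * m + b) - ymin) / (ymax - ymin)
    y1 = ((xmax * m + b) - ymin) / (ymax - ymin)
    endpoints = np.array([[0.0, y0, 0.0, 1.0],
                          [1.0, y1, 0.0, 1.0]])
    return fit, endpoints


def fit_label_text(fit):
    """Text suitable for a tk.Label describing the fit."""
    return "slope: %.3f  intercept: %.3f  R: %.3f" % (fit.slope, fit.intercept, fit.rvalue)
```

To draw the fit, multiply the endpoints by your vtm the same way you do for the data points, pass the resulting screen coordinates to the canvas's create_line method, and store the returned id in your linear regression objects list. Note that the endpoint y values can fall outside [0, 1] when the fitted line leaves the range of the data.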
In addition to testing along the way, test out your system now using this data file. It contains two variables, linearly related with some noise. Your fit should give a slope of 1.995, an intercept of 1.012, and an R^2 value of 0.792. An example plot is shown below.
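When comparing against the expected values, keep in mind that linregress reports the correlation coefficient r rather than R^2, so square it before comparing. The short check below uses made-up numbers (not the lab's data file) and confirms that the squared r-value matches the usual 1 - SS_res/SS_tot definition for a simple linear fit.

```python
import numpy as np
import scipy.stats

# Made-up illustrative values, not the lab's data file.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

fit = scipy.stats.linregress(x, y)
r_squared = fit.rvalue ** 2

# Independent check: 1 - (residual sum of squares) / (total sum of squares)
residuals = y - (fit.slope * x + fit.intercept)
r_squared_check = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print("slope %.3f, intercept %.3f, R^2 %.3f" % (fit.slope, fit.intercept, r_squared))
```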
When you are finished with the lab, go ahead and continue with the project.