Fall 2018

R Markdown

The purpose of this project is to introduce you to the R Markdown system. R Markdown is a method of creating web documents that let you display code, plots, and text in an integrated manner. It is an excellent way to present your work, because it enables someone looking at your results to understand exactly how they were generated.


Project Tasks

    Open R Studio or a text editor and create a new R Markdown file called project5.Rmd. R Studio will give you a dialog where you can enter a title and your name and select an output format. Select HTML for the output format. Save your file to an appropriate working directory.

  1. The default R Markdown file includes the two built-in data sets cars and pressure. If you are not using these data sets, you can remove those variables from the {r} statements. Your code blocks should go between the expressions ```{r} and ```, as below.
    ```{r}
    # code goes here
    a <- 8
    b <- 5
    c <- a + b
    ```
  2. In the first code block, assign to a variable generateData a function with one parameter, N, which is the number of data points to generate. Inside the function, do the following.
    1. Use the set.seed function to set the random seed to a value of your choice. Can you make the seed a parameter of the function with a default value?
    2. Assign to tmpx a vector of N numbers uniformly distributed between 0 and 100. You can use the runif function to do this.
    3. Assign to tmpy a linear function of tmpx with added noise. One way to implement this is to consider the equation:

      tmpy = (tmpx + noise)*slope + intercept + noise

      You can generate a vector of uniformly or normally distributed noise using runif or rnorm. Note that you have to generate the same number of noise values as there are values in tmpx.

    4. Use the data.frame function to create a data frame with two columns x and y that get the values in tmpx and tmpy, respectively. Have your function return the data frame.
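
    Steps 1 to 4 can be sketched as follows. This is one possible arrangement, not the required solution. The default seed of 42, the slope and intercept arguments with their default values, and the use of rnorm with a standard deviation of 5 for two independent noise terms are all assumptions made for illustration.

    ```r
    # One possible sketch of generateData. Every default below is an
    # illustrative assumption, not a value given in the assignment.
    generateData <- function(N, seed = 42, slope = 2, intercept = 10) {
      set.seed(seed)                        # makes the random draws repeatable
      tmpx <- runif(N, min = 0, max = 100)  # N uniform values between 0 and 100
      noiseX <- rnorm(N, mean = 0, sd = 5)  # one noise value per element of tmpx
      noiseY <- rnorm(N, mean = 0, sd = 5)  # independent second noise term
      tmpy <- (tmpx + noiseX) * slope + intercept + noiseY
      data.frame(x = tmpx, y = tmpy)        # last expression is the return value
    }

    df <- generateData(100)
    ```

    Because the seed is set inside the function, calling generateData twice with the same arguments returns the same data frame. Passing a different seed gives a different sample.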
  3. After you have defined the function, assign to a variable df the result of executing the function with the parameter 100.
  4. Assign to a variable (e.g. model) the result of calling the linear model function lm with an argument that relates df$y to df$x.
  5. Create a plot of df$x and df$y, then add the best fit line using the abline function. Using the capabilities you have learned in prior projects, make the plot use solid points (pch argument), and make the points and the fitted line different colors. Put the linear fit equation into the plot using the text function.
  6. In your markdown file, put some text around your code and your plot explaining what the code does and describing the plot.
  7. In your markdown file, and using a data set of your choice, read the data and execute a linear regression analysis of two of its columns. One possibility is AGE and WRKYEARS in the GSS 1991 data. Plot the two columns and the linear fit, making the graph look nice. Add some text around your code to describe the data set and discuss the results (i.e. do they make sense?).

    If you are using a package like foreign to read an SPSS file, you will have to include a library command in your R Markdown file. Likewise, you will also have to include the set working directory function, setwd, to specify where R should look for the file. You can figure out what these commands are by first unclicking and clicking the foreign package and copying the last command that appears. It should look something like the following (MacOS).

    library("foreign", lib.loc="/Library/Frameworks/R.framework/Versions/3.4/Resources/library")

    Likewise, you can use the menu to set your working directory to your source file and copy the setwd command that appears.

  8. View your markdown output in a browser, then use the browser's print capability to generate a pdf. Submit the pdf with your code. You may answer the questions below in your markdown file.
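
Tasks 4, 5, and 7 share the same fit-and-plot pattern, sketched below under a few assumptions. The built-in cars data set (columns speed and dist) stands in for your own data. The colors, the plotting symbol, the axis labels, and the position of the equation label are arbitrary choices, not requirements. The formula y ~ x with data = df is used here as an equivalent of relating df$y to df$x.

```r
# Stand-in data: the built-in cars data set, renamed to x and y so the
# same code applies to a data frame with columns x and y.
df <- data.frame(x = cars$speed, y = cars$dist)

# Fit y as a linear function of x.
model <- lm(y ~ x, data = df)
intercept <- coef(model)[1]
slope <- coef(model)[2]

# Solid points (pch = 16) in one color, the fitted line in another.
plot(df$x, df$y, pch = 16, col = "steelblue",
     xlab = "x", ylab = "y", main = "Linear fit")
abline(model, col = "firebrick", lwd = 2)

# Build the equation string from the fitted coefficients and place it
# to the right of the top-left corner of the data range.
# %+.2f prints the intercept with an explicit sign.
eqn <- sprintf("y = %.2f x %+.2f", slope, intercept)
text(min(df$x), max(df$y), labels = eqn, pos = 4)
```

To reuse the sketch for your own data, replace the first assignment with the two columns you want to compare. The rest of the code does not need to change.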

Report

Answer the following questions. Submit your answers as a plain text file or PDF in your handin directory. Put your name at the top of the file. No credit will be given for any other format or for a file without a name.


Handin

Create a project5 folder inside the Private folder in your Courses directory. Put your R Markdown file, the pdf output of your markdown file, and your text/PDF file with your answers (if they are not in the markdown file) into the project5 folder. Include the data set you used for the last task (but not if you used the GSS data).