Fall 2018

More Useful Functions

The purpose of this project is to look at some more useful functions and to practice some of the things you've learned previously. For each task, please make a separate function. For example, for task 1, create a function with the form below.

task1 <- function( ) {
  # code for task 1 goes here

  # have the function return the data frame created/read in this function
}
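
As a minimal sketch of this pattern, the example below builds a data frame inside a function, returns it, and captures the result in top-level code. The function name exampleTask and its data are made up for illustration and are not part of the project.

```r
# Hypothetical example function; the name and data are made up for illustration.
exampleTask <- function() {
  # build a small data frame inside the function
  df <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))

  # hand the data frame back to the caller
  return(df)
}

# top-level code: call the function and keep the returned data frame
result <- exampleTask()
print(result)
```

Because the function returns the data frame, later functions can take it as an argument instead of reading the file again.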

Project Tasks

    Open RStudio or a text editor and create a new R file called project4.R. Put your name, Project 4, and the date at the top of the file. Save your file to an appropriate working directory.

  1. Eventually, we all have to deal with reading in data that has missing values or that R interprets differently from how we want to use it. There are many functions we can use to explore our data, making sure it matches what we want. In cases where it does not, we can either manually fix the problems or try adding arguments to the read function that avoid the problems.

    Create a function called task1 to encapsulate this task, as specified above. Inside the function, execute the following sub-tasks.

    1. Download the file mixed.csv. Store it in the same directory as your project4.R file. Set the working directory to your source file location.
    2. Assign to a variable the data frame generated by calling the read.csv() function to read the mixed.csv file.
    3. Print the result of calling the str() function with the data frame as the argument to print out a representation of the data frame.
    4. Print the result of calling the summary() function on the data frame to print out a summary of the data in the data frame.
    5. Use a for loop to loop over the columns of the data frame. You can do this using either syntax below.
      #option 1
      for( column in myDataFrame ) {
         # code to repeat goes here, column refers to each column in turn
      }
      
      #option 2
      for(i in 1:length(myDataFrame) ) {
         # code to repeat goes here, myDataFrame[[i]] refers to each column in turn
      }

      Each time through the loop, print the result of applying the class() and the typeof() functions to the current column. Note that three of the columns are factors, including the first one (in R 4.0.0 and later, read.csv() reads text columns as character vectors unless you pass stringsAsFactors = TRUE). See if you can also print the name of each column along with its class and type (this is easier with the second for loop option).

    6. Re-assign the "Apple" column to be numeric. To do this, you have to first convert the column to a character column using as.character(), then convert the result to a double column using as.double(). The intermediate step is needed because as.double() applied directly to a factor returns the internal level codes rather than the values. You can compose the two functions and assign the result back to the "Apple" column in a single line.
    7. Print the result of calling the str() function on the data frame to note the change.
    8. Read in the data again using read.csv(), re-assigning the data frame variable. This time use the na.strings parameter and give it the value c("NA", ".").
    9. Print the result of calling str() on the data frame to note the difference.
    10. Convert the "Banana" column to a character vector using as.character(). Then print out the result of calling summary on the data frame.
    11. On the last line, return the data frame.
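
The index-based loop from sub-task 5 can be sketched as follows. This is only an illustration on a made-up data frame called demo, not the project data. It uses seq_along(demo) in place of 1:length(demo), which gives the same indices for any data frame with at least one column, and it sets stringsAsFactors = TRUE explicitly so the text column is a factor regardless of R version.

```r
# Made-up data frame for illustration; the project uses mixed.csv instead.
demo <- data.frame(
  name  = c("a", "b", "c"),
  count = c(1L, 2L, 3L),
  ratio = c(0.5, 1.5, 2.5),
  stringsAsFactors = TRUE   # force the text column to be a factor
)

# loop by index so the column name is available alongside the column itself
for (i in seq_along(demo)) {
  column <- demo[[i]]
  cat(names(demo)[i], ": class =", class(column),
      ", type =", typeof(column), "\n")
}
```

A factor reports class "factor" but type "integer", because R stores the levels as integer codes. That difference is why class() and typeof() are both worth printing.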

    At the end of your R file (put all top level code here), call the task1 function, assigning the return value to a variable.
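
The two ways of handling a stray missing-value marker can be compared on a small in-memory example. The sketch below assumes made-up CSV text with hypothetical columns Name and Score, read through the text argument of read.csv() instead of from a file. It passes stringsAsFactors = TRUE so the behavior matches what the task describes.

```r
# Made-up CSV text standing in for a file; "." marks a missing value here.
csvText <- "Name,Score
a,1.5
b,.
c,3.5"

# Without na.strings, the "." forces Score to be read as text
# (a factor here because stringsAsFactors = TRUE).
raw <- read.csv(text = csvText, stringsAsFactors = TRUE)
str(raw)

# Convert factor -> character -> double. The "." cannot be parsed and
# becomes NA; the resulting coercion warning is suppressed here.
raw$Score <- suppressWarnings(as.double(as.character(raw$Score)))
str(raw)

# Telling read.csv() which strings mean "missing" avoids the problem up front.
clean <- read.csv(text = csvText, na.strings = c("NA", "."),
                  stringsAsFactors = TRUE)
str(clean)
```

Both routes end with a numeric Score column containing one NA. The na.strings route gets there without a manual repair step.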

  2. Create a function called task2 to encapsulate this task. The function should have one parameter, which is the data frame to use. This function does not need to return anything explicitly. Inside the function, write code to do the following tasks.

    1. The which() function takes in a logical vector and returns the set of indices corresponding to the TRUE elements of the vector. An expression like myDataFrame$Apple < 5 generates a logical vector. Use the which() function to assign to the variable indexes the set of indices of the values of the Pear column that are less than 50. Print out the indexes variable and double-check that it is correct by printing out the Pear column and then using indexes to select that subset of the column (e.g. myDataFrame$Pear[indexes]). All of the selected values should be less than 50.
    2. The seq() function takes in one, two, or three arguments and allows you to generate sequences of numbers. The three-argument version takes in the starting value, the ending value, and the step value. Use the seq() function to assign to the variable indexes2 the values from 1 to 10 in steps of 3. Use indexes2 to print out that subset of the Pear column. You should get the values at positions 1, 4, 7, and 10.
    3. Print out the odd index terms in the Apple column using seq to generate the indexes.
    4. Print out the values between 4.5 and 5.5 in the Apple column using whatever method you like.

    At the end of your R file, call the task2 function with the data frame from task1 as the argument.
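
The indexing ideas in this task can be sketched on a plain vector. The vector values below is made up for illustration, and the thresholds differ from the ones the task asks for.

```r
# Made-up vector for illustration.
values <- c(12, 75, 48, 90, 33, 51, 7, 64, 29, 80)

# which() turns a logical vector into the positions of its TRUE elements
below50 <- which(values < 50)
print(below50)            # the positions
print(values[below50])    # the values at those positions, all below 50

# seq(from, to, by): every third position from 1 to 10
everyThird <- seq(1, 10, 3)
print(values[everyThird])

# a range test combines two comparisons with &
inRange <- values[values > 30 & values < 60]
print(inRange)
```

The last line indexes with the logical vector directly. Wrapping the condition in which() would select the same elements here.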

  3. There are many other capabilities built into the R base plotting system. One of these is the ability to put lines, text, and arrows into a plot.

    Create a function called task3 to encapsulate this task. The function should have one parameter, which is the data frame to use. Inside the function, write code to do the following tasks. This function does not need to return anything explicitly.

    1. Make a bar plot using the Size column. Adjust the amount of space between the columns and specify the x-axis label and main title.
    2. Use the arrows() function to add an arrow to the plot pointing at the small bar.
    3. Use the text() function to add some text at the tail of the arrow saying "There are lots of small fruits".
    4. Use the lines() function to add a blue line of width 2 underneath the text.

    At the end of your R file, call the task3 function with the data frame from task1 as the argument.
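
A minimal sketch of annotating a bar plot follows, assuming a made-up factor called sizes in place of the Size column. The coordinates and label text are illustrative choices. The sketch relies on the fact that barplot() returns the x positions of the bar midpoints, which makes it easier to place annotations.

```r
# Made-up category data for illustration; the project uses the Size column.
sizes <- factor(c("small", "small", "small", "medium", "large", "small", "medium"),
                levels = c("small", "medium", "large"))
counts <- table(sizes)
top <- max(counts)

# barplot() returns the x positions of the bar midpoints;
# space sets the gap between bars as a fraction of the bar width
mids <- barplot(counts, space = 0.5, xlab = "Size", main = "Fruit sizes",
                ylim = c(0, top + 1))

# arrows(x0, y0, x1, y1) draws from the tail (x0, y0) to the head (x1, y1)
arrows(x0 = mids[2], y0 = top + 0.5, x1 = mids[1] + 0.2, y1 = top,
       length = 0.1)

# text() is centered on (x, y) by default; pos = 4 places it to the right
text(x = mids[2], y = top + 0.5, labels = "tallest bar", pos = 4)

# lines() connects the given points; col and lwd set color and width
lines(x = c(mids[2], mids[3]), y = c(top + 0.2, top + 0.2),
      col = "blue", lwd = 2)
```

All three annotation functions use the plot's own data coordinates, so extending ylim leaves room above the bars for the arrow tail and label.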

  4. Generating random numbers from uniform or normal distributions can be extremely useful for a wide variety of applications. For this task, you will create random numbers and view their distributions using a histogram.

    Create a function called task4 to encapsulate this task. The function should have one optional parameter, N, which is the number of points to use. Give the N parameter a default value of 1000. Inside the function, write code to do the following tasks. This function does not need to return anything explicitly.

    1. Assign to the variable flat the result of using the runif() function to generate a vector of N uniformly distributed numbers in the range 0 to 100.
    2. Assign to the variable bump the result of using the rnorm() function to generate a vector of N normally distributed numbers with a mean of 50 and a standard deviation of 10.
    3. Generate a histogram plot of the uniformly distributed numbers. Use the argument xaxt='n' to tell the hist() function not to generate an X-axis. Then use the axis() function to create a new X-axis.
    4. Generate a histogram plot of the normally distributed numbers. As with the prior step, create your own X-axis with customized features like a smaller font size (e.g. cex.axis parameter).
    5. Use the arrows function to put a blue arrow of thickness 3 pointing at the central bump. Place the arrow to the right of the bump pointing left.
    6. Use the text function to put the text "Big Bump!" in a dark red color to the right of the tail of the arrow.
    7. Use the lines function to put a green line of thickness 4 under the text.

    Once you have completed the task4 function, at the end of your R file call task4 three times: once with the argument 100, once with no argument, and once with the argument 10000. Save one of the plots of the normal distribution and hand it in with your code.
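
The pieces of this task (a default parameter value, random draws, and a histogram with a hand-built axis) can be sketched together. The helper name plotSample, the standard normal distribution, and the label text are all assumptions made for illustration and differ from the values the task specifies.

```r
# Hypothetical helper; the name, distribution, and labels are made up.
plotSample <- function(n = 500) {
  x <- rnorm(n, mean = 0, sd = 1)   # n draws from a standard normal

  # xaxt = "n" suppresses the default x-axis so a custom one can be added
  h <- hist(x, xaxt = "n", main = paste("Normal sample, n =", n),
            xlab = "value")

  # side = 1 is the bottom axis; at gives tick positions;
  # cex.axis scales the tick label size
  axis(side = 1, at = seq(-4, 4, by = 1), cex.axis = 0.8)

  peak <- max(h$counts)
  # arrow from the right (tail) toward the center of the distribution (head)
  arrows(x0 = 2.5, y0 = peak * 0.9, x1 = 0.8, y1 = peak * 0.9,
         col = "blue", lwd = 3)
  text(x = 2.5, y = peak * 0.9, labels = "peak", col = "darkred", pos = 4)

  invisible(h)   # return the histogram object without printing it
}

set.seed(1)         # fixed seed so repeated runs give the same samples
plotSample()        # uses the default n = 500
plotSample(2000)    # overrides the default

u <- runif(5, min = 0, max = 10)   # five uniform draws between 0 and 10
print(u)
```

Scaling the annotation heights by the tallest bin count keeps the arrow and text inside the plot as n changes.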


Report

Answer the following questions. Submit your answers as a plain text file or PDF in your handin directory. Put your name at the top of the file. No credit will be given for any other format or for a file without a name.


Handin

Create a project4 folder inside the Private folder in your Courses directory. Put your project4.R file and your text/PDF file with your answers into the project4 folder. They should be the only documents in the project4 folder. You do not need to put a copy of the GSS data in your handin folder.