How to Write a Function

February 26, 2014 | Alan

One of the most powerful features of R is the ability for you to write your own functions. Writing functions in R allows us to produce code that is easy to read, avoids repetition, and is easy to reuse.

Learning Outcomes

After reading this blog you will know how to transform code into reusable functions.

Workflow

Step 1 – Functions without Arguments

A function in R is specified using the following format:

function_name <- function(argument1, argument2, ...) {
    # Some R code
    return(variable_to_return)
}

To run the function:

function_name(argument1, argument2, ...)

We start off with a simple function that calculates the maximum value of each of two vectors:

max_function <- function() {
    a <- c(1, 2, 3, 4, 5)
    b <- c(6, 7, 8, 9, 10)
    max1 <- max(a)
    max1
    max2 <- max(b)
    max2
}

max_function()
## [1] 10

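As this shows, a function returns only the value of the last expression it evaluates, here max2; typing max1 on a line of its own inside the function does not return or print it. To control what is returned from a function we can use return(). Let’s say we want to return the value of max1 instead. We could rewrite the function as follows: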

max_function <- function() {
    a <- c(1, 2, 3, 4, 5)
    b <- c(6, 7, 8, 9, 10)
    max1 <- max(a)
    max2 <- max(b)
    return(max1)
}

max_function()
## [1] 5

A function has only one return value. If we want to return more than one value we have to combine them in a list, data frame, matrix or vector. So in this example, if we wanted to return both values we could create a data frame for them to sit in:

max_function <- function() {
    a <- c(1, 2, 3, 4, 5)
    b <- c(6, 7, 8, 9, 10)
    max1 <- max(a)
    max2 <- max(b)
    bothMaximums <- data.frame(maximum1 = max1, maximum2 = max2)
    return(bothMaximums)
}

max_function()
##   maximum1 maximum2
## 1        5       10
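The paragraph above also mentions lists as a way to return several values. As a minimal sketch, assuming we are happy to pick the values out by name afterwards, the same function could return a named list instead of a data frame. The name max_list is hypothetical and is used only for illustration:

```r
# Hypothetical variant of max_function that returns a named list
max_list <- function() {
    a <- c(1, 2, 3, 4, 5)
    b <- c(6, 7, 8, 9, 10)
    # The list is the last expression, so it is the return value
    list(maximum1 = max(a), maximum2 = max(b))
}

maxima <- max_list()
maxima$maximum1        # extract one element by name
maxima[["maximum2"]]   # equivalent extraction with double brackets
```

A list can be handy when the values to return have different types or lengths, which a single data frame row cannot always hold.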

 

Step 2 – Functions with Arguments

Let’s say now we didn’t know in advance which vectors we would want to run the function against. This is where arguments to functions come in, because we can use them to pass information to a function. Arguments can be in the form of, for example, vectors, lists, data frames, and matrices. So if we wanted the option to pass any two vectors to the function we would rewrite it as follows:

max_function <- function(vector1, vector2) {
    max1 <- max(vector1)
    max2 <- max(vector2)
    bothMaximums <- data.frame(maximum1 = max1, maximum2 = max2)
    return(bothMaximums)
}

Let’s create two vectors to pass to our function:

a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)

max_function(vector1 = a, vector2 = b)
##   maximum1 maximum2
## 1        5       10

This gives us the same result as the function above. However, it is now a lot easier to change which vectors we run the function against.

Create two more vectors to verify these results and run the function:

c <- c(11, 12, 13, 14, 15)
d <- c(16, 17, 18, 19, 20)

max_function(vector1 = c, vector2 = d)
##   maximum1 maximum2
## 1       15       20

We can also set default values for the arguments of a function. This means that we can run the function without passing anything for those arguments:

max_function <- function(vector1 = a, vector2 = b) {
    max1 <- max(vector1)
    max2 <- max(vector2)
    bothMaximums <- data.frame(maximum1 = max1, maximum2 = max2)
    return(bothMaximums)
}

max_function()
##   maximum1 maximum2
## 1        5       10

However, if we want to run the function for different vectors, this can be done just as we did above:

max_function(vector1 = c, vector2 = d)
##   maximum1 maximum2
## 1       15       20
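One thing to be aware of is that the defaults above refer to a and b, which must already exist in the R session when the function is called. A minimal sketch of an alternative, assuming we prefer the function to carry its own defaults, writes the default vectors directly into the definition. The name max_with_defaults is hypothetical:

```r
# Hypothetical variant whose defaults do not depend on objects in the session
max_with_defaults <- function(vector1 = c(1, 2, 3, 4, 5),
                              vector2 = c(6, 7, 8, 9, 10)) {
    data.frame(maximum1 = max(vector1), maximum2 = max(vector2))
}

# Use both defaults
defaults_only <- max_with_defaults()

# Override only the second argument; the first keeps its default
one_override <- max_with_defaults(vector2 = c(100, 200))
```

Naming the argument we override lets us leave the other one at its default value.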

Variables are assigned to arguments in a function in either of two ways:

  • By name
  • By position.

The following function calls all return the same result:

max_function(vector1 = c, vector2 = d)
max_function(c, d)
max_function(vector2 = d, vector1 = c)
max_function(vector2 = d, c)
##   maximum1 maximum2
## 1       15       20
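As a quick check of the matching rules above, the sketch below compares the results with identical(). It uses a hypothetical function max_pair and hypothetical vectors x and y, standing in for max_function, c and d, so that nothing defined earlier is overwritten. It also shows that swapping two unnamed arguments does change the result:

```r
# Hypothetical stand-ins for max_function, c and d
max_pair <- function(vector1, vector2) {
    data.frame(maximum1 = max(vector1), maximum2 = max(vector2))
}
x <- c(11, 12, 13, 14, 15)
y <- c(16, 17, 18, 19, 20)

by_name     <- max_pair(vector1 = x, vector2 = y)
by_position <- max_pair(x, y)
mixed       <- max_pair(vector2 = y, x)  # x fills the remaining argument, vector1
swapped     <- max_pair(y, x)            # positions reversed, no names given

identical(by_name, by_position)  # TRUE
identical(by_name, mixed)        # TRUE
identical(by_name, swapped)      # FALSE: the two maximums change places
```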

If we pass more arguments than a function accepts, even if they include all the necessary parameters, R stops with the following error:

max_function(c, d, e)
## Error: unused argument (e)
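If we want to see this error without stopping a longer script, one option is to wrap the call in tryCatch(). This is only a sketch: max_pair, x, y and z are hypothetical names used for illustration, and the exact wording of the message may vary with the language settings of the R session:

```r
# Hypothetical two-argument function and three vectors
max_pair <- function(vector1, vector2) {
    data.frame(maximum1 = max(vector1), maximum2 = max(vector2))
}
x <- c(11, 12, 13, 14, 15)
y <- c(16, 17, 18, 19, 20)
z <- c(21, 22, 23, 24, 25)

# tryCatch() runs the call and, if it fails, hands the error to our handler,
# which returns the error message as a character string
message_text <- tryCatch(
    max_pair(x, y, z),
    error = function(err) conditionMessage(err)
)
message_text
```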

Step 3 – Functions Involving Loops and Control Statements

In R, functions such as sapply and apply are often preferred to explicit loops, because they usually need less typing. More information about this family of functions can be found in R’s built-in help (for example, ?sapply). Loops can also be slower to run, particularly when a vector is grown inside the loop, but they are perhaps more intuitive when we are starting off.
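As a rough illustration of the apply-style approach, the sketch below squares each value of a vector with sapply() and, for comparison, with R's vectorised ^ operator. The variable names are chosen for illustration only:

```r
a <- c(1, 2, 3, 4, 5)

# sapply() calls the anonymous function once per element and
# simplifies the results into a vector
squares_sapply <- sapply(a, function(x) x^2)

# Arithmetic operators in R already work element by element
squares_vectorised <- a^2

identical(squares_sapply, squares_vectorised)  # TRUE
```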

Let’s start with an example where we use a for loop to take the square of each value in a vector that we pass to the function:

squares_function_1 <- function(vector) {
    square <- numeric()
    for (i in 1:5) {
        square[i] <- (vector[i])^2
    }
    return(square)
}

squares_function_1(a)
## [1]  1  4  9 16 25

Note that a vector must exist before we assign to its elements inside a loop: square[i] <- ... fails if square has not been created first. It is better practice to create the vector at its final length if we know how many iterations we will be doing. In these examples this would not make much difference, but when we are working with large datasets it can be noticeably faster to specify square <- numeric(100000) rather than square <- numeric(), because R does not have to grow the vector on every iteration. Also, if we run the loop from 1 to the length of the input vector, rather than from 1 to 5, the function works for vectors of any length.
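Putting these two suggestions together, a minimal sketch of the loop might look as follows. The name squares_function_3 is hypothetical, and seq_along() is used here as one common way of running from 1 to the length of the input:

```r
# Hypothetical variant of squares_function_1
squares_function_3 <- function(vector) {
    # Create the result at its final length before the loop starts
    square <- numeric(length(vector))
    # seq_along() gives 1, 2, ..., length(vector), and nothing for an empty vector
    for (i in seq_along(vector)) {
        square[i] <- vector[i]^2
    }
    square
}

squares_function_3(c(2, 4, 6))   # works for a vector of length 3
squares_function_3(numeric(0))   # an empty input gives an empty result
```

Unlike 1:length(vector), seq_along() does not count backwards from 1 to 0 when the input is empty.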

Now let’s say we want to return these values only if they are larger than 15. We can do this using the ifelse function.
In this function call:

  • The first argument is the condition we are testing
  • The second argument is the return value if the condition is true
  • The third is the return value if the condition is false.

squares_function_2 <- function(vector) {
    square <- numeric()
    results <- numeric()
    for (i in 1:length(vector)) {
        square[i] <- (vector[i])^2
        results[i] <- ifelse(square[i] > 15, square[i], "-")
    }
    return(results)
}

squares_function_2(a)
## [1] "-"  "-"  "-"  "16" "25"
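Because "-" is a character string, the numbers in the result above are converted to character as well. If we would rather keep the result numeric, one possible sketch, assuming NA is an acceptable marker for the values we drop, applies ifelse to the whole vector at once. The name squares_function_na is hypothetical:

```r
# Hypothetical variant of squares_function_2 that keeps the result numeric
squares_function_na <- function(vector) {
    square <- vector^2
    # ifelse() is vectorised, so no loop is needed;
    # NA marks the squares that are not larger than 15
    ifelse(square > 15, square, NA)
}

result <- squares_function_na(c(1, 2, 3, 4, 5))
result
is.numeric(result)           # TRUE
sum(result, na.rm = TRUE)    # arithmetic still works: 16 + 25 = 41
```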

Step 4 – Functions That Call Other Functions

We can run user-defined functions from other functions in R, just like we can with the predefined R functions. For example, let’s write a function that takes the sum of the squared values in a vector. Because we already have a function that squares the values in a vector, we can embed this inside our new function:

sum_squares <- function(vector) {
    squares <- squares_function_1(vector)
    SS <- sum(squares)
    statement <- paste("The sum of the squared values in the vector is ", SS, 
        sep = "")
    return(statement)
}

sum_squares(vector = c)
## [1] "The sum of the squared values in the vector is 855"

If we are running this in isolation we must remember to run the code for the function squares_function_1 beforehand in the R session, otherwise it will not be recognised when we run the second function.
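One way to make a missing helper easier to spot is to check for it with exists() before using it. The sketch below is only an illustration: sum_squares_checked and square_values are hypothetical names, chosen so that they do not clash with the functions defined above:

```r
# Hypothetical function that depends on a helper called square_values()
sum_squares_checked <- function(vector) {
    # Stop with a clear message if the helper has not been defined yet
    if (!exists("square_values", mode = "function")) {
        stop("square_values() has not been defined in this session")
    }
    sum(square_values(vector))
}

# Calling before the helper exists gives our own error message
first_try <- tryCatch(
    sum_squares_checked(c(1, 2, 3)),
    error = function(err) conditionMessage(err)
)

# Once the helper is defined, the same call works
square_values <- function(vector) vector^2
second_try <- sum_squares_checked(c(1, 2, 3))   # 1 + 4 + 9 = 14
```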

What next?

Now that we know how to write functions, we can look at applications of some more complicated functions in our other blog posts:

Using run charts to visualise variant CPs due to changing targets of A & E waiting time data

Further Reading

http://nicercode.github.io/guides/functions/

http://pj.freefaculty.org/guides/Rcourse/functions-1/functions-1.pdf

http://faculty.nps.edu/sebuttre/home/R/functions.html

http://paleocave.sciencesortof.com/2013/03/writing-a-for-loop-in-r/


 
