# How to Write a Function

### February 26, 2014 | Alan

One of the most powerful features of R is the ability to write your own functions. Writing functions in R lets us produce code that is easy to read, avoids repetition, and is easy to reuse.

### Learning Outcomes

After reading this blog post you will know how to turn code into reusable functions.

## Workflow

### Step 1 – Functions without Arguments

A function in R is specified using the following format:

```r
function_name <- function(argument1, argument2, ...) {
  # Some R code
  return(variable_to_return)
}
```

To run the function:

```r
function_name(argument1, argument2, ...)
```
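As a concrete instance of this template, here is a small function with two arguments. The name `add_numbers` is purely illustrative and not part of the original post:

```r
# Hypothetical example: a function with two arguments that returns their sum
add_numbers <- function(x, y) {
  total <- x + y   # some R code
  return(total)    # the value handed back to the caller
}

add_numbers(2, 3)  # returns 5
```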

We start off with a simple function that calculates the maximum value of each of two vectors:

```r
max_function <- function() {
  a <- c(1, 2, 3, 4, 5)
  b <- c(6, 7, 8, 9, 10)
  max1 <- max(a)
  max1
  max2 <- max(b)
  max2
}

max_function()
```
```
## [1] 10
```

This returns only `max2`: a function returns the value of the last expression it evaluates, so the earlier line `max1` has no effect. To control what is returned from a function we can use the `return()` function. Let's say we want to return the value of `max1` instead. We could rewrite the function as follows:

```r
max_function <- function() {
  a <- c(1, 2, 3, 4, 5)
  b <- c(6, 7, 8, 9, 10)
  max1 <- max(a)
  max2 <- max(b)
  return(max1)
}

max_function()
```
```
## [1] 5
```
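`return()` also stops the function as soon as it is reached, which is handy inside loops or `if` statements. Here is a minimal sketch using a hypothetical helper, `first_positive()`, that is not part of the original post:

```r
# Hypothetical helper: returns the first positive value in x, or NA if there is none
first_positive <- function(x) {
  for (value in x) {
    if (value > 0) {
      return(value)  # leave the function immediately
    }
  }
  NA  # last expression, only reached if no positive value was found
}

first_positive(c(-2, -1, 3, 5))  # returns 3
first_positive(c(-2, -1))        # returns NA
```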

A function returns only one object. If we want to return more than one value, we have to combine them into a single object such as a list, data frame, matrix or vector. So in this example, if we wanted to return both values we could create a data frame for them to sit in:

```r
max_function <- function() {
  a <- c(1, 2, 3, 4, 5)
  b <- c(6, 7, 8, 9, 10)
  max1 <- max(a)
  max2 <- max(b)
  bothMaximums <- data.frame(maximum1 = max1, maximum2 = max2)
  return(bothMaximums)
}

max_function()
```
```
##   maximum1 maximum2
## 1        5       10
```
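A list is another common container for several return values. Unlike the columns of a data frame, its elements can have different lengths. Here is a sketch of the same function returning a list. The name `max_function_list` and the extra `all_values` element are hypothetical additions for illustration:

```r
max_function_list <- function() {
  a <- c(1, 2, 3, 4, 5)
  b <- c(6, 7, 8, 9, 10)
  # Named list elements can be extracted afterwards with $
  return(list(maximum1 = max(a), maximum2 = max(b), all_values = c(a, b)))
}

result <- max_function_list()
result$maximum1  # 5
result$maximum2  # 10
```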

### Step 2 – Functions with Arguments

Now suppose we don't know in advance which vectors we want to run the function against. This is where arguments come in: they let us pass information to a function. Arguments can be, for example, vectors, lists, data frames or matrices. So if we wanted the option to pass any two vectors to the function, we would rewrite it as follows:

```r
max_function <- function(vector1, vector2) {
  max1 <- max(vector1)
  max2 <- max(vector2)
  bothMaximums <- data.frame(maximum1 = max1, maximum2 = max2)
  return(bothMaximums)
}
```

Let’s create two vectors to pass to our function:

```r
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)

max_function(vector1 = a, vector2 = b)
```
```
##   maximum1 maximum2
## 1        5       10
```

This gives us the same result as the function above. However, it is now much easier to change which vectors we run the function against.

To check this, create two more vectors and run the function on them:

```r
c <- c(11, 12, 13, 14, 15)
d <- c(16, 17, 18, 19, 20)

max_function(vector1 = c, vector2 = d)
```
```
##   maximum1 maximum2
## 1       15       20
```
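`max_function()` only uses its arguments inside `max()`, so it accepts anything `max()` does, not just vectors. For example, a matrix works as well. The matrix `m` here is made up for illustration:

```r
max_function <- function(vector1, vector2) {
  max1 <- max(vector1)
  max2 <- max(vector2)
  bothMaximums <- data.frame(maximum1 = max1, maximum2 = max2)
  return(bothMaximums)
}

# A 2 x 3 matrix holding the values 1 to 6
m <- matrix(1:6, nrow = 2)
max_function(vector1 = m, vector2 = c(-1, 0.5))  # maximum1 is 6, maximum2 is 0.5
```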

We can also set default values for arguments. This means we can run the function without passing any arguments for those parameters. Here the defaults are the vectors `a` and `b` we created above, so they must exist when the function is called:

```r
max_function <- function(vector1 = a, vector2 = b) {
  max1 <- max(vector1)
  max2 <- max(vector2)
  bothMaximums <- data.frame(maximum1 = max1, maximum2 = max2)
  return(bothMaximums)
}

max_function()
```
```
##   maximum1 maximum2
## 1        5       10
```
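Defaults that refer to global variables such as `a` and `b` depend on whatever those variables happen to hold when the function is called. A more self-contained sketch puts the default values in the function definition itself. The name `max_function_defaults` is hypothetical. The sketch also shows that we can override just one of the defaults:

```r
max_function_defaults <- function(vector1 = c(1, 2, 3, 4, 5),
                                  vector2 = c(6, 7, 8, 9, 10)) {
  max1 <- max(vector1)
  max2 <- max(vector2)
  return(data.frame(maximum1 = max1, maximum2 = max2))
}

max_function_defaults()                       # maximum1 = 5, maximum2 = 10
max_function_defaults(vector2 = c(100, 200))  # maximum1 = 5, maximum2 = 200
```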

However, if we want to run the function on different vectors, we can pass them just as we did above:

```r
max_function(vector1 = c, vector2 = d)
```
```
##   maximum1 maximum2
## 1       15       20
```

Values are matched to a function's arguments in either of two ways:

• By name
• By position

The following function calls all return the same result:

```r
max_function(vector1 = c, vector2 = d)
max_function(c, d)
max_function(vector2 = d, vector1 = c)
max_function(vector2 = d, c)
```
```
##   maximum1 maximum2
## 1       15       20
```
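When named and positional arguments are mixed, R first matches the named ones and then fills the remaining parameters, in order, with the unnamed values. The hypothetical function `show_args()` below makes this visible:

```r
# Hypothetical helper that reports which value landed in which parameter
show_args <- function(first, second, third) {
  paste("first =", first, "| second =", second, "| third =", third)
}

# first is matched by name; 3 and 2 then fill second and third in order
show_args(3, first = 1, 2)
# returns "first = 1 | second = 3 | third = 2"
```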

If we pass more arguments than the function has parameters, even if the call includes all the necessary ones, R raises the following error:

```r
max_function(c, d, e)
```
```
## Error: unused argument (e)
```
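To inspect this error without stopping the script, the call can be wrapped in `tryCatch()`. R checks the arguments against the function's parameters before evaluating them, so the error appears even though `e` is never defined in this post:

```r
max_function <- function(vector1, vector2) {
  data.frame(maximum1 = max(vector1), maximum2 = max(vector2))
}
c <- c(11, 12, 13, 14, 15)
d <- c(16, 17, 18, 19, 20)

# Catch the error and keep only its message
message_text <- tryCatch(
  max_function(c, d, e),
  error = function(err) conditionMessage(err)
)
message_text  # contains "unused argument (e)"
```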

### Step 3 – Functions Involving Loops and Control Statements

In R, the `apply` family of functions, such as `sapply` and `apply`, is often preferred to explicit loops. More information about this family of functions can be found here. Loops often require more typing than `apply` and can be slower, but they are perhaps more intuitive when we are starting off.
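For comparison, here is a sketch of the squaring task from the next example written with `sapply()`, which applies a function to each element and collects the results. The name `squares_sapply` is hypothetical. For simple arithmetic, R's vectorised operators go one step further, since `vector^2` squares every element at once:

```r
squares_sapply <- function(vector) {
  # Apply an anonymous squaring function to each element
  sapply(vector, function(x) x^2)
}

squares_sapply(c(1, 2, 3, 4, 5))  # 1 4 9 16 25
c(1, 2, 3, 4, 5)^2                # same result, fully vectorised
```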

Let’s start with an example where we use a for loop to take the square of each value in a vector that we pass to the function:

```r
squares_function_1 <- function(vector) {
  square <- numeric()
  for (i in 1:5) {
    square[i] <- (vector[i])^2
  }
  return(square)
}

squares_function_1(a)
```
```
## [1]  1  4  9 16 25
```

Note that a vector we fill element by element inside a loop must be defined before the loop starts: assigning to `square[i]` when `square` does not yet exist gives an "object not found" error. It is better practice to create the vector with its final length if we know how many iterations we will be doing. In these examples this would not make much difference, but with large datasets it will be noticeably faster to specify `square <- numeric(100000)` rather than `square <- numeric()`, because R then does not have to grow the vector on every iteration. Also, running the loop from 1 to the length of the input vector, rather than from 1 to 5, lets us pass vectors of any length to the function.
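Putting both suggestions together gives a sketch of a more flexible version. The name `squares_function_3` is hypothetical. It preallocates the result and loops over `seq_along(vector)`. We use `seq_along()` rather than `1:length(vector)` because, for an empty vector, `1:length(vector)` is `1:0`, which counts down and gives `c(1, 0)`:

```r
squares_function_3 <- function(vector) {
  # Preallocate a numeric vector of the right length
  square <- numeric(length(vector))
  # seq_along() gives 1, 2, ..., length(vector), and nothing for an empty input
  for (i in seq_along(vector)) {
    square[i] <- vector[i]^2
  }
  return(square)
}

squares_function_3(c(1, 2, 3, 4, 5, 6, 7))  # 1 4 9 16 25 36 49
squares_function_3(numeric(0))              # numeric(0)
```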

Now let's say we want to return these values only if they are larger than 15. We can do this with the `ifelse()` function, where:

• The first argument is the condition we are testing
• The second argument is the return value if the condition is true
• The third argument is the return value if the condition is false

```r
squares_function_2 <- function(vector) {
  square <- numeric()
  results <- numeric()
  for (i in 1:length(vector)) {
    square[i] <- (vector[i])^2
    results[i] <- ifelse(square[i] > 15, square[i], "-")
  }
  return(results)
}

squares_function_2(a)
```
```
## [1] "-"  "-"  "-"  "16" "25"
```
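Two things are worth noting about this output. Because `"-"` is text, R converts the whole `results` vector to character, so the squares come back as strings such as `"16"`. Also, `ifelse()` is itself vectorised, so the loop is not needed. Below is a sketch of an alternative that uses `NA` instead of `"-"` to keep the result numeric. The name `squares_function_vectorised` is hypothetical:

```r
squares_function_vectorised <- function(vector) {
  square <- vector^2
  # Keep squares above 15, replace the rest with NA (result stays numeric)
  ifelse(square > 15, square, NA)
}

squares_function_vectorised(c(1, 2, 3, 4, 5))  # NA NA NA 16 25
```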

### Step 4 – Functions That Call Other Functions

We can call user-defined functions from other functions in R, just as we can with R's built-in functions. For example, let's write a function that takes the sum of the squared values in a vector. Because we already have a function that squares the values in a vector, we can call it inside our new function:

```r
sum_squares <- function(vector) {
  squares <- squares_function_1(vector)
  SS <- sum(squares)
  statement <- paste("The sum of the squared values in the vector is ", SS,
                     sep = "")
  return(statement)
}

sum_squares(vector = c)
```
```
## [1] "The sum of the squared values in the vector is 855"
```
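Because `sum_squares()` returns a sentence, its result cannot easily be used in further calculations. One possible design, sketched below with hypothetical names, keeps the numeric work in one function and the formatting in another that calls it. Here the squaring step is vectorised, so input of any length works:

```r
# Hypothetical helper: returns the number itself
sum_squares_value <- function(vector) {
  sum(vector^2)
}

# Hypothetical wrapper: calls the helper and formats the result as text
sum_squares_message <- function(vector) {
  paste0("The sum of the squared values in the vector is ",
         sum_squares_value(vector))
}

sum_squares_value(c(11, 12, 13, 14, 15))  # 855
sum_squares_message(c(11, 12, 13, 14, 15))
```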

If we are running this in isolation, we must remember to define `squares_function_1` in the R session before calling `sum_squares`; otherwise R will report that it could not find the function.
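R only looks up a helper function when the outer function is called, not when it is defined. The sketch below uses the hypothetical names `outer_sum` and `square_helper`. It shows the error raised before the helper exists and the working call once it has been defined:

```r
# square_helper() does not exist yet, but defining outer_sum() still works
outer_sum <- function(vector) {
  sum(square_helper(vector))
}

# Calling it now fails, so capture the error message instead of stopping
before <- tryCatch(outer_sum(c(1, 2, 3)),
                   error = function(err) conditionMessage(err))
before  # mentions: could not find function "square_helper"

# Once the helper is defined, the same call succeeds
square_helper <- function(vector) vector^2
outer_sum(c(1, 2, 3))  # 14
```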

### What next?

Now that we can write our own functions, we can look at some applications of more complicated functions in our other blog posts:

Using run charts to visualise variant CPs due to changing targets of A & E waiting time data

http://nicercode.github.io/guides/functions/

http://pj.freefaculty.org/guides/Rcourse/functions-1/functions-1.pdf

http://faculty.nps.edu/sebuttre/home/R/functions.html

http://paleocave.sciencesortof.com/2013/03/writing-a-for-loop-in-r/