While everything in R is an object, the vast majority of objects are vectors. Vectors are often called the building blocks of R objects. This chapter focuses on the homogeneous kind of vector, the atomic vector. In the next chapter, we will introduce heterogeneous vectors, that is, generic vectors, often called lists.1 Because atomic vectors are homogeneous, we first need to introduce and discuss object types.
After reading these notes you should be able to:
Every object in R has a type. To determine an object’s type, you can use the typeof() function.
typeof("Hello, World!")
[1] "character"
typeof(4.2)
[1] "double"
typeof(42L)
[1] "integer"
typeof(TRUE)
[1] "logical"
typeof(2 + 3i)
[1] "complex"
typeof(raw(1))
[1] "raw"
The above demonstrates the six types of atomic vectors.
We will largely ignore complex and raw, as you will rarely encounter them.2 We will focus on character, double, integer, and logical vectors.
We’ve already seen how to create larger (longer) vectors through the use of the c() function. For example:
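A minimal sketch of such a call, assuming the three values discussed in the next paragraph (the name x is chosen here purely for illustration):

```r
# Combine three numbers into one longer vector with c()
x <- c(4.2, 6.1, 1.3)
x

typeof(x)      # the vector as a whole is a double vector
typeof(4.2)    # a single number is a double vector too
length(4.2)    # there are no scalars: a lone number has length 1
```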
This is a double vector. Additionally, each element of this (atomic)3 vector, 4.2, 6.1, and 1.3, is also a double vector. This is because R does not have a notion of a scalar. Instead, they are length-one vectors.
Character vectors are used to store text strings.4
To create character vectors, you can use either single (') or double (") quotation marks, so long as the opening and closing quotation marks match.
The quotation marks are needed, because without them, R will assume you are trying to reference an object by name.
Error: object 'foo' not found
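As a small illustration of the points above, the following sketch defines the same string with both kinds of quotation marks and then shows what happens when the quotation marks are left out. It assumes no object named foo exists in the session; the names single, double_q, and msg are illustrative only.

```r
single   <- 'foo'
double_q <- "foo"
identical(single, double_q)   # both spellings create the same character vector

# Without quotation marks, R looks for an object named foo.
# tryCatch() is used only so the error message can be inspected
# without stopping the script.
msg <- tryCatch(foo, error = function(e) conditionMessage(e))
msg
```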
As expected, we can use the c() function to store larger (longer) character vectors. Note that these notes, and RStudio, will often provide some syntax highlighting that helps in understanding the following code. In this case, the strings, including their quotation marks, are a different color than the rest of the code used to create the object.
[1] "This" "is" "a" "long" "character" "vector."
Like any object, we can assign them names.
We can then use those names (or the code to create the objects) to verify the types of the objects that the names are assigned to. Remember, we are not technically checking the type of foo,5 we are checking the type of the object that currently has the name foo assigned to it. We’ll eventually relax and not be so pedantic, but it’s an important distinction.
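A sketch of what such a check might look like. The names foo, bar, and types, and the values assigned to them, are placeholders chosen for illustration.

```r
# Assign names to two character vectors
foo <- c("This", "is", "a", "long", "character", "vector.")
bar <- "Hello, World!"

# Check the type of the objects these names currently refer to,
# grouping the two results into a single vector with c()
types <- c(typeof(foo), typeof(bar))
types

# The result of that check is itself a character vector
typeof(types)
```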
Here we’re using the c() function to group the output of checking the type of each of the vectors, which itself returns a character vector! Very meta! How do we know this is a character vector? Well, we can of course simply check.
However, eventually you’ll become familiar with some of the context clues that R leaves behind when it prints objects. In this case, the quotation marks, ", are the clue.6
What if you need to include a quotation mark in a string? You have two options:
Use the other kind of quotation mark to define the string.
Escape the quotation mark with a backslash, \.
Because R uses the " symbol to print strings, if you insert a " into the string, R will need to display the escape character when printing. To see the string rendered without the escape character (and without the quotation marks used as syntax to define the character vector), use cat().
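One way to see both approaches in action, assuming the two options are switching to the other kind of quotation mark and escaping with a backslash. The strings and the names a and b are made up for illustration.

```r
# Option 1: define the string with the other kind of quotation mark
a <- 'She said "hello" to me.'
# Option 2: escape the quotation mark with a backslash
b <- "She said \"hello\" to me."
identical(a, b)   # both produce the same string

print(a)          # printed with escapes, since R delimits printed strings with "
cat(a, "\n")      # rendered text, without escapes or surrounding quotation marks
nchar("\"")       # the backslash is not stored: this string is one character long
```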
For additional details and documentation on character vectors, use:
The is.character() function checks if a vector is character typed and returns the logical value TRUE or FALSE accordingly.
Most numbers you encounter in R will be in the form of double vectors. A double vector stores floating point values.7 If you’re interested in some of the details of how R performs floating point arithmetic on your machine, use:
For our purposes, you can mostly just think of double vectors as numbers and ignore these details.
[1] "double"
[1] "double"
[1] "double"
[1] "double"
Notice that for each of the above, the type is double. You might think that 6 is integer typed, but again, when you type a number, it is almost always a double. More on integers shortly.
Numbers are doubles.8 Doubles are numbers.
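A few calls of this kind, with values chosen for illustration (only 6 is taken from the text above):

```r
typeof(6)       # no decimal point, but still a double
typeof(6.0)
typeof(1e3)     # scientific notation also creates a double
typeof(-0.5)
```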
Three special double values that you may encounter are Inf, -Inf, and NaN for infinity, negative infinity, and not-a-number respectively.
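These values typically arise from arithmetic rather than being typed directly. A short sketch of how each can appear:

```r
1 / 0     # Inf
-1 / 0    # -Inf
0 / 0     # NaN

typeof(c(Inf, -Inf, NaN))   # all three are double values
is.finite(Inf)              # FALSE
is.nan(0 / 0)               # TRUE
```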
For details on each, use:
It is often useful to create sequences of numbers, and the seq() function is the usual tool for doing so. The seq() function generally uses three arguments:
from, the starting value of the sequence.
to, the upper limit of the sequence. Often this is the last value of the sequence.
by, how to increment between values of the sequence.
The function returns a (usually double) vector containing the elements of the sequence defined.
Alternatively, you can use the length.out argument instead of by to specify the length of the output, and the increment will be calculated automatically.
[1] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
[16] 0.75 0.80 0.85 0.90 0.95 1.00
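The output above is consistent with either of the following calls. This is a sketch; the variable names by_seq and len_seq are illustrative.

```r
# Same sequence, specified by increment or by desired length
by_seq  <- seq(from = 0, to = 1, by = 0.05)
len_seq <- seq(from = 0, to = 1, length.out = 21)
length(by_seq)
all.equal(by_seq, len_seq)   # equal up to floating point tolerance

# to is an upper limit, not necessarily the last value
seq(from = 1, to = 10, by = 4)   # 1 5 9
```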
For additional details of the seq() function, use:
For additional details and documentation on double vectors, use:
The is.double() function checks if a vector is double typed and returns the logical value TRUE or FALSE accordingly.
Sometimes, a number in R is an integer vector.9 There are two ways you are likely to encounter them:
Adding L to the end of a number.10 For example: 42L.
Using the : operator to create integer sequences.
While humans would recognize 42 as an integer, in R, simply typing 42 will produce a double.
To indicate that you would like 42 stored as an integer, use 42L.
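A quick check of the difference the L suffix makes:

```r
typeof(42)           # "double"
typeof(42L)          # "integer"
42 == 42L            # TRUE: the values compare equal
identical(42, 42L)   # FALSE: identical() also compares the type
```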
It’s rare that you truly need to do this. However, by chance, you will often create integer vectors when using the : operator. The : operator can be used to quickly create sequences. While it does not necessarily create integer sequences, if the resulting vector can be properly represented by integers, it will return integers.
However, it can also be used to create obviously non-integer sequences.
[1] 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1 11.1 12.1 13.1 14.1 15.1
[16] 16.1 17.1 18.1 19.1
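A sketch of both behaviors. The call 1.1:20 is one call that yields the non-integer sequence printed above; the names int_seq and dbl_seq are illustrative.

```r
int_seq <- 1:10
typeof(int_seq)     # "integer"
typeof(1.0:10.0)    # also "integer": the values are representable as integers

dbl_seq <- 1.1:20   # starts at 1.1, steps by 1, stops before exceeding 20
typeof(dbl_seq)     # "double"
length(dbl_seq)
```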
For additional details, use
While R does make the distinction between integer and double, often, R users who care about higher level abstractions involved in data analysis do not. As such, in addition to a type, all objects have a mode. We won’t dig into the details of mode, but you should be aware that integer and double vectors share the same mode: numeric.
Just as the typeof() function determines the type of an object, the mode() function determines its mode.
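Comparing the two functions side by side shows where type and mode differ. The values used are arbitrary.

```r
c(type = typeof(42L), mode = mode(42L))   # integer, numeric
c(type = typeof(4.2), mode = mode(4.2))   # double, numeric

# For character and logical vectors, type and mode agree
mode("text")
mode(TRUE)
```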
For additional details and documentation on integer vectors, use:
For additional details and documentation on numeric vectors, use:
For additional details and documentation about mode, use:
The is.integer() function checks if a vector is integer typed and returns the logical value TRUE or FALSE accordingly.
The is.numeric() function checks if a vector has mode numeric and returns the logical value TRUE or FALSE accordingly. It will return TRUE for both integer and double typed vectors.
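The checking functions can be compared directly on a few illustrative values:

```r
is.integer(42L)    # TRUE
is.integer(42)     # FALSE: 42 is a double
is.double(42)      # TRUE
is.numeric(42L)    # TRUE
is.numeric(42)     # TRUE
is.numeric("42")   # FALSE: this is a character vector
```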
As the name suggests, logical vectors store logical values. There are two logical values, TRUE and FALSE.
Logical vectors will be important later for subsetting and other programming tasks.
Technically, the NA value is also a logical vector.
However, this requires additional explanation that we will defer until after we have introduced type coercion.
Note that T and F can be used as shortcuts to TRUE and FALSE. That is, they are names that are by default assigned to TRUE and FALSE.
However, we recommend you not use them. While TRUE and FALSE are reserved words, T and F are not. Reserved words are words (names) that reference a particular object, and cannot be used to refer to other objects. That is, you cannot reassign these names. To see a list of reserved words, use:
To demonstrate, attempting to assign a value to TRUE results in an error.
Error in TRUE = 42 : invalid (do_set) left-hand side to assignment
You’re free to do evil things with T and F. For example:
You’ve been warned.
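A sketch of the kind of mischief this allows, with cleanup so the session is left unchanged. The names before, during, after, and err are illustrative.

```r
before <- T     # T is an ordinary name bound to TRUE by default
T <- FALSE      # legal, because T is not a reserved word
during <- T
rm(T)           # remove our definition so T refers to TRUE again
after <- T
c(before, during, after)

# TRUE is reserved, so the same trick fails with an error.
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(TRUE <- 42, error = function(e) conditionMessage(e))
err
```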
For additional details and documentation on logical vectors, use:
The is.logical() function checks if a vector is logical typed and returns the logical value TRUE or FALSE accordingly.
Consider the following vectors:
The length of a vector is the number of elements of the vector. You can determine the length of a vector with the length() function.
The vectors assigned names evens and primes have lengths 50 and 10 respectively.
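A sketch consistent with the lengths stated above and with the values printed later in this section, assuming evens holds the even numbers from 2 to 100 and primes holds the first ten prime numbers:

```r
evens  <- seq(from = 2, to = 100, by = 2)
primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

length(evens)    # 50
length(primes)   # 10
```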
Let’s print the evens vector.
[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
[20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
[39] 78 80 82 84 86 88 90 92 94 96 98 100
You should notice two things. First, it displays each element of the vector, as you would expect. However, there is also some additional information printed along the left-hand side. In the case of these notes, we see [1], [20], and [39].11 What is this information?
These numbers, [1], [20], and [39], are specific indexes of the vector. That is, each element of a vector has an index which is its position in the vector. R is a 1-indexed language. That is, the first element of the vector has index 1.12
This is why you see [1] in the console so often. Since the vast majority of vectors have at least one element, and R prints the index of the first element displayed on each line when printing a vector, you almost always see [1]. Note that, while it appears that a vector is being printed in rows (lines), a vector is a 1-dimensional, flat object. This is simply a printing side-effect to make the results more human readable.
While we will discuss subsetting vectors and other objects in great detail later, for now we’ll note that the [ operator can be used to access specific elements of a vector by supplying its index.
[1] 40
[1] 78
[1] 7
[1] 29
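The four results above match the following calls, under the assumption that evens holds the even numbers from 2 to 100 and primes the first ten primes. The definitions are repeated so the block runs on its own.

```r
evens  <- seq(from = 2, to = 100, by = 2)
primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

evens[20]    # 40, the first element on the second printed line
evens[39]    # 78, the first element on the third printed line
primes[4]    # 7
primes[10]   # 29
```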
Vectors of a specific type and arbitrary length can be created with the vector() function.
[1] FALSE FALSE FALSE
[1] 0 0 0 0
[1] 0 0 0 0 0
[1] "" "" "" "" "" ""
Notice that each of the above contains elements that are similar to 0 for its specific type, so FALSE for logical and an empty string for character.
All vector types can have length zero. When using the vector() function, if you do not specify a length, or directly specify 0, it creates a length zero vector of that type.
logical(0)
integer(0)
numeric(0)
character(0)
You will eventually encounter vectors that look like this13 when something goes wrong with your code. As such, it is helpful to understand what they represent, and especially what their type is.
There are also four shortcut functions related to these with more specific functionality: logical(), integer(), double(), and character().
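A brief sketch relating vector() to its shortcut functions, followed by one common way a zero-length vector appears by accident. The vector x and the value "z" are made up for illustration.

```r
vector("character")                         # length defaults to 0
identical(vector("character"), character(0))
identical(vector("double", 3), double(3))   # shortcuts take the length directly
length(logical(0))                          # 0
typeof(integer(0))                          # a zero-length vector still has a type

# A subset that matches nothing returns a zero-length vector
x <- c("a", "b", "c")
x[x == "z"]                                 # character(0)
```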
Somewhat related to the concept of a zero length vector is the NULL value. The NULL value represents nothing, like the empty set in mathematics. It has type NULL and length zero.
When used together with the c() function to combine vectors, it is more or less ignored.
When c() is used with no arguments, it produces the NULL value.
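These properties of NULL can be checked directly:

```r
typeof(NULL)     # "NULL"
length(NULL)     # 0
c(1, NULL, 2)    # NULL contributes no elements to the result
c()              # NULL
is.null(c())     # TRUE
```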
For additional details, use:
Objects in R can have attributes. They are not part of the object, and often won’t be displayed when you print the object. You can generally think of attributes as metadata. The attributes() function can be used to both modify and access the attributes of an object. So far, none of the objects that we’ve created have had attributes.
Calling attributes() on an object without attributes returns NULL.
The above shows how to set an arbitrary attribute. Doing so requires using a list, which we haven’t discussed yet.
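A minimal sketch of reading and setting attributes in this way. The vector x, the attribute name note, and its value are arbitrary choices for illustration.

```r
x <- c(1.5, 2.5, 3.5)
attributes(x)    # NULL: no attributes yet

# Setting attributes requires a (named) list
attributes(x) <- list(note = "an arbitrary attribute")
attributes(x)

# The data values themselves are unchanged
as.vector(x)
```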
Some attributes are special and have reserved meaning. One of these attributes, class, we will return to in great detail later when we discuss the S3 class system for object oriented programming.
For now, we’ll discuss the names attribute.
In the case of atomic vectors, the names attribute can be used to assign a name to each of the elements of the vector. This can be done with the syntax above, but because it is such a common operation, shortcuts are available and we suggest you use them.
Names can be quickly added to an atomic vector in two ways:
Using the names() function.
Using the c() function when creating a vector.
Here we’ve created a vector, then added a name to each element. Notice that we specified the names using a character vector of the same length as the vector they are being added to.
When we print this vector now, the names will display above each element of the vector. We also no longer see the [1] that we have become accustomed to seeing.
Remember though, these names are not part of the vector, they are simply metadata. If we perform an operation with this vector, the values will change but the names will remain.
If we want to check the names of a vector, use the names() function.
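A sketch of the workflow just described, using made-up data. The names heights and in_cm and the element names are hypothetical.

```r
heights <- c(1.80, 1.65, 1.72)
names(heights) <- c("alice", "bob", "carol")   # one name per element
heights                                        # names print above the values

in_cm <- heights * 100   # the values change, the names remain
names(in_cm)

attributes(heights)      # the names are stored as an attribute
```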
Now let’s see an example of naming the elements of a vector as it is created.
Again, by doing so, when we print this vector, it will display the names above each element.
When an atomic vector has names, in addition to accessing individual elements via their index, you can also access them by name.
For reasons that will become clear after we discuss lists and subsetting, you can also do so with a double bracket.
Notice a minor difference in output. With a single bracket the name is retained, while with the double bracket it is not.14
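A sketch that names the elements at creation time and then accesses one of them both ways. The vector scores and its element names are illustrative.

```r
# Name each element as the vector is created
scores <- c(math = 90, art = 85, music = 78)

scores["art"]     # single bracket: the name is retained
scores[["art"]]   # double bracket: only the value is returned
scores[2]         # access by index still works
```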
Be careful when using this feature, as names are not required to be unique!
You’ve been warned!
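To illustrate the risk, here is a hypothetical vector with a repeated name. Subsetting by name silently returns only the first match.

```r
dup <- c(a = 1, b = 2, a = 3)

dup["a"]                  # only the first element named "a"
dup[["a"]]                # likewise, the first match only
dup[names(dup) == "a"]    # every element named "a"

# One way to detect repeated names before relying on them
anyDuplicated(names(dup)) > 0
```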
1. Often, when R users refer to vectors, they implicitly mean atomic vectors, but as beginners, you should be very aware of this distinction and not make too many assumptions.
2. If you need to use complex vectors, after understanding doubles, they will be mostly self-explanatory.
3. At some point, we’ll stop stating this and you’ll need to understand it from context.
4. This is unfortunately confusing because in everyday language we use character to refer to a single character of text.
5. What kind of name is foo? Well, honestly just a placeholder because coming up with quick names is hard.
6. The output here is of course also a character vector, but we’ll stop before we get trapped in an infinite loop.
7. Remember, we need floating point arithmetic to deal with real numbers, which might have infinite precision, while using a computer with finite memory. For our purposes, we’ll largely ignore the details of how this works, and R will just deal with it for us. If you’re interested in understanding more about floating point arithmetic, consider reading “What Every Programmer Should Know About Floating-Point Arithmetic”.
8. Why is it called double? Because it uses double precision.
9. Why bother to create this distinction at all? Why not simply use only double vectors? Long story short: floating-point arithmetic.
10. Why L? Because R uses long integers. The R language definition is actually silent on this etymology.
11. These results depend on the size of the window the printing takes place in, so your results may vary.
12. Many other languages are 0-indexed. While computer scientists have good reason to believe 0-indexing is superior, when designing a language for statistical computing, which relies heavily on linear algebra, 1-indexing is a natural choice. Have you ever heard of the 0th row of a matrix?
13. Perhaps most often character(0).
14. Later, we’ll note this is the difference between a preserving and a simplifying subset.