Data Types in R – A Quick Tutorial

R has a wide variety of data types. We will first look at what the different data types are and then move on to how they are used.

So, let us get into the types of data.

If the data consists only of numbers, whether decimals or whole numbers, we call it NUMERIC data. The numbers can be positive or negative.

If the data consists only of whole numbers, it is called INTEGER. Integers too may take negative or positive values.

If the data consists of strings, i.e., words or sentences, we call it CHARACTER.

A vector used to store categorical data, which can take only predefined values, is known as a FACTOR. A factor can be built from both strings and integers.

Data that can assume only two values, namely true and false, is called LOGICAL data.


A Vector is a one-dimensional sequence of elements of the same type, whereas a Matrix is two-dimensional. A Matrix is similar to a Vector but additionally carries a dimension attribute. An Array can have two or more dimensions and holds multidimensional data. Two-dimensional Arrays are called Matrices.

A data frame has two dimensions and is a table-like representation of the data objects. We can have different data types in different columns.

Unlike vectors, a list can contain elements of various data types and is often known as an ordered collection of data objects.


Numeric Data

This is an example of Numeric Data. We are simply creating two objects, here represented as “x” and “y”, and assigning them some values.

The function class() is used to check the data type of an object. x and y are of numeric data type.

 x<-4.5
 x 
 [1] 4.5 
 y<-3567
 y 
 [1] 3567 
 class(x) 
 [1] "numeric" 
 class(y) 
 [1] "numeric" 


In the previous example, we took the value of x to be 4.5, giving us Numeric Data. Here we use the function as.integer() to convert the Numeric Data to Integer. The fractional part is dropped, so x becomes 4, and the data type of x is now displayed as integer.

 x<-as.integer(x)
 x 
 [1] 4 
 class(x) 
 [1] "integer" 
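As a small supplementary sketch (the object names v, w and r are hypothetical and not part of the tutorial), the following shows that as.integer() drops the fractional part rather than rounding. The base R function round() is shown for contrast.

```r
# Hypothetical values chosen for illustration
v <- c(4.7, -4.7)

# as.integer() truncates toward zero; it does not round
w <- as.integer(v)
w            # 4 -4
class(w)     # "integer"

# round() goes to the nearest whole number but keeps the numeric class
r <- round(v)
r            # 5 -5
class(r)     # "numeric"
```

If rounding is what we want, we can apply round() first and as.integer() afterwards.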

Integer

The function as.integer() is used to create the integer data type in R because, by default, R stores a whole number such as 8 as Numeric.

#To create an integer variable in R use as.integer() function.

 f<-as.integer(22.5)
 f 
 [1] 22 
 class(f) 
 [1] "integer"
 x=8 
 class(x) 
 [1] "numeric" 

Note: By default, a whole number entered directly, such as 8, has class numeric.
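Besides as.integer(), base R also accepts an L suffix on a whole-number literal to create an integer directly. This is not covered in the tutorial itself, so treat the following as a brief aside. The object names p and q are hypothetical.

```r
p <- 8      # no suffix: stored as numeric
q <- 8L     # L suffix: stored as integer

class(p)          # "numeric"
class(q)          # "integer"
is.integer(q)     # TRUE

# The two hold the same value even though their classes differ
p == q            # TRUE
```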

Character

As said before, Character is used to represent string data, i.e., words and sentences. String data can also contain digits, since any value enclosed in quotes is stored as a Character object. We can also convert other types of data into Character by using the function as.character().

 z<-"Welcome to R  Ready Reckon-er"
 z 
 [1] "Welcome to R  Ready Reckon-er" 
 x<-"4.5"
 x 
 [1] "4.5" 
 class(z) 
 [1] "character" 
 class(x) 
 [1] "character"
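The conversion with as.character() mentioned above can be sketched as follows. The object names num, chr and back are hypothetical, and the trip back through as.numeric() is included only to show that the text still looks like a number.

```r
num <- 4.5                 # a numeric value
chr <- as.character(num)   # convert it to a character string
chr                        # "4.5"
class(chr)                 # "character"

# Converting back works because the text looks like a number
back <- as.numeric(chr)
class(back)                # "numeric"
```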

Factor

Factor objects can be created from both strings and integers and are used to categorize data. They are especially useful when the data has a limited number of unique values. Here let us discuss three commands.

c() is used to combine values into a vector.

is.factor() is a command used to check whether a particular object is a Factor or not. It returns either TRUE or FALSE.

is.character() is a command used to check whether a particular object is a Character or not. Just like is.factor(), it returns either TRUE or FALSE.

# Create an object x 

 x<-c("high", "medium", "low", "low", "medium", "high", "high", "high", "medium", "low","low") 

c() combines the values into a single vector

# Check whether object x is a factor or character

 is.factor(x) 
 [1] FALSE 

The is.factor() function returns TRUE or FALSE after checking whether the object is of type factor or not.

 is.character(x) 
 [1] TRUE 

The is.character() function returns TRUE or FALSE after checking whether the object is of type character or not.

To create a Factor Object, we use the command factor(). A Factor is a categorical variable and can only take one of a fixed finite set of possibilities. The possible categories are called Levels.

Levels are unique data values.

Using the command levels() we can check the levels of a Factor. In the output, by default, the Levels are arranged alphabetically.

#Create a factor object using factor() function

 x<-factor(x)
 x 
 [1] high   medium low    low    medium high   high   high  
 [9] medium low    low   
 Levels: high low medium 
 levels(x) 
 [1] "high" "low" "medium" 

Factor object x has 11 elements and 3 levels. By default the levels are sorted alphabetically.

The function ordered() is used to specify the order of the levels of a Factor.

Its levels argument takes the levels in the order we want.

 x_ordered<-ordered(x, levels=c("low", "medium","high"))
 x_ordered 
 [1] high   medium low    low    medium high   high   high   medium low    low   
 Levels: low < medium < high 
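One practical consequence of an ordered Factor is that its values can be compared with < and >. The sketch below uses a shorter, hypothetical vector called sizes to illustrate this. The names sizes_ordered, above_low and codes are likewise made up for the example.

```r
sizes <- c("high", "medium", "low", "low", "medium")   # hypothetical data
sizes_ordered <- ordered(sizes, levels = c("low", "medium", "high"))

is.ordered(sizes_ordered)             # TRUE

# Comparisons follow the declared level order, not the alphabet
above_low <- sizes_ordered > "low"
above_low                             # TRUE TRUE FALSE FALSE TRUE

# The underlying integer codes are positions within levels()
codes <- as.integer(sizes_ordered)
codes                                 # 3 2 1 1 2
```

With an unordered Factor, the same comparison would not be meaningful, which is why the order has to be declared first.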

Logical

Logical type objects take the values TRUE and FALSE.

The command is.integer() is used to check whether a particular object is an Integer or not. It has two possible outputs, TRUE and FALSE.

R can also evaluate a logical question, i.e., one whose answer is either true or false, and store the answer as an object.

# Create an object x and assign a value 4.5 and check whether it is an integer 

 x<-4.5
 is.integer(x) 
 [1] FALSE 

is.integer() function checks whether the object is integer or not

# Create two numeric objects y and z

# Check whether y is greater than z or not

 y<-4
 z<-7
 Result <- y > z
 Result 
 [1] FALSE 

With this kind of statement, you are asking R to evaluate the logical question “Is it true that y is greater than z?”

The object Result, which stores the answer to the above question, is of type logical.

You can check the class of the object using class().
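The check described above can be sketched as follows, reusing y, z and Result from the example. The last line is an extra illustration and the name n_true is hypothetical.

```r
y <- 4
z <- 7
Result <- y > z      # the comparison is evaluated and its answer stored

class(Result)        # "logical"
is.logical(Result)   # TRUE

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true               # 2
```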

Vector

As mentioned before, Vectors are one-dimensional and contain data of a single type. Three common types of vector are:

Numeric Vector, consisting of Numeric Data.

Character Vector, consisting of Character Data.

Logical Vector, consisting of TRUE and FALSE values, typically the result of a comparison.

# Numeric vector

 a <- c(1,2,5.3,6,-2,4) 
 a 
 [1]  1.0  2.0  5.3  6.0 -2.0  4.0 

# Character vector

 b <- c("one","two","three")
 b 
 [1] "one"   "two"   "three" 

# Logical vector

 d<-c(4,24,6,4, 2,7)
 d>5 
 [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE 
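A Logical Vector like the one above is often used to pick out elements of another vector. The following is a minimal sketch. The name big is hypothetical, and d is the vector from the example.

```r
d <- c(4, 24, 6, 4, 2, 7)
big <- d > 5        # logical vector, one value per element of d
class(big)          # "logical"

d[big]              # 24 6 7: keeps only the elements where big is TRUE
sum(big)            # 3: how many elements are greater than 5
```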

Matrix

A Matrix is two-dimensional and carries a dimension attribute.

We can convert other objects, such as vectors and data frames, into a Matrix by using the function as.matrix().

The matrix() function is used to create a matrix.

The arguments nrow and ncol specify the number of rows and columns of the Matrix respectively.

While composing a Matrix, the argument byrow=TRUE is used to fill the Matrix row-wise. By default the matrix is filled column-wise.

Create a matrix with 3 rows and 2 columns.

 x<-matrix(c(2, 3, 4, 5, 6, 7),nrow=3,ncol=2)
 x 
       [,1] [,2]
 [1,]    2    5
 [2,]    3    6 
 [3,]    4    7 


nrow= and ncol= are used to specify the dimensions of the matrix.

Note that the matrix is filled column-wise.

 x<-matrix(c(2, 3, 4, 5, 6, 7),nrow=3,ncol=2,byrow=TRUE)
 x 
       [,1] [,2]
 [1,]    2    3
 [2,]    4    5
 [3,]    6    7

byrow=TRUE fills the matrix row-wise

The argument dimnames can be used to name the rows and columns of a Matrix. The dimension names can be changed and/or accessed by using the functions colnames and rownames.

 x<-matrix(c(2, 3, 4, 5, 6, 7),nrow=3,ncol=2,byrow=TRUE, 
           dimnames=list(c("X","Y","Z"), c("A","B")))
 x 
   A B
 X 2 3
 Y 4 5
 Z 6 7 
#Dimension names can be accessed or changed with two helpful functions colnames() and rownames(): 
 colnames(x)
 [1] "A" "B"
 rownames(x)
 [1] "X" "Y" "Z"
 colnames(x) <- c("a","b")
 colnames(x)
 [1] "a" "b" 

Row names can be changed in a similar manner with rownames().

Another useful method of composing a Matrix is by using the commands cbind() and rbind(). In the example below, cbind() creates a 3×2 matrix with the elements 2, 3, 4 in the first column and 5, 6, 7 in the second column.

rbind() creates a 3×2 matrix as well, but from three rows: 2, 3 in the first row, 4, 5 in the second and 6, 7 in the third.

 cbind(c(2,3,4),c(5,6,7))
 
 rbind(c(2,3),c(4,5), c(6,7)) 
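To check that the two calls above give the same matrices as the earlier matrix() examples, we can compare them element by element. This is a small sketch, and the names by_col and by_row are hypothetical.

```r
by_col <- cbind(c(2, 3, 4), c(5, 6, 7))      # vectors become columns
by_row <- rbind(c(2, 3), c(4, 5), c(6, 7))   # vectors become rows

dim(by_col)    # 3 2
dim(by_row)    # 3 2

# Same contents as the matrix() calls shown earlier
all(by_col == matrix(c(2, 3, 4, 5, 6, 7), nrow = 3, ncol = 2))                # TRUE
all(by_row == matrix(c(2, 3, 4, 5, 6, 7), nrow = 3, ncol = 2, byrow = TRUE))  # TRUE
```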

Arrays

In an Array, each row is of the same length and each column is also of the same length. So, we say it holds multidimensional rectangular data.

Creating an Array is simple. We use the command array(data, dim = c(r,c,t)), where “r” represents the number of rows of the Array, “c” represents the number of columns, and “t” represents the number of tables. By default R fills the array column-wise, even though the first dimension in our command is that of rows: values run down the first column, then down each following column, and only then continue into the next table.

 a<-array(1:24,dim=c(3,4,2))
 a 
 , , 1
 
      [,1] [,2] [,3] [,4]
 [1,]    1    4    7   10
 [2,]    2    5    8   11
 [3,]    3    6    9   12
 
 , , 2
 
      [,1] [,2] [,3] [,4]
 [1,]   13   16   19   22
 [2,]   14   17   20   23
 [3,]   15   18   21   24 

array(data, dim = c(r,c,t) )

r = no. of rows

c = no. of columns

t = no. of tables

Note: Although the rows are given as the first dimension, each table is filled column-wise. So, for arrays, R fills down the columns of the first table one after another and then moves on to the next table.
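Elements of an Array are picked out with one index per dimension, in the same order as in dim. The sketch below reuses the array a from the example. The indexing calls are standard base R, though they are not shown in the tutorial itself.

```r
a <- array(1:24, dim = c(3, 4, 2))

dim(a)         # 3 4 2
a[2, 3, 1]     # 8: row 2, column 3 of the first table
a[2, 3, 2]     # 20: the same position in the second table

# Leaving an index empty selects everything along that dimension
first_table <- a[, , 1]   # the first table as a 3 x 4 matrix
dim(first_table)          # 3 4
a[1, , 2]                 # 13 16 19 22: first row of the second table
```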

Data Frames

A Data Frame is a two-dimensional data structure, similar in layout to a two-dimensional Array. It is, in fact, a list of vectors of equal length. Data frames are the primary structure for tabular data in R.

We can convert other objects, such as matrices and lists, into a Data Frame by using the function as.data.frame().

Suppose we have three vectors x,y,z. We can use the function data.frame() to combine these vectors to form a Data Frame.

As said before, Matrices are also two-dimensional and are similar to Vectors.

So what’s the difference between Data Frames and Matrices?

Data Frames can contain heterogeneous data across their columns or variables, whereas Matrices contain only homogeneous data.

 x<-c(12,23,45)      
 y<-c(13,21,6)       
 z<-c("a","b","c") 

creating vectors x, y, z

 data<-data.frame(x,y,z)                
 data  

data.frame() function combines them in a table.

The object data is a data frame containing the three vectors x, y and z.

   x  y z
 1 12 13 a
 2 23 21 b
 3 45  6 c
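The difference between a Data Frame and a Matrix can be seen by building both from the same three vectors. This is a minimal sketch. The names data_df and m are hypothetical, and cbind() is used here only as a convenient way to get a Matrix.

```r
x <- c(12, 23, 45)
y <- c(13, 21, 6)
z <- c("a", "b", "c")

data_df <- data.frame(x, y, z)   # each column keeps its own type
m <- cbind(x, y, z)              # a matrix holds one type, so all values become character

class(data_df$x)   # "numeric"
is.matrix(m)       # TRUE
typeof(m)          # "character"

# Columns of a data frame can be accessed by name with $
data_df$y          # 13 21 6
```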

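The function str() displays the structure of an object. In versions of R before 4.0.0, data.frame() converts character vectors to factors by default, as in the output below. From R 4.0.0 onwards the default is stringsAsFactors=FALSE, so character columns stay as character. To control this behaviour explicitly in any version, we can specify stringsAsFactors=FALSE (or TRUE) while creating the Data Frame.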

 str(data) 
 'data.frame':  3 obs. of  3 variables:
  $ x: num  12 23 45
  $ y: num  13 21 6
  $ z: Factor w/ 3 levels "a","b","c": 1 2 3 

str() shows the structure of  an object

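Note: z is a character vector, but in versions of R before 4.0.0 it is stored in the data frame as a factor by default. In R 4.0.0 and later it remains a character column, and str() reports it as chr.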
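Setting stringsAsFactors explicitly gives the same result whichever version of R is in use. A short sketch, with the hypothetical names df_chr and df_fac:

```r
x <- c(12, 23, 45)
z <- c("a", "b", "c")

# Keep z as character regardless of the R version
df_chr <- data.frame(x, z, stringsAsFactors = FALSE)
class(df_chr$z)    # "character"

# Ask for a factor column explicitly
df_fac <- data.frame(x, z, stringsAsFactors = TRUE)
class(df_fac$z)    # "factor"
levels(df_fac$z)   # "a" "b" "c"
```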

Lists

A data structure containing mixed data types is called a List. We can convert any object into a list by using the function as.list(). We can also create lists using the function list().

 n=c(2, 3, 5) 
 s=c("aa", "bb", "cc", "dd", "ee") 
 x=list(n, s, 3)
 x    
 [[1]]
 [1] 2 3 5
  
 [[2]]
 [1] "aa" "bb" "cc" "dd" "ee"
  
 [[3]]
 [1] 3 

list() is used to create lists.

A part of a list can be retrieved by enclosing an index vector in the square bracket operator []. The result is itself a list. Indexing of a list starts at 1.

 x[2]
 [[1]]
 [1] "aa" "bb" "cc" "dd" "ee" 
 x[c(2, 3)] 
 [[1]]
 [1] "aa" "bb" "cc" "dd" "ee"
 [[2]]
 [1] 3 
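Since [] always returns a list, base R also provides the double bracket operator [[ ]] to pull out a single element itself. It is not covered in the tutorial, so the following is only a brief sketch using the same list x.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
x <- list(n, s, 3)

class(x[2])      # "list": single brackets return a sub-list
class(x[[2]])    # "character": double brackets return the element itself
x[[2]][1]        # "aa": first value of the second element
length(x)        # 3
```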

To recap, we’ve discussed the different data types in R and how to convert from one data type to another. This tutorial is based on lessons from the Data Analytics in R unit of the Digita Schools Advanced Diploma in Data Analytics and Postgraduate Diploma in Data Science.
