Jay Bhalodiya: Data Types and Objects in R

Data are the basic ingredients of any "data analysis". R supports a wide variety of data types, including scalars, vectors, matrices, data frames, and lists. In this tutorial, we will go over some commonly used data types and briefly cover the idea of an "object" at the end.

Scalars

In computer programming, a scalar is an atomic quantity that can hold only one value at a time. Scalars are the most basic data types and can be used to construct more complex ones. Let's take a look at some common types of scalars with simple R commands.

Number

> x <- 1
> y <- 2.5
> class(x)
[1] "numeric"
> class(y)
[1] "numeric"
> class(x+y)
[1] "numeric"
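
As a small side sketch (not part of the session above), the lines below show two things that are easy to miss: R has a separate integer type, and a scalar is really just a vector of length one. The variable k is a name introduced here only for illustration.

```r
x <- 1          # a plain number is stored as a double, which R reports as "numeric"
k <- 1L         # the L suffix asks for an integer instead (k is an illustrative name)

class(x)        # "numeric"
class(k)        # "integer"
is.numeric(k)   # TRUE: integers also count as numeric
class(x + k)    # "numeric": mixing an integer with a double gives a double

length(x)       # 1: a scalar is a vector with a single element
is.vector(x)    # TRUE
```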

Logical value

> m <- x > y      # FALSE: x (1) is not larger than y (2.5)
> n <- x < y      # TRUE: x is smaller than y
> m & n           # AND
[1] FALSE
> m | n           # OR
[1] TRUE
> !m              # Negation
[1] TRUE
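
Here is a short sketch of a few more things you can do with logical values. It assumes m and n come from comparing x and y, which matches the output shown above; that definition is an assumption made for this example.

```r
x <- 1; y <- 2.5
m <- x > y          # FALSE (assumed definition, consistent with the output above)
n <- x < y          # TRUE

class(m)            # "logical"
xor(m, n)           # TRUE: exactly one of the two is TRUE
as.numeric(n)       # 1: TRUE counts as 1 and FALSE as 0
sum(c(m, n, n))     # 2: summing a logical vector counts the TRUE values
class(NA)           # "logical": NA marks a missing value
is.na(NA)           # TRUE
```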

Character (string)

> a <- "1"; b <- "2.5" # Are they different from x and y we used earlier?
> a;b
[1] "1"
[1] "2.5"
> a+b # a+b=3.5?
Error in a + b : non-numeric argument to binary operator
> class(a)
[1] "character"
> class(as.numeric(a)) # but you can coerce this character into a number
[1] "numeric"
> class(as.character(x)) # vice versa
[1] "character"
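
To answer the "a+b=3.5?" question, here is a minimal sketch of what does work with character values: convert them first if you want arithmetic, or use paste() if you want to join them as text. The name bad is introduced only for this example.

```r
a <- "1"; b <- "2.5"

as.numeric(a) + as.numeric(b)   # 3.5: convert first, then add
paste(a, b)                     # "1 2.5": joins strings with a space
paste0(a, b)                    # "12.5": joins strings with no separator
nchar(b)                        # 3: number of characters in "2.5"

# A string that does not look like a number becomes NA (R also prints a warning,
# which is silenced here so the example runs quietly).
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)                      # TRUE
```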

Vector

A vector is a sequence of data elements of the same basic type.

> o <- c(1,2,5.3,6,-2,4)                             # Numeric vector
> p <- c("one","two","three","four","five","six")    # Character vector
> q <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)            # Logical vector
> o;p;q
[1]  1.0  2.0  5.3  6.0 -2.0  4.0
[1] "one"   "two"   "three" "four"  "five"  "six"
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
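
Because all elements of a vector must share one basic type, R quietly converts mixed input to a common type. The sketch below shows this; the name mixed is made up for the example.

```r
mixed <- c(1, "two", TRUE)   # number, string and logical in one call
mixed                        # "1" "two" "TRUE": everything became a string
class(mixed)                 # "character"

c(1, TRUE, FALSE)            # 1 1 0: logicals become numbers when mixed with numbers
class(c(1, TRUE))            # "numeric"
```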

We talked about component extraction briefly in our first tutorial. Here are some other fun ways of doing that.

> o[q] # Logical vector can be used to extract vector components
[1] 1 2 6 4
> names(o) <- p # Give each component a name
> o
  one   two three  four  five   six
  1.0   2.0   5.3   6.0  -2.0   4.0
> o["three"] # Extract your components by "calling" their names
three
  5.3
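
A few more ways to extract components, sketched with the same vector o (rebuilt here so the lines run on their own):

```r
o <- c(1, 2, 5.3, 6, -2, 4)
names(o) <- c("one", "two", "three", "four", "five", "six")

o[c(1, 3)]           # by position: the first and third components
o[-1]                # a negative index drops that position
o[o > 2]             # by condition: keeps 5.3, 6 and 4
o[c("two", "six")]   # several names at once
which(o < 0)         # position of the negative value (the fifth component)
```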

Matrix

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. As with a vector, all components of a matrix must be of the same basic type. The following is an example of a matrix with 4 rows and 3 columns.

> t <- matrix(
+     1:12,                 # the data components (Don't type "+"!)
+     nrow=4,               # number of rows
+     ncol=3,               # number of columns
+     byrow = FALSE)        # fill matrix by columns
> t                         # print the matrix
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
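
To see what the byrow argument changes, here is a small sketch that builds the same values both ways. The names by_col and by_row are introduced just for this comparison.

```r
by_col <- matrix(1:12, nrow = 4, ncol = 3)                # default: fill column by column
by_row <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)  # fill row by row instead

by_col[1, ]   # 1 5 9
by_row[1, ]   # 1 2 3
dim(by_row)   # 4 3: the shape is the same, only the filling order differs
```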

Similar to vectors, matrices also use [] to reference elements.

> t[2,3]                    # component at 2nd row and 3rd column
[1] 10
> t[,3]                     # 3rd column of matrix
[1]  9 10 11 12
> t[4,]                     # 4th row of matrix
[1]  4  8 12
> t[2:4,1:3]                # rows 2,3,4 of columns 1,2,3
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    3    7   11
[3,]    4    8   12
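
One detail worth knowing: when you pull out a single row or column, R simplifies the result to a plain vector. The sketch below shows how to keep the matrix shape with drop = FALSE. The names mat, row4 and row4m are chosen for this example only; mat holds the same values as the matrix above.

```r
mat <- matrix(1:12, nrow = 4, ncol = 3)   # same values as the matrix above

row4 <- mat[4, ]                  # a single row comes back as a plain vector
is.matrix(row4)                   # FALSE

row4m <- mat[4, , drop = FALSE]   # drop = FALSE keeps it as a 1 x 3 matrix
dim(row4m)                        # 1 3

mat[mat > 10]                     # 11 12: a condition works on matrices too
```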

Data Frame

A data frame is more general than a matrix, in that different columns can have different basic data types. The data frame is the data type we will use most often in this class.

> d <- c(1,2,3,4)
> e <- c("red", "white", "red", NA)
> f <- c(TRUE,TRUE,TRUE,FALSE)
> mydata <- data.frame(d,e,f)
> names(mydata) <- c("ID","Color","Passed")      # variable names
> mydata
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
3  3   red   TRUE
4  4  <NA>  FALSE

Extracting components from data frames is somewhat similar to what we did for matrices, but once each column (variable) has a name, it becomes more flexible.

> mydata$ID                       # try mydata["ID"] or mydata[1]
[1] 1 2 3 4
> mydata$ID[3]                    # try mydata[3,"ID"] or mydata[3,1]
[1] 3
> mydata[1:2,] # first two records
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
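
The alternatives mentioned in the comments above do not all return the same kind of thing. Here is a rough sketch of the difference, plus a common way of picking rows by a condition. The data frame is rebuilt here with named columns so the lines run on their own.

```r
mydata <- data.frame(ID     = c(1, 2, 3, 4),
                     Color  = c("red", "white", "red", NA),
                     Passed = c(TRUE, TRUE, TRUE, FALSE))

class(mydata$ID)        # "numeric": $ gives back the column as a plain vector
class(mydata[["ID"]])   # "numeric": [[ ]] does the same
class(mydata["ID"])     # "data.frame": [ ] keeps a one-column data frame

mydata[mydata$Passed, ]             # only the rows where Passed is TRUE
nrow(mydata[mydata$Passed, ])       # 3
mydata[is.na(mydata$Color), "ID"]   # 4: the ID of the record with a missing color
```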

List

A list is a generic vector containing other objects. There is no restriction on data types or length of the components. Usually, we work with lists that have named components.

> l <- list(vec=p, mat=t, fra=mydata, count=3) # a list holding the vector, matrix and data frame defined earlier, plus a scalar
> l
$vec
[1] "one"   "two"   "three" "four"  "five"  "six"

$mat
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

$fra
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
3  3   red   TRUE
4  4  <NA>  FALSE

$count
[1] 3
> l$vec # extract components from list
[1] "one"   "two"   "three" "four"  "five"  "six"
> l$mat[2,3]
[1] 10
> l$fra$Color # a factor here; from R 4.0.0 on, data.frame() keeps strings as character by default
[1] red   white red   <NA>
Levels: red white
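
Besides $, lists can be indexed with [ ] and [[ ]], and the two behave differently. A minimal sketch, using a smaller list than the one above (the component note is made up for the example):

```r
l <- list(vec = c("one", "two"), count = 3)

class(l["count"])      # "list": [ ] returns a shorter list
class(l[["count"]])    # "numeric": [[ ]] returns the component itself
l[[2]]                 # 3: [[ ]] also works by position
l$count + 1            # 4: the extracted component behaves like any number

names(l)               # "vec" "count"
l$note <- "added later"   # assigning to a new name adds a component
length(l)              # 3
```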

Object

In R, all types of data are treated as objects. However, objects are not simply collections of data. They are particular instances (instantiations) of particular classes. Operations, or functions, are defined for specific classes. Let's try working on something such as a point pattern.

# This time I will not show the R output alongside the code. Just type or paste these lines into R and see what you get.
x <- rnorm(50, 10, 3)                 # creates 50 random x values from a normal distribution
y <- rnorm(50, 10, 4)                 # creates 50 random y values
mypoints <- as.data.frame(cbind(x,y)) # makes a data frame
class(mypoints)
mypoints
summary(mypoints)
plot(mypoints)                        # Gee, it looks like a point pattern...
library(splancs)                      # load splancs first (install it if you do not have it)
box <- bbox(mypoints)                 # Bounding box: did this work? Why not?

Most of the functions above work fine with this data frame, but "bbox" does not. See help(bbox). It fails because "bbox" does not work on objects of class data.frame; it operates on objects of class points (or a matrix of x and y values). You therefore need to change the class accordingly. The following four approaches all work (try each one separately):

box <- bbox(cbind(x,y))
box <- bbox(as.matrix(mypoints))
box <- bbox(as.points(x,y))
box <- bbox(as.points(mypoints))
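
If you do not have splancs installed, the same idea can be seen with base R alone. The sketch below is not the splancs bbox function; simple_box is a hypothetical stand-in that just takes the minimum and maximum of each column, and pts_mat is a name chosen for this example. It shows that the same values can live in objects of different classes, and that converting between them is a one-line step.

```r
set.seed(1)                             # fixed seed so the numbers are reproducible
x <- rnorm(50, 10, 3)
y <- rnorm(50, 10, 4)
mypoints <- as.data.frame(cbind(x, y))

inherits(mypoints, "data.frame")        # TRUE
pts_mat <- as.matrix(mypoints)          # same values, now stored as a matrix
is.matrix(pts_mat)                      # TRUE

# Hypothetical stand-in for a bounding box (not splancs::bbox):
# row 1 holds the minimum and row 2 the maximum of each column.
simple_box <- apply(pts_mat, 2, range)
simple_box
```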

Jay Bhalodiya

Saturday, 6 April 2019
