Tuesday, 30 April 2019

Data Warehouse Tutorial – Learn Data Warehouse from Experts

BIG DATA HADOOP ***CHEAT SHEET***

Deep Learning Drizzle

Machine Learning (CS60050)

How to Learn anything ***Feynman Technique***



1. Pick a topic you want to understand and start studying it. Write down everything you know about the topic on a notebook page, and add to that page every time you learn something new about it.
2. Pretend to teach your topic to a classroom. Make sure you're able to explain the topic in simple terms.
3. Go back to the books when you get stuck. The gaps in your knowledge should be obvious. Revisit problem areas until you can explain the topic fully.
4. Simplify and use analogies. Repeat the process while simplifying your language and connecting facts with analogies to help strengthen your understanding.

#NeverStopLearning #BestTechnique

Data Science (Cheat Sheets)

#datascience #data


Saturday, 6 April 2019

Big data Characteristics

Data Types and Objects in R

Data are the most basic ingredients of data analysis. R supports a wide variety of data types, including scalars, vectors, matrices, data frames, and lists. In this tutorial, we will go over some commonly used data types and briefly cover the idea of an "object" at the end.

Scalars

In computer programming, a scalar is an atomic quantity that can hold only one value at a time. Scalars are the most basic data types and can be used to construct more complex ones. Let's take a look at some common types of scalars with simple R commands.

Number

> x <- 1
> y <- 2.5
> class(x)
[1] "numeric"
> class(y)
[1] "numeric"
> class(x+y)
[1] "numeric"
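
As a small aside to the output above, here is a sketch of how R distinguishes whole numbers stored as doubles from true integers. The variable name `k` is only for illustration and is not part of the original tutorial.

```r
x <- 1                # a plain number is stored as a double; its class is "numeric"
k <- 1L               # the L suffix asks R for an integer instead

class(x)              # "numeric"
class(k)              # "integer"
is.numeric(k)         # TRUE: integers still count as numeric values
class(k + 2.5)        # "numeric": integer + double is promoted to double
```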

Logical value

> m <- x > y      # Is x larger than y?
> n <- x < y      # Is x smaller than y?
> m & n           # AND
[1] FALSE
> m | n           # OR
[1] TRUE
> !m              # Negation
[1] TRUE
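
The same operators also work element by element on longer logical vectors. The following is a minimal sketch; `u` and `v` are made-up names used only here.

```r
u <- c(TRUE, FALSE, TRUE)
v <- c(TRUE, TRUE, FALSE)

u & v       # element-wise AND: TRUE FALSE FALSE
u | v       # element-wise OR:  TRUE TRUE TRUE
any(u)      # TRUE: at least one element is TRUE
all(u)      # FALSE: not every element is TRUE
sum(u)      # 2: TRUE counts as 1 and FALSE as 0
```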

Character (string)

> a <- "1"; b <- "2.5"       # Are they different from x and y we used earlier?
> a;b
[1] "1"
[1] "2.5"
> a+b                        # a+b=3.5?
Error in a + b : non-numeric argument to binary operator
> class(a)
[1] "character"
> class(as.numeric(a))       # but you can coerce this character into a number
[1] "numeric"
> class(as.character(x))     # vice versa
[1] "character"
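
Coercion is worth a closer look, because it only succeeds when the string actually looks like a number. A short sketch follows; the name `bad` is an illustrative assumption.

```r
a <- "1"; b <- "2.5"

# Convert first, then add: this is the 3.5 that a + b could not give us
total <- as.numeric(a) + as.numeric(b)
total

# A string that is not a number becomes NA (R also raises a warning,
# which is silenced here so the script runs quietly)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)            # TRUE
```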

Vector

A vector is a sequence of data elements of the same basic type.

> o <- c(1,2,5.3,6,-2,4)                             # Numeric vector
> p <- c("one","two","three","four","five","six")    # Character vector
> q <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)            # Logical vector
> o;p;q
[1]  1.0  2.0  5.3  6.0 -2.0  4.0
[1] "one"   "two"   "three" "four"  "five"  "six"
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
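
Because all elements of a vector must share one basic type, R silently converts mixed input to the most general type. Here is a small sketch of that behavior; `mixed` and `nums` are names invented for this example.

```r
mixed <- c(1, "two", TRUE)   # a number, a string, and a logical value
class(mixed)                 # "character": everything was turned into text
mixed                        # "1" "two" "TRUE"

nums <- c(1, TRUE, FALSE)    # no strings, so logicals become numbers
class(nums)                  # "numeric"
nums                         # 1 1 0
```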

We talked about component extraction briefly in our first tutorial. Here are some other fun ways of doing that.

> o[q]                                               # Logical vector can be used to extract vector components 
[1] 1 2 6 4
> names(o) <- p                                      # Give each component a name
> o
  one   two three  four  five   six 
  1.0   2.0   5.3   6.0  -2.0   4.0 
> o["three"]                                         # Extract your components by "calling" their names
three 
  5.3
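
A few more extraction idioms, sketched on the same numeric vector (redefined here so the snippet runs on its own):

```r
o <- c(1, 2, 5.3, 6, -2, 4)

o[o > 2]        # keep elements that satisfy a condition: 5.3 6.0 4.0
o[-1]           # a negative index drops that position (here the first)
o[c(1, 3)]      # pick several positions at once: 1.0 5.3
which(o < 0)    # positions (not values) where the condition holds: 5
```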

Matrix

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. As with a vector, the components of a matrix must be of the same basic type. The following is an example of a matrix with 4 rows and 3 columns.

> t <- matrix(
+     1:12,                 # the data components (Don't type "+"!)
+     nrow=4,               # number of rows
+     ncol=3,               # number of columns
+     byrow = FALSE)        # fill matrix by columns
> t                         # print the matrix
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

Similar to vectors, matrices also use [] to reference elements.

> t[2,3]                    # component at 2nd row and 3rd column
[1] 10
> t[,3]                     # 3rd column of matrix
[1]  9 10 11 12
> t[4,]                     # 4th row of matrix
[1]  4  8 12
> t[2:4,1:3]                # rows 2,3,4 of columns 1,2,3
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    3    7   11
[3,]    4    8   12
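
To see what the byrow argument changes, the sketch below builds the same data both ways. The names `m_col` and `m_row` are illustrative only.

```r
m_col <- matrix(1:12, nrow = 4, ncol = 3)                 # default: fill column by column
m_row <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)   # fill row by row

m_col[1, ]                  # first row is 1 5 9
m_row[1, ]                  # first row is 1 2 3
dim(m_row)                  # 4 3: rows, then columns

# Extracting a single column normally returns a plain vector;
# drop = FALSE keeps the result as a 4 x 1 matrix
col3 <- m_row[, 3, drop = FALSE]
dim(col3)
```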

Data Frame

A data frame is more general than a matrix, in that different columns can have different basic data types. The data frame is the most common data type we are going to use in this class.

> d <- c(1,2,3,4)
> e <- c("red", "white", "red", NA)
> f <- c(TRUE,TRUE,TRUE,FALSE)
> mydata <- data.frame(d,e,f)
> names(mydata) <- c("ID","Color","Passed")      # variable names
> mydata
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
3  3   red   TRUE
4  4  <NA>  FALSE

Extracting components from data frames is somewhat similar to what we did for matrices, but once each column (variable) has a name, extraction becomes more flexible.

> mydata$ID                       # try mydata["ID"] or mydata[1]
[1] 1 2 3 4
> mydata$ID[3]                    # try mydata[3,"ID"] or mydata[3,1]
[1] 3
> mydata[1:2,]                    # first two records
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
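
Rows can also be selected by a condition on a column. The sketch below rebuilds the same data frame so it runs on its own; setting stringsAsFactors = FALSE is an assumption made here so that Color is plain text in every R version.

```r
mydata <- data.frame(
  ID     = c(1, 2, 3, 4),
  Color  = c("red", "white", "red", NA),
  Passed = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

str(mydata)                       # compact overview: one line per column with its type

passed <- mydata[mydata$Passed, ] # keep only the rows where Passed is TRUE
nrow(passed)                      # 3

# Comparing with NA gives NA, so which() is used to keep only definite matches
red_ids <- mydata[which(mydata$Color == "red"), "ID"]
red_ids                           # 1 3
```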

List

A list is a generic vector containing other objects. There is no restriction on data types or length of the components. Usually, we work with lists that have named components.

> l <- list(vec=p, mat=t, fra=mydata, count=3)                  # a list holding the vector, matrix, and data frame defined earlier, plus a scalar
> l
$vec
[1] "one"   "two"   "three" "four"  "five"  "six"  

$mat
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

$fra
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
3  3   red   TRUE
4  4  <NA>  FALSE

$count
[1] 3
> l$vec                                                         # extract components from list
[1] "one"   "two"   "three" "four"  "five"  "six"  
> l$mat[2,3]
[1] 10
> l$fra$Color                                                   # a factor in R < 4.0 (shown here); a character vector in R >= 4.0
[1] red   white red   <NA> 
Levels: red white
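
Besides $, lists can be indexed with single or double brackets, and the two behave differently. A minimal sketch on a small made-up list (`l2` is not part of the tutorial's data):

```r
l2 <- list(vec = c("one", "two"), count = 3)

class(l2["count"])      # "list": single brackets return a smaller list
class(l2[["count"]])    # "numeric": double brackets return the component itself
l2[["count"]] + 1       # 4
l2$count + 1            # 4: $ is shorthand for [[ ]] with a name
names(l2)               # "vec" "count"
```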

Object

In R, all types of data are treated as objects. However, objects are not simply collections of data. They are particular instances (instantiations) of particular classes. Operations, or functions, are defined for specific classes. Let's try working on something such as a point pattern. 

# This time I will not show the R output with the code. Just type or paste these lines into R and see what you get.
x <- rnorm(50, 10, 3)                 # creates 50 random x values from a normal distribution
y <- rnorm(50, 10, 4)                 # creates 50 random y values
mypoints <- as.data.frame(cbind(x,y)) # makes a data frame
class(mypoints)
mypoints
summary(mypoints)
plot(mypoints)                        # Gee, it looks like a point pattern...
library(splancs)                      # load this first; the calls below rely on it
box <- bbox(mypoints)                 # Bounding box: did this work? Why not?

It seems that most functions above work well with this data frame but "bbox" does not. See help(bbox). It didn't work because "bbox" doesn't work on objects of class data.frame. "bbox" operates on objects of class points (or a matrix of x and y values). Therefore you need to change the class accordingly. The following four approaches all work (try each one separately): 

box <- bbox(cbind(x,y))
box <- bbox(as.matrix(mypoints))
box <- bbox(as.points(x,y))
box <- bbox(as.points(mypoints))
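
The idea that a function is defined only for certain classes can be tried without any extra package. The sketch below uses base R's S3 dispatch with a hypothetical generic called `describe`; it is not how bbox itself is implemented, only an illustration of the same principle.

```r
# A hypothetical generic: R picks the method that matches the object's class
describe <- function(obj) UseMethod("describe")

describe.default    <- function(obj) "no specific method for this class"
describe.data.frame <- function(obj) paste("data frame with", nrow(obj), "rows")
describe.matrix     <- function(obj) paste("matrix with", ncol(obj), "columns")

set.seed(1)                                   # make the random values repeatable
pts <- data.frame(x = rnorm(5, 10, 3), y = rnorm(5, 10, 4))

describe(pts)              # uses the data.frame method
describe(as.matrix(pts))   # same data, different class, different method
describe("text")           # falls back to the default method
```

Changing the class with as.matrix() changes which method runs, which is why converting mypoints made bbox work in the example above.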

Awesome Deep Learning