Sunday, 16 June 2019

Median of Two Sorted Arrays

There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
You may assume nums1 and nums2 cannot both be empty.
Example 1:
nums1 = [1, 3]
nums2 = [2]

The median is 2.0
Example 2:
nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5
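A standard way to meet the O(log(m+n)) bound is to binary-search a partition point in the shorter array so that the combined left half and right half are correctly ordered; a sketch:

```python
def find_median_sorted_arrays(nums1, nums2):
    """Median of two sorted arrays in O(log(min(m, n))) time."""
    # Always binary-search over the shorter array.
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    lo, hi = 0, m
    half = (m + n + 1) // 2          # size of the combined left partition
    while lo <= hi:
        i = (lo + hi) // 2           # elements taken from nums1
        j = half - i                 # elements taken from nums2
        left1 = nums1[i - 1] if i > 0 else float("-inf")
        right1 = nums1[i] if i < m else float("inf")
        left2 = nums2[j - 1] if j > 0 else float("-inf")
        right2 = nums2[j] if j < n else float("inf")
        if left1 <= right2 and left2 <= right1:
            if (m + n) % 2:                       # odd total length
                return float(max(left1, left2))
            return (max(left1, left2) + min(right1, right2)) / 2
        if left1 > right2:
            hi = i - 1               # took too many from nums1
        else:
            lo = i + 1               # took too few from nums1

print(find_median_sorted_arrays([1, 3], [2]))     # 2.0
print(find_median_sorted_arrays([1, 2], [3, 4]))  # 2.5
```

The loop narrows the cut position in the shorter array, so the runtime is O(log(min(m, n))), which is within the required O(log(m+n)).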

Saturday, 15 June 2019

Learn Data Science in 3 Months

21 Types of SQL Joins


154 Types of Data Visualization and their Usage

https://datavizproject.com/
To learn how to implement data visualization in business: https://lnkd.in/fYUCzgC
For practical, less technical resources, see the links below:
1. Know Data Science - https://lnkd.in/fMHtxYP
2. Understand How to Answer Why - https://lnkd.in/f396Dqg
3. Know Machine Learning Key Terminology - https://lnkd.in/fCihY9W
4. Understand Machine Learning Implementation - https://lnkd.in/f5aUbBM
5. Machine Learning Applications in Marketing - https://lnkd.in/fUDGAQW - and Retail - https://lnkd.in/fqBRjBx
We're hiring a data science internship: https://lnkd.in/fbKPq3A

Big Data Life Cycle

In today’s big data context, the previous approaches are either incomplete or suboptimal. For example, the SEMMA methodology completely disregards the collection and preprocessing of data from different sources. These stages normally constitute most of the work in a successful big data project.

A big data analytics cycle can be described by the following stages:

 • Business Problem Definition
 • Research
 • Human Resources Assessment
 • Data Acquisition
 • Data Munging
 • Data Storage
 • Exploratory Data Analysis 
 • Data Preparation for Modeling and Assessment 
 • Modeling 
 • Implementation

In this section, we will shed some light on each of these stages of the big data life cycle.

Business Problem Definition 

This stage is common to the traditional BI and big data analytics life cycles. Defining the problem and correctly evaluating how much potential gain it may bring to the organization is normally a non-trivial part of a big data project. It may seem obvious to mention this, but the expected gains and costs of the project have to be evaluated.

Research 

Analyze what other companies have done in the same situation. This involves looking for solutions that are reasonable for your company, even if that means adapting other solutions to your company's resources and requirements. In this stage, a methodology for the later stages should be defined.

Human Resources Assessment 

Once the problem is defined, it is reasonable to analyze whether the current staff is able to complete the project successfully. Traditional BI teams might not be capable of delivering an optimal solution for all the stages, so before starting the project it should be considered whether part of the project needs to be outsourced or more people hired.

Data Acquisition 

This stage is key in a big data life cycle; it defines which profiles are needed to deliver the resulting data product. Data gathering is a non-trivial step of the process; it normally involves collecting unstructured data from different sources. For example, it could involve writing a crawler to retrieve reviews from a website. This involves dealing with text, perhaps in different languages, and normally requires a significant amount of time to complete.
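As a minimal sketch of the extraction step, the example below pulls review text out of an HTML snippet with Python's standard-library parser. The `class="review"` markup is a hypothetical page structure; a real crawler would also fetch the pages over HTTP and handle pagination and multiple languages.

```python
from html.parser import HTMLParser

class ReviewExtractor(HTMLParser):
    """Collect the text inside elements tagged class="review" (hypothetical markup)."""
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if ("class", "review") in attrs:
            self.in_review = True

    def handle_endtag(self, tag):
        self.in_review = False

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

# Static stand-in for a fetched page.
page = '<div class="review">Great product</div><div class="review">Too slow</div>'
parser = ReviewExtractor()
parser.feed(page)
print(parser.reviews)  # ['Great product', 'Too slow']
```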

Data Munging

Once the data is retrieved, for example from the web, it needs to be stored in an easy-to-use format. To continue with the reviews example, let's assume the data is retrieved from different sites, each of which displays the data differently. Suppose one data source gives reviews as star ratings, so it is possible to read this as a mapping for the response variable y ∈ {1, 2, 3, 4, 5}. Another data source gives reviews using a two-arrow system, one for up-voting and the other for down-voting. This would imply a response variable of the form y ∈ {positive, negative}. In order to combine both data sources, a decision has to be made to make these two response representations equivalent. This can involve converting the first data source's representation to the second form, considering one star as negative and five stars as positive. This process often requires a large time allocation to be delivered with good quality.
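The unification described above can be sketched as a small mapping function. The thresholds (1-2 stars negative, 4-5 positive, 3 ambiguous) are an assumption for illustration; the actual cut-offs are a modeling decision.

```python
def normalize_review(source, value):
    """Map ratings from two hypothetical sources onto a common representation."""
    if source == "stars":
        # 1-5 star scale: low stars read as negative, high as positive.
        if value <= 2:
            return "negative"
        if value >= 4:
            return "positive"
        return "neutral"   # a 3-star review is ambiguous
    # The arrow-based source already yields "positive" / "negative".
    return value

print(normalize_review("stars", 5))            # positive
print(normalize_review("stars", 1))            # negative
print(normalize_review("arrows", "positive"))  # positive
```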

Data Storage

Once the data is processed, it sometimes needs to be stored in a database. Big data technologies offer plenty of alternatives on this point. The most common alternative is the Hadoop Distributed File System (HDFS) for storage, typically paired with Hive, which gives users a limited version of SQL known as HiveQL. From the user's perspective, this allows most analytics tasks to be done in ways similar to traditional BI data warehouses. Other options to be considered are MongoDB, Redis, and Spark (the latter being a processing engine rather than a storage system).

This stage of the cycle is related to the staff's knowledge in terms of their ability to implement different architectures. Modified versions of traditional data warehouses are still being used in large-scale applications. For example, Teradata and IBM offer SQL databases that can handle terabytes of data, and open-source solutions such as PostgreSQL and MySQL are still used for large-scale applications.

Even though the different storage systems work differently in the background, from the client side most solutions provide a SQL API. Hence, a good understanding of SQL is still a key skill for big data analytics.
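To illustrate how far plain SQL carries across back-ends, here is a small aggregate over the review data using Python's built-in sqlite3; the table and rows are invented, but an equivalent GROUP BY query would run largely unchanged on HiveQL or PostgreSQL.

```python
import sqlite3

# In-memory database standing in for any SQL-fronted store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (source TEXT, sentiment TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("siteA", "positive"), ("siteA", "negative"), ("siteB", "positive")],
)

# Count reviews per sentiment class.
rows = conn.execute(
    "SELECT sentiment, COUNT(*) FROM reviews GROUP BY sentiment ORDER BY sentiment"
).fetchall()
print(rows)  # [('negative', 1), ('positive', 2)]
```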

A priori, this stage may seem to be the most important topic; in practice, it is not. It is not even an essential stage: it is possible to implement a big data solution that works with real-time data, in which case we only need to gather data to develop the model and then implement it in real time, so there would be no need to formally store the data at all.

Exploratory Data Analysis 

Once the data has been cleaned and stored in a way that insights can be retrieved from it, the data exploration phase is mandatory. The objective of this stage is to understand the data; this is normally done with statistical techniques and by plotting the data. It is also a good stage to evaluate whether the problem definition makes sense or is feasible.
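A first exploratory pass often starts with simple summary statistics. The sample below (hypothetical review lengths in words) shows how comparing the mean and median already reveals distribution shape, using only the standard library.

```python
import statistics

# Hypothetical sample: review lengths in words, from the cleaned data.
review_lengths = [12, 45, 7, 33, 21, 18, 95, 14, 27, 22]

print("mean:  ", statistics.mean(review_lengths))    # 29.4
print("median:", statistics.median(review_lengths))  # 21.5
print("stdev: ", round(statistics.stdev(review_lengths), 1))

# A mean well above the median hints at a right-skewed distribution
# (a few very long reviews), which a histogram would confirm.
```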

Data Preparation for Modeling and Assessment 

This stage involves reshaping the cleaned data retrieved previously and using statistical preprocessing for missing-value imputation, outlier detection, normalization, feature extraction, and feature selection.
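Two of those steps, imputation and normalization, can be sketched in a few lines. This is a minimal illustration on a single numeric column; real pipelines would typically use pandas or scikit-learn and handle outliers and feature selection as well.

```python
def impute_and_scale(column):
    """Replace missing values (None) with the column mean, then min-max scale to [0, 1]."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    # Imputation: fill gaps with the mean of the observed values.
    filled = [mean if v is None else v for v in column]
    # Normalization: rescale to the [0, 1] range.
    lo, hi = min(filled), max(filled)
    return [(v - lo) / (hi - lo) for v in filled]

print(impute_and_scale([10, None, 30, 20]))  # [0.0, 0.5, 1.0, 0.5]
```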

Modeling 

The prior stage should have produced several datasets for training and testing, for example, for a predictive model. This stage involves trying different models with the aim of solving the business problem at hand. In practice, it is normally desirable that the model gives some insight into the business. Finally, the best model or combination of models is selected by evaluating its performance on a held-out dataset.
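The held-out evaluation can be sketched end to end on toy data: fit two candidate models on a training split and compare their errors on data neither has seen. The dataset and both models are deliberately simple stand-ins.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data: y is roughly 2 * x with a little noise.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9]

# Hold out the last quarter of the data for evaluation.
x_train, y_train = x[:6], y[:6]
x_test, y_test = x[6:], y[6:]

# Model A (baseline): always predict the training mean.
mean_pred = sum(y_train) / len(y_train)
baseline_mse = mse(y_test, [mean_pred] * len(y_test))

# Model B: least-squares line through the origin.
slope = sum(a * b for a, b in zip(x_train, y_train)) / sum(a * a for a in x_train)
fit_mse = mse(y_test, [slope * a for a in x_test])

# The model with the lower held-out error is selected.
print(fit_mse < baseline_mse)  # True
```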

Implementation

In this stage, the data product that was developed is implemented in the company's data pipeline. This involves setting up a validation scheme while the data product is running, in order to track its performance. For example, when implementing a predictive model, this stage would involve applying the model to new data and, once the response is available, evaluating the model.

Friday, 14 June 2019

**** Official seaborn tutorial ****

Link - https://lnkd.in/fUrCDyq

Popular Posts:
➡ Python for Data Analysis - https://lnkd.in/fvKavA2
➡ Python Quick Reference Sheet - https://lnkd.in/fa9wirz
➡ Python Machine Learning Tutorial - https://lnkd.in/fteKpTY
➡ All Cheat Sheets in One Place - https://lnkd.in/fsmfZns
➡ IBM FREE Cognitive Classes for Data Science and Machine Learning - https://lnkd.in/fpp9QFT
➡ Data Visualization - https://lnkd.in/f3FXCue
➡ Getting Your First Data Science Job - https://lnkd.in/fQGHM2J
➡ Data Science Interview Questions - https://lnkd.in/feKZVhv
➡ Excel Expert in No Time - https://lnkd.in/fXC4dhj
➡ 10 Minutes to Pandas - https://lnkd.in/fpwaBCq
➡ NumPy 100 Exercises - https://lnkd.in/fVX7Khk
➡ Quick Reference Sheet (ML, DL & AI) - https://lnkd.in/fEVYMGD
➡ Machine Learning Yearning by Andrew Ng - https://lnkd.in/f_E-_pf
➡ Must-Read Articles for Data Science Enthusiasts - https://lnkd.in/fwPmurj
➡ Coursera Deep Learning Course Notes - https://lnkd.in/fwQRK_G
➡ Commonly Used Machine Learning Algorithms - https://lnkd.in/f8msx2T
➡ Data Science Learning Path for Complete Beginners - https://lnkd.in/f6NJJk9


Wednesday, 12 June 2019

PapersWithCode is a great resource for data scientists: more than 9,000 papers, with a large number of datasets and code implementations available.

Check it out here: https://lnkd.in/fcGBFZh

Here are the COMPLETE lecture notes from Professor Andrew Ng's Stanford Machine Learning course: https://lnkd.in/gR5sRHg

Once you are good at machine learning, here is how to apply it to business cases:
✅ Step 1: Data Science Process - https://lnkd.in/fMHtxYP
✅ Step 2: Data Visualization in Business - https://lnkd.in/fYUCzgC
✅ Step 3: Understand How to Answer Why - https://lnkd.in/f396Dqg
✅ Step 4: Know Machine Learning Key Terminology - https://lnkd.in/fCihY9W
✅ Step 5: Understand Machine Learning Implementation - https://lnkd.in/f5aUbBM
✅ Step 6: Machine Learning Applications in Marketing - https://lnkd.in/fUDGAQW
✅ Step 7: Machine Learning Applications in Retail - https://lnkd.in/fihPTJf

Awesome Deep Learning