AI / ML / Deep Learning / Neural Network / Algorithms Intro
A brief introduction to Artificial Intelligence, Machine Learning, Deep Learning, Neural Networks, and their algorithms.

Artificial Intelligence

AI is intelligent behavior by machines: any device that can perceive its environment and take actions accordingly can be said to have AI.

AI relies on something called knowledge engineering.

Machine Learning (ML) and Deep Learning (DL) are subsets of AI. You can create an AI system with the help of ML and DL algorithms.

The process of feeding data to a software program so that it arrives at human-like decisions is known as the modeling process.

The model, which is essentially your software algorithm, is continually refined until its decisions are close to those a human would make.


Machine Learning

What is Machine Learning?

Machine Learning is a process in which machines take data, analyze it to generate predictions, and use those predictions to make decisions. Those decisions generate results, and those results are used to improve future predictions.

Machine Learning can make predictions from huge datasets, optimize utility functions, and extract hidden patterns and structures from the datasets by classifying data.

Think of Machine Learning as a combination of methods and systems. These methods and systems predict new data based on observed data, extract hidden structure from the data, summarize data into concise descriptions, optimize an action given a cost function and observed data, and adapt based on observed data.

Machine Learning starts with the data it already has about a situation. It processes that data using algorithms to recognize patterns of behavior and outcomes, then interprets those patterns to predict future outcomes. These predictions are used to decide on the next step to take. That decision produces results, which are then evaluated and added to the pool of data. The new data influences the predictions and decisions made going forward. This is how Machine Learning learns over time.
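
The loop described above (predict, decide, observe the result, feed it back into the data) can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular product works: the "model" is assumed to be a simple running average, and the names RunningMeanModel, run_loop, and threshold are hypothetical.

```python
class RunningMeanModel:
    """Toy model: predicts the mean of every outcome observed so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def predict(self):
        return self.mean

    def update(self, outcome):
        # Add the new result to the pool of data (incremental mean update).
        self.count += 1
        self.mean += (outcome - self.mean) / self.count


def run_loop(model, outcomes, threshold):
    """For each step: predict, decide, then learn from the observed result."""
    decisions = []
    for outcome in outcomes:
        prediction = model.predict()
        # The decision is driven by the current prediction.
        decisions.append("act" if prediction >= threshold else "wait")
        # The observed result influences all future predictions.
        model.update(outcome)
    return decisions


model = RunningMeanModel()
decisions = run_loop(model, outcomes=[4.0, 6.0, 8.0, 2.0], threshold=5.0)
print(decisions)        # ['wait', 'wait', 'act', 'act']
print(model.predict())  # 5.0
```

The early decisions are poor because the model has seen no data yet; they change only once results have been folded back in.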

Machine Learning Use Cases:

  • Detect fraudulent transactions
  • Filter spam emails
  • Flag suspicious reviews
  • Personalize content for users through recommendations and predictive content loading
  • Targeted marketing, matching customers with offers they might like, choosing marketing campaigns, and cross-selling or upselling items
  • Automate categorisation of documents such as matching hiring managers and resumes by learning to understand written content
  • Customer service to provide predictive routing of customer emails based on the content and the sender
  • Social media listening capabilities

Types of Machine Learning Problems:

  • Supervised learning: The inputs to the model, including the example outputs (also known as labels), are known, and the model learns to generalize from these examples to new outputs
    • Classification
    • Regression
  • Unsupervised learning: The labels aren’t known. The model finds patterns and structure in the data without any help. (Self-organization)
    • Clustering: Discovers groupings in the data
      • Like grouping customers based on their purchasing behavior
    • Association: Discovers rules that govern large chunks of data
      • Customers who buy product A, also tend to buy product B
  • Reinforcement learning: The model learns by interacting with its environment, taking actions that maximize the total reward.
    • This type of machine learning algorithm is inspired by behavioral psychology
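
To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch using only the Python standard library. The choice of algorithms (1-nearest-neighbor for classification, a one-dimensional k-means for clustering), the function names, and the sample data are all assumptions made for illustration.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Supervised: return the label of the closest labeled example."""
    distances = [math.dist(point, query) for point in train_points]
    return train_labels[distances.index(min(distances))]


def kmeans_1d(values, centers, iterations=10):
    """Unsupervised: group unlabeled values around the given starting centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of its cluster (unchanged if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Supervised: labels are provided with the training data.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
labels = ["small", "small", "large", "large"]
predicted = nearest_neighbor_predict(points, labels, (7, 9))

# Unsupervised: only raw values (for example, purchase amounts), no labels.
centers, clusters = kmeans_1d([10, 12, 11, 50, 52, 51], centers=[10.0, 50.0])

print(predicted)  # large
print(centers)    # [11.0, 51.0]
```

In the first call the labels do the teaching; in the second the groupings emerge from the data alone.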

Machine Learning platforms:

  • Amazon Machine Learning
    • For customers who want a fully managed platform for building models using their own data.
    • Designed for developers and data scientists who want to focus on building models.
    • The platform removes the undifferentiated overhead associated with deploying and managing infrastructure for training and hosting models.
    • It can analyze your data, provide you with suggested transformations for the data, train your model, and even help you with evaluating your model for accuracy.
  • Amazon EMR (Amazon Elastic MapReduce)
    • A flexible, customizable, and managed big data processing platform.
    • A managed solution, in that it can handle things like scaling and high availability for you.
    • Does not require a deep understanding of how to set up and administer big data platforms; you get a preconfigured cluster ready to receive your analytics workload.
    • Built for any data science workload, not just AI.
  • Apache Spark
    • An open-source, distributed processing system commonly used for Big Data workloads.
    • Utilizes in-memory caching and optimized execution for fast performance
    • Supports general batch processing, Streaming Analytics, Machine Learning, graph database, and ad hoc queries. It can be run and managed on Amazon EMR clusters.

Amazon ML supported predictions:

  • Binary classification
  • Multiclass classification
  • Regression

Steps for building smart applications with Amazon ML:

  • Train a model: Create a data source object pointing to your data, explore and understand your data, transform the data, and train your model.
  • Evaluate and optimize the model: Understand model quality and adjust model interpretation.
  • Retrieve batch and real-time predictions.

Deep Learning

What is Deep Learning?

Rather than telling the machine what features it needs to look for, Deep Learning enables the machine to define the features it needs to look for itself based on the data it’s being provided.

Deep Learning uses layers of non-linear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as its input.
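
As a rough sketch of what "each layer uses the output of the previous layer as input" means in code, the example below chains two small layers. The weights, the biases, and the choice of ReLU as the non-linear unit are arbitrary assumptions for illustration; in a real network the weights are learned from data.

```python
def relu(x):
    """A common non-linear activation: negative values become zero."""
    return max(0.0, x)


def dense_layer(inputs, weights, biases):
    """One layer: every unit computes a weighted sum plus bias, then applies ReLU."""
    return [
        relu(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hand-picked illustrative parameters (not learned).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0]], [-1.0]),                   # layer 2: 2 inputs -> 1 unit
]

print(forward([2.0, 1.0], layers))  # [2.5]
print(forward([0.0, 3.0], layers))  # [1.5]
```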

The algorithms may be supervised or unsupervised. Applications include pattern analysis, which is unsupervised, and classification, which can be supervised or unsupervised.

These algorithms are also based on the unsupervised learning of multiple levels of features or representations of the data.

Higher-level features are derived from low-level features to form a hierarchical representation. Deep learning algorithms are part of a broader Machine Learning field of learning representations of data, and they learn multiple levels of representations that correspond to different levels of abstraction.

Where traditional Machine Learning focuses on feature engineering, Deep Learning focuses on end-to-end learning based on raw features.

Using a trained model to make predictions is also known as inference.

Use cases for Deep Learning

  • Text analysis: Used in the finance, social, CRM, and insurance domains, to name a few. By analyzing blobs of text, it can detect insider trading, check for regulatory compliance, and perform brand affinity, sentiment, and intent analysis, among other things.
  • Time-series and predictive analysis: It’s used in data centers for log analysis and risk and fraud detection, by the supply chain industry for resource planning, and in the IoT field for predictive analysis using sensor data. It’s also used in social media and e-commerce for building recommendation engines.
  • Sound analysis: Deep Learning is used in the security domain for voice recognition and voice analysis, and in the CRM domain for sentiment analysis.
  • Automotive and aviation industries: It’s used for engine and instrument flaw detection. You’ll even find Deep Learning in the finance industry for credit card fraud detection, among other things.
  • Image analysis: In the security domain, Deep Learning is used for things like facial recognition. In social media, it’s used for tagging and identifying people in pictures.

Artificial Neural Networks (ANN)

Deep Learning processes information using artificial processing structures, loosely modeled on biological neurons, known as artificial neural networks. It builds these structures from the data it analyzes, and then infers features about its subject matter based on the data. Then it weighs those features according to certainty and commonality, and organizes them into layers of hierarchies and relationships with each other.
With enough training data, a Neural Network can perform a decent job of mapping input data and features to output decisions.

A Neural Network is a collection of simple trainable mathematical units that collectively learn complex functions.

It consists of multiple layers. There is an input layer, some hidden layers, and an output layer.
Each layer is responsible for analyzing increasingly complex features in the data.

Artificial Neuron / Node

The basic unit of an Artificial Neural Network is the artificial neuron, sometimes also called a node.

In a simplified example, each input value is multiplied by its weight to get a weighted value, and the weighted values are summed.
Then, if appropriate, the node adds an offset called the bias to the sum. The bias adjusts the sum to generate more accurate predictions, based on the success or failure of previous predictions.
Activation step: Once the inputs have been weighted and summed, and the bias has been added if appropriate, the neuron activates if the final value produced by the preceding steps meets or exceeds the determined activation threshold. This is the final step before an output is delivered.
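
The three steps above (weight the inputs, add the bias, compare against the activation threshold) map directly onto a few lines of Python. This is a minimal sketch with made-up numbers; the function name neuron and the default threshold of 0.0 are assumptions for illustration.

```python
def neuron(inputs, weights, bias, threshold=0.0):
    """Return 1 if the neuron activates, otherwise 0."""
    # Step 1: multiply each input by its weight and sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 2: add the bias (offset) to the sum.
    total = weighted_sum + bias
    # Step 3 (activation): fire only if the total meets or exceeds the threshold.
    return 1 if total >= threshold else 0


print(neuron([1.0, 0.5], [0.5, -1.0], bias=0.25))   # 1
print(neuron([1.0, 0.5], [0.5, -1.0], bias=-0.25))  # 0
```

With these inputs the weighted sum is exactly zero, so the bias alone decides whether the neuron fires.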


The simplest neural network is a perceptron.
A perceptron is a single-layer neural network built from the following parts:

  • Input features
  • Bias term, which is like an intercept in your linear regression models.
  • Activation function: This activation function is usually non-linear, and really depends on the problem you’re trying to solve.
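
As an illustration, the sketch below trains a perceptron on the logical AND function using the classic perceptron learning rule. The step activation, the learning rate of 1, the zero initial weights, and the number of epochs are assumptions chosen to keep the example small; they are not prescribed by the text above.

```python
def step(value, threshold=0.0):
    """Step activation: 1 if the value meets or exceeds the threshold."""
    return 1 if value >= threshold else 0


def predict(weights, bias, features):
    return step(sum(w * x for w, x in zip(weights, features)) + bias)


def train_perceptron(samples, labels, epochs=20, learning_rate=1):
    """Perceptron learning rule: nudge weights and bias by the prediction error."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Truth table for logical AND: input features and their labels.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(predictions)  # [0, 0, 0, 1]
```

AND is linearly separable, which is why a single perceptron can learn it. A function such as XOR is not, and would need more than one layer.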

Types of Neural Networks:

Feedforward Neural Network

A feedforward neural network is any neural network whose connections don’t form a cycle between neurons. This means data moves from input to output without looping backward.

Convolutional Neural Network

  • Input: Either an image or a sequence of images.
  • Convolutional layer: For an image, we use kernels as filters to extract local features. In the example shown here, we have the input image, and we convolve filters with the image to create the next layer. Depending on how many filters we use, we will have a different number of channels in the output from the convolutional layer (one in this particular case).
  • Pooling layer: Once you have a particular output, you may want to reduce its size. To do this, we can use max pooling or average pooling, which reduces each two-by-two region to a single scalar by taking its maximum or its average. The pooling layer is essentially a dimension reduction process. In real applications of convolutional neural networks, you have a lot of layers and the number of dimensions is pretty high, so we need to reduce the size of the data for better convergence. We can stack several such layers in a convolutional neural network, but at the end of the day, we need to convert the tensor into a vector and pass it to a fully connected layer.
  • Fully connected layer: Links to the output. The output is usually the category of whatever the image contains. For example, the output for this image could be the digit zero.
  • For the training process, we need a lot of good labeled data.
  • By training convolutional neural networks, we can try to find the best number of filters and the values in the filters that will give us near human-level accuracy for image recognition.
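
The convolution, pooling, and flattening steps can be sketched in plain Python. The 5x5 image and the 2x2 vertical-edge filter below are made up for illustration; in a real network the filter values are learned during training, and a library would be used instead of nested lists.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Reduce every non-overlapping 2x2 block to its maximum value."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


def flatten(feature_map):
    """Convert the 2D feature map into a vector for a fully connected layer."""
    return [value for row in feature_map for value in row]


# Illustrative image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Illustrative filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4
pooled = max_pool_2x2(feature_map)       # 2x2
vector = flatten(pooled)                 # length 4

print(feature_map[0])  # [0, 2, 0, 0]
print(pooled)          # [[2, 0], [2, 0]]
print(vector)          # [2, 0, 2, 0]
```

The filter responds only where the image changes from dark to bright, and pooling keeps that response while halving each dimension.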

Recurrent Neural Network

  • For the feedforward neural network and the convolutional neural network, the input observations are treated as relatively independent. These networks cannot model the dependency structure among different input observations.
  • But for time-series data, natural language processing, or translation applications, the sequence of the input data really means something. When the data involves sequential or time-series features, the recurrent neural network is the right way to go.
  • For example, the input to a recurrent neural network may be a set of characters that only have meaning as a sequence. Each individual word or character doesn’t mean much until we have a sequential relationship among them.
  • During the training process, information doesn’t flow in just one direction. It is reused as it propagates through the nodes at different steps of the sequence. In the final result, the input layer and output layer are connected through this recurrent structure.
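
A minimal sketch of the recurrent idea: the hidden state computed at one step is fed back in at the next step, so the same inputs in a different order produce a different result. The single scalar hidden unit, the tanh activation, and the fixed weights are simplifying assumptions for illustration; real recurrent networks use learned weight matrices.

```python
import math


def rnn_final_state(inputs, w_input=1.0, w_hidden=1.0, bias=0.0):
    """Run a one-unit recurrent cell over a sequence and return the last hidden state."""
    hidden = 0.0
    for x in inputs:
        # The previous hidden state is reused together with the current input.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
    return hidden


state_a = rnn_final_state([1.0, 0.0])
state_b = rnn_final_state([0.0, 1.0])

# Same values, different order, different final state.
print(state_a)
print(state_b)
print(state_a != state_b)  # True
```

A feedforward network that simply summed these inputs could not tell the two sequences apart.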

Deep Learning on AWS:

  • Three advanced Deep Learning enabled managed API services:
    • Amazon Lex
      • A service for building conversational interfaces into any application using voice and text.
      • Provides the advanced Deep Learning functionalities of automatic speech recognition, for converting speech to text, and natural language understanding, to recognize the intent of the input.
      • Lets you build applications with highly engaging user experiences and life-like conversational interactions.
    • Amazon Polly
      • A service that turns text into lifelike speech, allowing you to create applications that talk and build entirely new categories of speech-enabled products.
    • Amazon Rekognition
      • Image analysis
      • Detect objects, scenes, and faces in images
      • Search and compare faces
      • Recognize celebrities
      • Identify inappropriate content
      • The Rekognition API enables you to quickly add sophisticated deep learning-based visual search and image classification to your applications.
  • Amazon EC2 P3 instances
    • Provide powerful NVIDIA GPUs to accelerate computations, so that customers can train their models in a fraction of the time required by traditional CPUs.
  • Amazon EC2 C5
    • Used after training: compute-optimized C5 and general-purpose M4 instances, in addition to GPU-based instances, are well suited for running inference with the trained model.
  • Amazon deep-learning AMIs (Amazon Machine Image)
    • Build custom models
    • Available for Amazon Linux and Ubuntu
    • Come pre-configured with Apache MXNet, TensorFlow, the Microsoft Cognitive Toolkit, Caffe, Caffe2, Theano, Torch, PyTorch, and Keras.
    • Come pre-populated with Machine Learning software such as the Anaconda package for data science.
    • Enable you to quickly deploy and run any of these frameworks at scale.
    • Can help you get started quickly. They’re provisioned with many deep learning frameworks, including tutorials that demonstrate proper installation, configuration, and model accuracy.
    • Install dependencies, track library versions, and validate code compatibility for you. With updates to the AMIs every month, you always have the latest versions of the engines and data science libraries.
    • Create managed, automatically scalable clusters of GPUs for training and inference at any scale. No additional charge for the deep learning AMIs. You only pay for the AWS resources that you need to store and run your applications.

Two ways to get started with AWS Deep Learning AMIs:

  • AWS Marketplace: Deploy a deep learning compute instance in one click
  • AWS CloudFormation Deep Learning template: A simple way to launch all of your resources quickly using the deep learning AMIs, so you can train over multiple instances.


  • The backpropagation algorithm helps the model learn from its mistakes by leveraging the chain rule of derivatives.
  • The primary value of a recurrent neural network comes when processing sequential information such as text, speech, or handwriting, where the ability to predict the next word or letter is vastly improved if you factor in the words or letters that come before it.
  • Recurrent Neural Networks became much more popular after 2007 when Long Short Term Memory or LSTM approaches revolutionized speech recognition programs. LSTM is now the basis for many of today’s most successful applications in the speech recognition domain, text-to-speech domain, and handwriting recognition domain.
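
To show what "leveraging the chain rule" looks like in practice, here is a small sketch of backpropagation for a single sigmoid neuron with a squared-error loss. The setup (one input, one weight, one bias, the specific learning rate, and the function names) is an assumption for illustration. The analytic gradient is compared against a numerical estimate as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    return sigmoid(w * x + b)


def loss(w, b, x, y):
    return 0.5 * (forward(w, b, x) - y) ** 2


def gradients(w, b, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw (and likewise for b)."""
    a = forward(w, b, x)
    dL_da = a - y            # derivative of the squared-error loss
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz  # dz/dw = x, dz/db = 1


# Illustrative starting point and training example.
w, b, x, y = 0.5, 0.0, 1.5, 1.0

# Sanity check: compare with a central-difference numerical gradient.
eps = 1e-6
numeric_dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
analytic_dw, analytic_db = gradients(w, b, x, y)

# Gradient descent: repeatedly step against the gradient.
initial_loss = loss(w, b, x, y)
for _ in range(100):
    dw, db = gradients(w, b, x, y)
    w -= 0.5 * dw
    b -= 0.5 * db
final_loss = loss(w, b, x, y)

print(abs(analytic_dw - numeric_dw) < 1e-6)  # True
print(final_loss < initial_loss)             # True
```

In a multi-layer network the same idea is applied layer by layer, passing dL/dz backward from the output toward the input.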

Author: Yuzu
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.