Neural Networks #

  • A network of artificial neurons inspired by how neurons function in the human brain.
  • At its core, a mathematical model designed to process and learn from data.
  • Neural networks form the foundation of Deep Learning, which involves training large, complex networks on vast amounts of data.

```mermaid
flowchart LR
 subgraph subGraph0["Input Layer"]
        I1(("Input 1"))
        I2(("Input 2"))
        I3(("Input 3"))
  end
 subgraph subGraph1["Hidden Layer"]
        H1(("Hidden 1"))
        H2(("Hidden 2"))
        H3(("Hidden 3"))
  end
 subgraph subGraph2["Output Layer"]
        O(("Output"))
  end
    I1 --> H1 & H2 & H3
    I2 --> H1 & H2 & H3
    I3 --> H1 & H2 & H3
    H1 --> O
    H2 --> O
    H3 --> O

    style I1 fill:#C8E6C9
    style I2 fill:#C8E6C9
    style I3 fill:#C8E6C9
    style H1 stroke:#2962FF,fill:#BBDEFB
    style H2 fill:#BBDEFB
    style H3 fill:#BBDEFB
    style O fill:#FFCDD2
    style subGraph0 stroke:none,fill:transparent
    style subGraph1 stroke:none,fill:transparent
    style subGraph2 stroke:none,fill:transparent
```

Structure of a Neural Network #

A typical neural network has three main layers:

  • Input layer - receives the raw input features
  • Hidden layer(s) - transform the inputs through weighted connections and activation functions
  • Output layer - produces the final prediction

Artificial Neuron and Perceptron #

Knowledge in a neural network is stored in its connection weights; learning means modifying those weights.


Biological Neuron #

A biological neuron is a specialised cell that processes and transmits information through electrical and chemical signals.

Core components:

  • Dendrites: receive signals from other neurons
  • Cell body (soma): processes incoming signals
  • Axon: transmits the output signal
  • Synapses: connection points between neurons

Biological intuition:

  • many inputs arrive at one neuron
  • one neuron can connect out to many neurons
  • massive parallelism enables fast perception and recognition

Artificial Neuron #

An artificial neuron is a simplified computational model inspired by biological neurons.
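
A minimal sketch of such a neuron in Python (a toy illustration assuming NumPy; the weights and the step activation are chosen by hand): it computes a weighted sum of its inputs plus a bias, then applies an activation function.

```python
import numpy as np

def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def neuron(x, w, b, activation=step):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = np.dot(w, x) + b
    return activation(z)

# Hand-picked weights that make the neuron behave like a logical AND.
w, b = np.array([1.0, 1.0]), -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, neuron(np.array(x), w, b))  # fires only for [1, 1]
```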

Deep Learning #

  • A subset of Machine Learning.
  • Focuses on algorithms inspired by the structure and function of the brain, called Artificial Neural Networks.
  • A neural network with multiple hidden layers, each containing multiple nodes, is known as a deep neural network (a deep learning system).
  • Allows systems to automatically learn hierarchical representations (features) from raw input such as images, sound, or text.

Operational Steps for Neural Architectures #

| Step | Perceptron (Boolean/Logic) | Linear Regression Network | Binary Classification (Logistic) | DFNN / MLP (Classification) |
| --- | --- | --- | --- | --- |
| 1. Input | Binary or discrete inputs \( x_1, \dots, x_n \) | Numerical features \( x \) | Numerical features \( x \) | High-dimensional numerical or categorical features |
| 2. Weighted Sum | Single calculation: \( z = \sum_i w_i x_i + b \) | Single calculation: \( \hat{y} = w_0 + w_1 x \) | Single calculation: \( z = W x + b \) | Multiple stages: \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \) for each layer \( l \) |
| 3. Activation | Step function: output 1 if \( z \geq 0 \), else 0 | Identity: the output remains \( z \) (no non-linearity) | Sigmoid: maps \( z \) to a probability between 0 and 1 | ReLU for hidden layers; Softmax/Sigmoid for the output layer |
| 4. Loss / Error | Error = Target − Output | Mean Squared Error (MSE): \( J = \frac{1}{2N} \sum (y - \hat{y})^2 \) | Binary Cross-Entropy (BCE): penalises by probability distance | BCE or Categorical Cross-Entropy for multiple classes |
| 5. Optimisation | Update weights only on misclassification | Gradient Descent: iteratively compute gradients and update weights | Backpropagation: compute error signals \( \delta \) and gradients \( dW \) | Backpropagation: recursive chain rule to update all hidden-layer weights |
| 6. Output | Discrete Boolean value (0 or 1) | Continuous numerical value (e.g., house prices) | Single probability score or class label | A vector of probabilities over multiple classes |
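
As a worked example of the perceptron column above, a minimal sketch of the perceptron learning rule (assuming NumPy; the data, learning rate, and epoch count are illustrative):

```python
import numpy as np

def perceptron_train(X, y, epochs=10, lr=0.1):
    """Perceptron learning rule, following steps 1-6 of the table above."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, target in zip(X, y):
            z = np.dot(w, x_i) + b        # 2. weighted sum
            output = 1 if z >= 0 else 0   # 3. step activation
            error = target - output       # 4. error = target - output
            if error != 0:                # 5. update only on misclassification
                w += lr * error * x_i
                b += lr * error
    return w, b

# Learning the logical AND function from its truth table.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if np.dot(w, x) + b >= 0 else 0 for x in X])  # [0, 0, 0, 1]
```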


```mermaid
flowchart LR
    %% Input Layer
    subgraph subGraph0["Input Layer"]
        I1(("Input 1"))
        I2(("Input 2"))
        I3(("Input 3"))
    end

    %% Hidden Layers
    subgraph subGraph1["Hidden Layer 1"]
        H1a(("H1-1"))
        H1b(("H1-2"))
        H1c(("H1-3"))
    end

    subgraph subGraph2["Hidden Layer 2"]
        H2a(("H2-1"))
        H2b(("H2-2"))
        H2c(("H2-3"))
    end

    subgraph subGraph3["Hidden Layer 3"]
        H3a(("H3-1"))
        H3b(("H3-2"))
        H3c(("H3-3"))
    end

    %% Output Layer
    subgraph subGraph4["Output Layer"]
        O(("Output"))
    end

    %% Connections: Input to Hidden Layer 1
    I1 --> H1a & H1b & H1c
    I2 --> H1a & H1b & H1c
    I3 --> H1a & H1b & H1c

    %% Connections: Hidden Layer 1 to Hidden Layer 2
    H1a --> H2a & H2b & H2c
    H1b --> H2a & H2b & H2c
    H1c --> H2a & H2b & H2c

    %% Connections: Hidden Layer 2 to Hidden Layer 3
    H2a --> H3a & H3b & H3c
    H2b --> H3a & H3b & H3c
    H2c --> H3a & H3b & H3c

    %% Connections: Hidden Layer 3 to Output
    H3a --> O
    H3b --> O
    H3c --> O

    %% Styling
    style I1 fill:#C8E6C9
    style I2 fill:#C8E6C9
    style I3 fill:#C8E6C9
    style H1a fill:#BBDEFB
    style H1b fill:#BBDEFB
    style H1c fill:#BBDEFB
    style H2a fill:#90CAF9
    style H2b fill:#90CAF9
    style H2c fill:#90CAF9
    style H3a fill:#64B5F6
    style H3b fill:#64B5F6
    style H3c fill:#64B5F6
    style O fill:#FFCDD2
    style subGraph0 stroke:none,fill:transparent
    style subGraph1 stroke:none,fill:transparent
    style subGraph2 stroke:none,fill:transparent
    style subGraph3 stroke:none,fill:transparent
    style subGraph4 stroke:none,fill:transparent
```

Types of Neural Networks #

  • Standard NN - for smaller, simpler structured data (e.g. real estate pricing)
  • CNN - Convolutional - used for images (e.g. photo tagging, object detection)
  • RNN - Recurrent - used for sequential data such as text and speech (e.g. speech recognition, translation)
  • Hybrid NN - combinations of the above (e.g. autonomous driving)

Components of DL #

  • Data
  • Learning Algorithm: how to transform the data
  • Loss Function: an objective function that quantifies how well (or badly) the model is doing; the lower the loss, the better the model
  • Optimisation Algorithm: searches for the parameters that minimise the loss function; popular optimisation algorithms for deep learning are based on gradient descent (see the sketch after this list)
  • Model
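
To tie these components together, a minimal sketch of gradient descent minimising a mean-squared-error loss for one-dimensional linear regression (assuming NumPy; the synthetic data, learning rate, and step count are illustrative):

```python
import numpy as np

# Data: noisy samples around y = 2x + 1 (synthetic, for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 2 * x + 1 + rng.normal(0, 0.1, 50)

w, b = 0.0, 0.0   # model parameters
lr = 0.5          # learning rate

for _ in range(200):
    y_hat = w * x + b                        # model prediction
    loss = np.mean((y - y_hat) ** 2)         # loss function (MSE)
    grad_w = -2 * np.mean((y - y_hat) * x)   # gradient of the loss w.r.t. w
    grad_b = -2 * np.mean(y - y_hat)         # gradient of the loss w.r.t. b
    w -= lr * grad_w                         # optimisation: step against the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to the true 2 and 1
```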

Applications #

  • Computer Vision (e.g., face detection, medical imaging)
  • Natural Language Processing (e.g., ChatGPT, translation)
  • Self-Driving Cars
  • Speech Assistants (e.g., Alexa, Siri)

Intuition #

Deep Learning is the methodology; a deep neural network (DNN) is a model.

Attention Mechanism #

  • Queries, Keys, and Values
  • Attention Pooling by Similarity
  • Attention Pooling via Nadaraya–Watson Regression
  • Attention Scoring Functions
  • Dot Product Attention
  • Convenience Functions
  • Scaled Dot Product Attention (see the sketch after this list)
  • Additive Attention
  • Bahdanau Attention Mechanism
  • Multi-Head Attention
  • Self-Attention
  • Positional Encoding
  • Code implementation (webinar)
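
As a preview of the material above, a minimal sketch of scaled dot-product attention, \( \mathrm{softmax}(QK^\top / \sqrt{d_k})\,V \) (assuming NumPy; the shapes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = softmax(scores)        # attention weights: each row sums to 1
    return weights @ V               # weighted sum of the values

# Example: 2 queries attending over 4 key-value pairs of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, 8)) for n in (2, 4, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 8)
```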

Reference #

  • Dive into Deep Learning. Cambridge University Press. (Ch. 10, Ch. 7)

Optimisation of Deep models #

  • Goal of Optimization
  • Optimization Challenges in Deep Learning
  • Gradient Descent (see the sketch after this list)
  • Stochastic Gradient Descent
  • Minibatch Stochastic Gradient Descent
  • Momentum
  • Adagrad and Algorithm
  • RMSProp and Algorithm
  • Adadelta and Algorithm
  • Adam and Algorithm
  • Code Implementation and comparison of algorithms (webinar)
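
For orientation, a minimal sketch contrasting the plain gradient descent update with the momentum update on a toy quadratic objective (assuming NumPy; the objective and hyperparameters are illustrative):

```python
import numpy as np

# Toy objective f(w) = 0.5 * w^T A w, whose gradient is A w; minimum at w = 0.
A = np.diag([1.0, 10.0])
def grad(w):
    return A @ w

def gradient_descent(w, lr=0.05, steps=100):
    """Plain gradient descent: w <- w - lr * grad(w)."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def momentum(w, lr=0.05, beta=0.9, steps=100):
    """Momentum: accumulate a velocity v, then step along it."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)
        w = w - lr * v
    return w

w0 = np.array([1.0, 1.0])
print(gradient_descent(w0), momentum(w0))  # both approach the minimum at [0, 0]
```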

Reference #

  • Dive into Deep Learning. Cambridge University Press. (Ch. 12)

Regularisation for Deep models #

  • Generalization for regression
  • Training Error and Generalization Error
  • Underfitting or Overfitting
  • Model Selection
  • Weight Decay and Norms
  • Generalization in Classification
  • Environment and Distribution Shift
  • Generalization in Deep Learning
  • Dropout (see the sketch after this list)
  • Batch Normalization
  • Layer Normalization
  • Code implementation (webinar)
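
A minimal sketch of inverted dropout, one common way the technique above is implemented (assuming NumPy; the probability and shapes are illustrative): during training each activation is zeroed with probability p, and the survivors are rescaled by 1/(1-p) so that expected activations match at evaluation time.

```python
import numpy as np

def dropout_layer(a, p=0.5, training=True):
    """Inverted dropout: randomly zero activations during training only."""
    if not training or p == 0.0:
        return a                           # identity at evaluation time
    mask = (np.random.rand(*a.shape) > p)  # keep each unit with prob. 1 - p
    return a * mask / (1.0 - p)            # rescale so E[output] == input

a = np.ones((2, 5))
print(dropout_layer(a))                  # about half the entries zeroed, rest 2.0
print(dropout_layer(a, training=False))  # unchanged when not training
```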

Reference #


  • Dive into Deep Learning. Cambridge University Press.