Fully Connected Layer — Technology Wiki

Overview

Direct Answer

A fully connected layer is a neural network layer in which every neuron receives input from every neuron in the preceding layer. Also termed a dense layer, its connections to the preceding layer form a complete bipartite graph.

How It Works

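Each neuron in the layer computes a weighted sum of all inputs from the prior layer, adds a bias term, and applies an activation function to produce its output. The weight matrix holds one entry per input-output pair, so its size, and the cost of a forward pass, is proportional to the product of the input and output neuron counts; the cost grows quadratically only when both widths are scaled together. A single dense layer applies an affine transformation followed by its activation; stacking several such layers with non-linear activations lets the network approximate a wide range of non-linear functions, with the weights adjusted by gradient-based optimisation using gradients computed through backpropagation.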
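As a minimal sketch of the computation described above, the following pure-Python function evaluates one dense layer for a single input vector. The names dense_forward and relu, the row-per-neuron weight layout, and the example values are illustrative assumptions, not taken from any particular library; practical code would normally use an optimised matrix multiply rather than explicit loops.

```python
from typing import Callable, List


def relu(z: float) -> float:
    """Rectified Linear Unit: pass positive values through, clamp the rest to zero."""
    return z if z > 0.0 else 0.0


def dense_forward(
    inputs: List[float],
    weights: List[List[float]],
    biases: List[float],
    activation: Callable[[float], float] = relu,
) -> List[float]:
    """Compute activation(W @ x + b) for one input vector.

    weights[j][i] is the weight from input i to output neuron j
    (an assumed layout chosen for this sketch).
    """
    if len(weights) != len(biases):
        raise ValueError("need exactly one bias per output neuron")
    outputs = []
    for row, bias in zip(weights, biases):
        if len(row) != len(inputs):
            raise ValueError("each weight row needs one entry per input")
        # Weighted sum over all inputs, plus the bias term.
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z))
    return outputs


# Example: 3 inputs feeding 2 neurons (values chosen arbitrarily).
x = [1.0, 2.0, 3.0]
W = [[0.5, 1.0, 0.5],
     [-1.0, 0.0, 0.0]]
b = [0.5, 0.5]
y = dense_forward(x, W, b)
print(y)  # [4.5, 0.0]
```

The first neuron's weighted sum is 4.0 plus a bias of 0.5; the second neuron's pre-activation value is -0.5, which ReLU clamps to zero. The weight matrix here has 2 x 3 = 6 entries, one per input-output pair.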

Why It Matters

Dense layers are a general-purpose mechanism for learning feature representations and decision boundaries in neural networks. Their computation reduces to matrix multiplication, which modern hardware executes efficiently, but their width and parameter count affect both model accuracy and inference latency, which are critical factors in production systems handling real-time predictions and large-scale data processing.

Common Applications

Fully connected layers appear in image classification networks (following convolutional feature extraction), natural language processing models for text classification, recommendation systems, and time-series forecasting. A dense layer also serves as the output layer in most supervised classification and regression networks.

Key Considerations

Fully connected layers introduce significant parameter overhead compared to convolutional or recurrent alternatives, increasing memory consumption and training time. They assume no spatial or temporal structure in data, making them less efficient than specialised layers for structured inputs such as images or sequences.
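To make the parameter overhead concrete, the sketch below counts the parameters of a dense layer and of a 2D convolution applied to the same input. The helper names, the 224 x 224 RGB input, the 64-unit width, and the 3 x 3 kernel are hypothetical values chosen purely for illustration.

```python
def dense_param_count(n_inputs: int, n_outputs: int) -> int:
    """One weight per input-output pair, plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs


def conv2d_param_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """One square kernel per (input channel, output channel) pair,
    plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


# Hypothetical example: a 224 x 224 RGB image.
height, width, channels = 224, 224, 3
flattened = height * width * channels  # 150528 input values

dense_params = dense_param_count(flattened, 64)     # dense layer with 64 neurons
conv_params = conv2d_param_count(channels, 64, 3)   # 3x3 convolution with 64 filters

print(f"dense: {dense_params:,} parameters")
print(f"conv:  {conv_params:,} parameters")
```

Under these assumptions the dense layer needs 9,633,856 parameters against 1,792 for the convolution, because the convolution reuses the same small kernel at every spatial position. The two layers do not produce equivalent outputs, so the comparison concerns parameter count only.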

Cross-References

Deep Learning

Neural Network

Related in Architectures

Deep Learning

A subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.

Neural Network

A computing system inspired by biological neural networks, consisting of interconnected nodes that process information in layers.

Convolutional Neural Network

A deep learning architecture designed for processing structured grid data like images, using convolutional filters to detect features.

Recurrent Neural Network

A neural network architecture where connections between nodes form directed cycles, enabling processing of sequential data.

Long Short-Term Memory

A recurrent neural network architecture designed to learn long-term dependencies by using gating mechanisms to control information flow.

Gated Recurrent Unit

A simplified variant of LSTM that combines the forget and input gates into a single update gate.

Transformer

A neural network architecture built around attention mechanisms rather than recurrence, enabling parallel processing of sequences.

Attention Mechanism

A neural network component that learns to focus on relevant parts of the input when producing each element of the output.

Encoder-Decoder Architecture

A neural network design where an encoder processes input into an intermediate representation and a decoder generates output from it.

Autoencoder

A neural network trained to encode input data into a compressed representation and then decode it back to reconstruct the original.

Variational Autoencoder

A generative model that learns a probabilistic latent space representation, enabling generation of new data samples.

Batch Normalisation

A technique that normalises layer inputs during training to stabilise and accelerate deep neural network learning.

More in Deep Learning

Generative Adversarial Network

Generative Models

A framework where two neural networks compete — a generator creates synthetic data while a discriminator evaluates its authenticity.

Pre-Training

Language Models

The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.

Activation Function

Training & Optimisation

A mathematical function applied to each neuron's weighted input sum to introduce non-linearity, enabling the learning of complex patterns.

Model Parallelism

Architectures

A distributed training approach that partitions a model across multiple devices, enabling training of models too large to fit in a single accelerator's memory.

Key-Value Cache

Architectures

An optimisation in autoregressive transformer inference that stores previously computed key and value tensors to avoid redundant computation during sequential token generation.

Graph Neural Network

Architectures

A neural network designed to operate on graph-structured data, learning representations of nodes, edges, and entire graphs.

Vanishing Gradient

Architectures

A problem in deep networks where gradients become extremely small during backpropagation, preventing earlier layers from learning.

ReLU

Training & Optimisation

Rectified Linear Unit — an activation function that outputs the input directly if positive, otherwise outputs zero.