What are neural networks and how do they work?

Neural networks mimic the human brain, but how do they reason?

At its core, a neural network is a system of algorithms that attempts to recognise underlying relationships in a set of data through a process that mimics the way the human brain operates.

Neural networks are inspired by our brain's structure. Just as we have neurons, these networks have artificial neurons, or "nodes". They are organised into layers: the input layer (where data enters), one or more hidden layers, and the output layer (where we get the result).

While the input and output layers are relatively easy to understand, let's spend a moment on the hidden layers. What exactly are they, and what role do they play?

The hidden layers in a neural network play a crucial role in the network's ability to learn and represent complex patterns and relationships in the data.

The input layer receives raw data, and as this data progresses through the hidden layers, it gets transformed into increasingly abstract representations. For instance, in image recognition, the initial hidden layers might detect edges, subsequent layers might recognise shapes by combining edges, and even deeper layers might recognise more complex structures like facial features.

Activation functions introduced in the hidden layers add non-linearity to the network. This non-linearity ensures that the network isn't just computing a linear transformation of the input data, allowing it to capture and learn from the complex, non-linear relationships in the data.
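To make this concrete, here is a minimal sketch in plain Python of two common activation functions, sigmoid and ReLU; any given network may of course use others:

```python
import math

def sigmoid(x):
    # Squashes any real-valued input into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def relu(x):
    # Keeps positive values unchanged, zeroes out negatives
    return max(0.0, x)

# Without a non-linearity like these, stacking layers would collapse
# into a single linear transformation, no matter how many layers.
print(sigmoid(0))    # 0.5
print(relu(-3.2))    # 0.0
print(relu(2.5))     # 2.5
```

Both functions bend straight lines: feeding their outputs into the next layer is what lets the network model curves and other non-linear relationships.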

More hidden layers and more nodes within these layers increase the model's capacity, meaning it can represent more complex functions. As data moves through successive hidden layers, the network tends to learn a hierarchy of features. Lower layers often learn basic, foundational patterns, while deeper layers learn more abstract and composite features built upon the foundational ones.

When neural networks have many hidden layers, they're called deep neural networks. This is where the term "deep learning" comes from. These multiple hidden layers enable neural networks to learn and represent more complex, non-linear relationships from the data, making them more powerful tools for a wide range of tasks.
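As an illustrative sketch (with made-up weights rather than a trained network), here is a forward pass through two small hidden layers in plain Python:

```python
def layer_forward(inputs, weights, biases):
    # Each row of `weights` holds one node's connection weights.
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(x * w for x, w in zip(inputs, node_weights)) + bias
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs

# Toy "deep" network: two stacked layers, illustrative weights only.
x = [1.0, 2.0]
h1 = layer_forward(x, [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])
h2 = layer_forward(h1, [[0.4, 0.6]], [-0.05])
print(h2)
```

Each layer's output becomes the next layer's input, which is exactly the stacking that the "deep" in deep learning refers to.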

But how do these layers communicate? 

The answer lies in the nodes and the weighted connections between them. Each node in a layer is connected to the nodes in the next layer, and each connection has a weight that is adjusted during learning. When data enters a node, each input is multiplied by its connection's weight, the results are summed together with a bias term, and the total is passed through an activation function. This determines the node's output.
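A single node's computation can be sketched in a few lines of plain Python; the inputs, weights, and bias below are illustrative values, not learned ones:

```python
import math

def neuron_output(inputs, weights, bias):
    # Multiply each input by its connection weight, sum the products,
    # add the bias, then pass the total through an activation function.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation

inputs = [0.5, 0.3]     # values arriving from the previous layer
weights = [0.4, 0.7]    # one weight per incoming connection
bias = -0.1
out = neuron_output(inputs, weights, bias)
print(round(out, 3))    # 0.577
```

In a real network, this same computation happens at every node, and it is the weights (and biases) that training adjusts.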

The magic happens during the learning phase. The training dataset is passed through the network forwards and backwards, and one complete pass of the entire dataset through the network is called an epoch. Training takes multiple epochs, and along the way the network adjusts the weights based on the error of its predictions, using a process called backpropagation.

What is backpropagation?

Well, It's like the network's way of admitting its mistakes. It adjusts weights backwards from the output to reduce the error for the next prediction.To help adjust this weight there is an optimisation algorithm called the Gradient Descent which helps adjust weights in the network. Think of it as a method to find the best set of weights to minimize prediction errors.

For a neural network to learn, data is crucial: the more quality data it gets, the better it becomes at making predictions. This is a challenge, however, because training a neural network can get quite costly and time-intensive. The cost can be offset by methods like batch updating, where instead of updating the weights after every data point, the model is updated after a batch of data points.
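Batch updating can be sketched on the same kind of toy one-weight model: the gradient is averaged over a small batch, and the weight is updated once per batch rather than once per data point:

```python
# Mini-batch updating for the toy model y = w * x (true w = 2).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
batch_size = 2

w = 0.0
learning_rate = 0.02

for epoch in range(200):
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Average the gradient over the batch, then update once.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad

print(round(w, 3))  # still converges towards 2.0, with fewer updates
```

With half as many weight updates per epoch, each update is also less noisy, since it averages over several examples.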

We can also use transfer learning: instead of starting from scratch, we take a pre-trained model and fine-tune it for a specific task. It saves time and resources.
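The idea can be sketched in plain Python; the "pretrained" feature extractor below is just a stand-in for a real model trained on a large dataset, and only the small output layer on top of it is trained:

```python
def pretrained_features(x):
    # Stand-in for a frozen, pre-trained network: its internals are
    # NOT updated during fine-tuning.
    return [x, x * x]

# New task: fit y = 3*x + 1*x^2 using only the frozen features.
data = [(1.0, 4.0), (2.0, 10.0), (3.0, 18.0)]

head = [0.0, 0.0]   # only these output-layer weights are trained
lr = 0.01

for epoch in range(2000):
    for x, y in data:
        feats = pretrained_features(x)
        pred = sum(w * f for w, f in zip(head, feats))
        err = pred - y
        for i, f in enumerate(feats):
            head[i] -= lr * 2 * err * f   # update the head only

print([round(w, 2) for w in head])  # approaches [3.0, 1.0]
```

Because the expensive feature extractor stays frozen, only a handful of weights need training, which is why fine-tuning is so much cheaper than training from scratch.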

Another common pitfall is overfitting: the network becomes too good at recognising the training data and performs poorly on new, unseen data, rendering the model ineffective. Techniques like dropout help prevent this by randomly turning off certain nodes during training, making the network more robust.
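Dropout itself is simple to sketch. This plain-Python version (the common "inverted dropout" formulation) zeroes each activation with some probability and rescales the survivors to compensate:

```python
import random

def dropout(activations, rate=0.5):
    # During training only: zero each node's output with probability
    # `rate`, and scale the survivors by 1/(1-rate) so the layer's
    # expected output stays the same.
    return [0.0 if random.random() < rate else a / (1 - rate)
            for a in activations]

random.seed(0)  # seeded here just to make the example reproducible
layer_output = [0.2, 0.9, 0.5, 0.7]
dropped = dropout(layer_output)
print(dropped)  # some values zeroed, survivors doubled
```

Because a different random subset of nodes is silenced on every pass, no single node can be relied on too heavily, which discourages memorising the training data.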

In today's world, two types of neural networks are extremely popular:

  • Convolutional Neural Networks (CNNs), which are specialised for tasks like image recognition. They can identify patterns, shapes, and objects in images.
  • Recurrent Neural Networks (RNNs), which are great for sequential data like time series or natural language. They maintain a memory of previous inputs.

With advancements in quantum computing and neuromorphic engineering, the future of neural networks is bound to be even more exciting!

With the explosion in Large Language Models and conversational chatbots like ChatGPT and Bard, neural networks will become more mainstream in customer service, image recognition, and medical diagnostics. But as with all tech, it will be crucial to use them responsibly: bias in data can lead to biased predictions, which can have real-world consequences.

Neural networks are a foundational technology in the realm of Artificial Intelligence. They mimic the human brain, learn from data, and are transforming the world in countless ways. That's a basic primer on neural networks. Dive deeper, keep learning, and let's shape a responsible AI-driven future together.