NeuroCodeBench: A New Standard for Neural Network Verification
Introducing a benchmark to verify neural networks coded in plain C.
― 5 min read
In recent years, neural networks have become core components of many systems, including safety-critical ones. Such systems require strong guarantees that the neural networks inside them behave correctly. Although many techniques exist for verifying neural networks, they cannot prove the absence of faults in the software that implements them. This article introduces NeuroCodeBench, a benchmark aimed at verifying neural network code written in plain C.
The Need for Verification
Safety-critical systems, such as those used in healthcare, automotive, and aerospace applications, must provide strong reliability and safety guarantees. Faults in their neural network components can lead to serious accidents or failures, so methods that confirm the software behaves correctly are essential. Most existing techniques, however, reason about networks at a high level of abstraction and can overlook implementation details.
Neural networks are not built in a single pass; they evolve through a training process that often involves trial and error. This leaves them susceptible to a range of issues: poor predictions on unfamiliar data, flawed architectures, deprecated libraries, and plain software bugs. Such vulnerabilities can remain hidden until the system is deployed, which is concerning.
Challenges Faced by Current Techniques
Most strategies for ensuring the correctness of neural networks operate at an abstract mathematical level, ignoring low-level details such as how numbers are represented in the code (e.g., finite-precision floating-point arithmetic). Methods that do target the actual implementation usually rely on testing, which cannot prove correctness for every possible input. This lack of certainty is alarming for systems where safety is critical. Common software faults, such as arithmetic errors or memory violations, can make a neural network produce incorrect results or even corrupt the system it runs on.
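To make the risk concrete, here is a minimal, self-contained C sketch (our own illustration, not taken from the benchmark) of how a low-level floating-point issue can silently corrupt a network's output: a textbook softmax overflows expf for large logits and returns NaN, while the standard max-subtraction rewrite stays finite.

```c
/* Illustrative only: a naive softmax that overflows vs. a stable one. */
#include <stdio.h>
#include <math.h>

static void softmax_naive(const float *x, float *y, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) { y[i] = expf(x[i]); sum += y[i]; }
    for (int i = 0; i < n; i++) y[i] /= sum;  /* inf / inf -> NaN */
}

static void softmax_stable(const float *x, float *y, int n) {
    float m = x[0], sum = 0.0f;
    for (int i = 1; i < n; i++) if (x[i] > m) m = x[i];
    for (int i = 0; i < n; i++) { y[i] = expf(x[i] - m); sum += y[i]; }
    for (int i = 0; i < n; i++) y[i] /= sum;
}

int main(void) {
    const float logits[3] = {1000.0f, 1001.0f, 1002.0f};  /* expf overflows */
    float naive[3], stable[3];
    softmax_naive(logits, naive, 3);
    softmax_stable(logits, stable, 3);
    printf("naive:  %f %f %f\n", naive[0], naive[1], naive[2]);   /* NaNs */
    printf("stable: %f %f %f\n", stable[0], stable[1], stable[2]);
    return 0;
}
```

Testing alone can easily miss such a fault, since it only triggers on inputs large enough to overflow expf.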
Even though mature software verification tools exist, there has been no comprehensive assessment of how well they handle neural network code. SV-COMP, the largest software verification competition, features a large collection of C programs but has no dedicated benchmark for neural networks or the math libraries they depend on.
Introducing NeuroCodeBench
NeuroCodeBench aims to address these gaps by providing a set of benchmarks specifically designed for neural network code in plain C. The suite is structured to test the abilities of current software verification tools without overwhelming them with overly complex examples. It comprises 32 neural networks with 607 known safety properties, divided into six categories:
- Maths library functions
- Activation functions
- Error-correcting networks
- Transfer function approximation
- Probability density estimation
- Reinforcement learning
Some parts of this benchmark have been adapted from previous competitions, while others are entirely new.
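To give a flavor of the format, the sketch below shows what a single verification task might look like in the SV-COMP style that NeuroCodeBench targets. The one-neuron stand-in network, the input bounds, and the output property are all our own illustration; the actual harnesses are in the benchmark itself. A verifier treats __VERIFIER_nondet_float() as an arbitrary value and must prove the assertion for every admissible input.

```c
/* Hypothetical harness sketch in the SV-COMP style. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);  /* provided by the verifier */
extern void __VERIFIER_assume(int cond);

/* Toy stand-in for the network under test: sigmoid(w*x + b). */
static float network_forward(float x) {
    const float w = 0.5f, b = 0.1f;
    return 1.0f / (1.0f + expf(-(w * x + b)));
}

int main(void) {
    float x = __VERIFIER_nondet_float();
    __VERIFIER_assume(x >= -1.0f && x <= 1.0f);  /* restrict the input domain */
    float y = network_forward(x);
    assert(y > 0.0f && y < 1.0f);                /* the safety property */
    return 0;
}
```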
The Components of NeuroCodeBench
Math Library
Neural network implementations rely heavily on floating-point operations, many of which come from the standard C math library (math.h). Functions critical for neural networks include square roots, exponentials, logarithms, and various trigonometric functions.
We created a set of tests to check whether software verification tools correctly handle calls to these math functions. The tests focus on the range of output values, the consistency of the functions' behavior, and the reliability of their derivatives.
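The sketch below conveys the flavor of such a test; the chosen function and properties are our own illustration, not the paper's exact ones. It asks a verifier to prove an output-range fact and a monotonicity fact about expf for all finite inputs.

```c
/* Illustrative maths-library property check. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);
extern void __VERIFIER_assume(int cond);

int main(void) {
    float a = __VERIFIER_nondet_float();
    float b = __VERIFIER_nondet_float();
    __VERIFIER_assume(isfinite(a) && isfinite(b) && a <= b);

    /* Output range: expf never returns a negative value. */
    assert(expf(a) >= 0.0f);

    /* Monotonicity: expf preserves the ordering of its inputs
     * (holds for faithfully rounded implementations). */
    assert(expf(a) <= expf(b));
    return 0;
}
```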
Activation Functions
Activation functions play a pivotal role in introducing non-linearities into neural networks. The benchmark includes several popular activation functions, such as ReLU (Rectified Linear Unit), softmax, and tanh, each of which depends on particular mathematical operations. The goal is to ensure these functions are implemented accurately, since an error in an activation function propagates through the whole network.
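As an illustration, here is a minimal activation-function check in the same harness style (again, the property is our own example): a hand-written ReLU must be non-negative and agree with the math.h reference fmaxf.

```c
/* Illustrative activation-function property check. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);
extern void __VERIFIER_assume(int cond);

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

int main(void) {
    float x = __VERIFIER_nondet_float();
    __VERIFIER_assume(!isnan(x));
    float y = relu(x);
    assert(y >= 0.0f);            /* ReLU output is never negative */
    assert(y == fmaxf(x, 0.0f));  /* agrees with the math.h reference */
    return 0;
}
```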
Error-Correcting Networks
Certain recurrent neural networks can act as error-correcting decoders: they take in data, identify which parts may be corrupted, and repair them. Our benchmarks use Hopfield networks with well-defined behavior, focusing on whether they reconstruct a stored pattern when given partially corrupted inputs.
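The self-contained sketch below shows the idea at toy scale (the stored pattern and network size are our own choices): Hebbian weights store one bipolar pattern, one element is flipped, and a single synchronous update must repair it.

```c
/* Illustrative Hopfield error-correction check. */
#include <assert.h>

#define N 8

static const int pattern[N] = {1, -1, 1, 1, -1, -1, 1, -1};

static int sign(float v) { return v >= 0.0f ? 1 : -1; }

int main(void) {
    float W[N][N];
    /* Hebbian rule: W[i][j] = p[i] * p[j], no self-connections. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            W[i][j] = (i == j) ? 0.0f : (float)(pattern[i] * pattern[j]);

    int state[N];
    for (int i = 0; i < N; i++) state[i] = pattern[i];
    state[3] = -state[3];  /* corrupt one element */

    /* One synchronous update step. */
    int next[N];
    for (int i = 0; i < N; i++) {
        float h = 0.0f;
        for (int j = 0; j < N; j++) h += W[i][j] * (float)state[j];
        next[i] = sign(h);
    }

    /* The network should reconstruct the stored pattern. */
    for (int i = 0; i < N; i++) assert(next[i] == pattern[i]);
    return 0;
}
```

With a single stored pattern, each neuron's input field points in the direction of the pattern with magnitude at least N - 3, so one update step provably removes a single flipped bit.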
Transfer Function Networks
In engineering applications, neural networks are often used to approximate the behavior of electrical components. Our benchmark mimics this process by defining a hypothetical component with an oscillating transfer function and training several neural networks to approximate it. Verification then checks that the networks' outputs stay within a given error bound of the reference function.
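A hedged sketch of such a property follows; the reference function, the stand-in "network" (a cubic polynomial approximating sinf), and the tolerance are all illustrative. The assertion bounds the approximation error over a restricted input domain.

```c
/* Illustrative transfer-function approximation check. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);
extern void __VERIFIER_assume(int cond);

/* Stand-in for a small trained network: Taylor cubic for sin(t). */
static float approx(float t) { return t - t * t * t / 6.0f; }

int main(void) {
    float t = __VERIFIER_nondet_float();
    __VERIFIER_assume(t >= -0.5f && t <= 0.5f);
    float err = fabsf(approx(t) - sinf(t));
    assert(err <= 1.0e-3f);  /* output must track the reference closely */
    return 0;
}
```

On [-0.5, 0.5] the Taylor remainder is at most 0.5^5/120, roughly 2.6e-4, so the 1e-3 bound holds with room to spare for rounding error.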
VNN-COMP Networks
The International Verification of Neural Networks Competition (VNN-COMP) provides a variety of benchmarks, but at a higher level of abstraction. To obtain code-level tests, we translated a selection of these networks from their original format into plain C, ensuring that their safety properties were preserved in the process.
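To illustrate what such translated code looks like, the sketch below shows a single fully-connected ReLU layer written in plain C; the layer sizes and weights here are made up, whereas the real benchmarks carry the trained parameters of each network in the same spirit.

```c
/* Illustrative dense + ReLU layer in plain C. */
#include <stdio.h>

#define IN 3
#define OUT 2

static const float W[OUT][IN] = {{0.2f, -0.5f,  0.1f},
                                 {0.7f,  0.3f, -0.4f}};
static const float b[OUT] = {0.05f, -0.1f};

static void dense_relu(const float *x, float *y) {
    for (int i = 0; i < OUT; i++) {
        float acc = b[i];
        for (int j = 0; j < IN; j++) acc += W[i][j] * x[j];
        y[i] = acc > 0.0f ? acc : 0.0f;  /* ReLU activation */
    }
}

int main(void) {
    const float x[IN] = {1.0f, 0.5f, -1.0f};
    float y[OUT];
    dense_relu(x, y);
    printf("%f %f\n", y[0], y[1]);
    return 0;
}
```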
Preliminary Evaluation of NeuroCodeBench
As part of assessing the difficulty of NeuroCodeBench, we ran several state-of-the-art verification tools on the benchmark. Within the allotted time budget, these tools struggled to produce correct verdicts across many of the safety properties. A major factor impacting their performance is incomplete support for the standard math functions, which was evident in many outputs. In addition, some tools timed out on the more complex neural networks, indicating that the larger instances pose a real challenge.
Conclusions and Future Directions
NeuroCodeBench serves as a rigorous benchmark for verifying neural network code written in plain C. Our initial findings indicate that current verification tools struggle to produce correct verdicts for many of the safety properties tested, and that better support for the math library functions commonly used in neural networks is clearly needed. As a next step, we plan to expand NeuroCodeBench and contribute it to the broader verification community, encouraging further work on the challenges of verifying neural network code. Through ongoing collaboration and dedicated effort, we aim to improve the state of verification tools and deliver safer, more reliable neural networks for critical applications.
Title: NeuroCodeBench: a plain C neural network benchmark for software verification
Abstract: Safety-critical systems with neural network components require strong guarantees. While existing neural network verification techniques have shown great progress towards this goal, they cannot prove the absence of software faults in the network implementation. This paper presents NeuroCodeBench - a verification benchmark for neural network code written in plain C. It contains 32 neural networks with 607 safety properties divided into 6 categories: maths library, activation functions, error-correcting networks, transfer function approximation, probability density estimation and reinforcement learning. Our preliminary evaluation shows that state-of-the-art software verifiers struggle to provide correct verdicts, due to their incomplete support of the standard C mathematical library and the complexity of larger neural networks.
Authors: Edoardo Manino, Rafael Sá Menezes, Fedor Shmarov, Lucas C. Cordeiro
Last Update: 2023-09-07 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2309.03617
Source PDF: https://arxiv.org/pdf/2309.03617
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.