NeuroCodeBench: A New Standard for Neural Network Verification
Introducing a benchmark to verify neural networks coded in plain C.
― 5 min read
In recent years, neural networks have become core components of many systems, including safety-critical ones. Such systems require strong guarantees that the neural networks inside them behave correctly. Although many techniques exist for verifying neural networks, they cannot prove the absence of faults in the software that implements them. This article introduces NeuroCodeBench, a benchmark aimed at verifying neural network code written in plain C.
The Need for Verification
Safety-critical systems, such as those used in healthcare, automotive, and aerospace applications, must provide strong reliability and safety guarantees. Faults in their neural network components can lead to serious accidents or failures, so methods that confirm the software behaves correctly are essential. Most existing techniques, however, reason about networks at a high level of abstraction and can overlook implementation details.
Neural networks are not built in a single pass; they evolve through a training process that often involves trial and error. This leaves them susceptible to a range of issues: poor predictions on unfamiliar data, flawed architectures, deprecated libraries, and plain software bugs. Such vulnerabilities can remain hidden until the system is deployed, which is concerning.
Challenges Faced by Current Techniques
Most strategies for ensuring the correctness of neural networks operate at an abstract mathematical level, ignoring low-level details such as how numbers are represented in the code (e.g., finite-precision floating-point arithmetic). Methods that do target the actual implementation usually rely on testing, which cannot prove correctness for every possible input. This lack of certainty is alarming for systems where safety is critical. Common software faults, such as arithmetic errors or memory violations, can make a neural network produce incorrect results or even corrupt the system it runs on.
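To make the risk concrete, here is a minimal, self-contained C sketch (our own illustration, not taken from the benchmark) of how a low-level floating-point issue can silently corrupt a network's output: a textbook softmax overflows expf for large logits and returns NaN, while the standard max-subtraction rewrite stays finite.

```c
/* Illustrative only: a naive softmax that overflows vs. a stable one. */
#include <stdio.h>
#include <math.h>

static void softmax_naive(const float *x, float *y, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) { y[i] = expf(x[i]); sum += y[i]; }
    for (int i = 0; i < n; i++) y[i] /= sum;  /* inf / inf -> NaN */
}

static void softmax_stable(const float *x, float *y, int n) {
    float m = x[0], sum = 0.0f;
    for (int i = 1; i < n; i++) if (x[i] > m) m = x[i];
    for (int i = 0; i < n; i++) { y[i] = expf(x[i] - m); sum += y[i]; }
    for (int i = 0; i < n; i++) y[i] /= sum;
}

int main(void) {
    const float logits[3] = {1000.0f, 1001.0f, 1002.0f};  /* expf overflows */
    float naive[3], stable[3];
    softmax_naive(logits, naive, 3);
    softmax_stable(logits, stable, 3);
    printf("naive:  %f %f %f\n", naive[0], naive[1], naive[2]);   /* NaNs */
    printf("stable: %f %f %f\n", stable[0], stable[1], stable[2]);
    return 0;
}
```

Testing alone can easily miss such a fault, since it only triggers on inputs large enough to overflow expf.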
Even though mature software verification tools exist, there has been no comprehensive assessment of how well they handle neural network code. SV-COMP, the largest software verification competition, features a large collection of C programs but has no dedicated benchmark for neural networks or the math libraries they depend on.
Introducing NeuroCodeBench
NeuroCodeBench aims to address these gaps by providing a set of benchmarks specifically designed for neural network code in plain C. The suite is structured to test the abilities of current software verification tools without overwhelming them with overly complex examples. It comprises 32 neural networks with 607 known safety properties, divided into six categories:
- Maths library functions
- Activation functions
- Error-correcting networks
- Transfer function approximation
- Probability density estimation
- Reinforcement learning
Some parts of this benchmark have been adapted from previous competitions, while others are entirely new.
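To give a flavor of the format, the sketch below shows what a single verification task might look like in the SV-COMP style that NeuroCodeBench targets. The one-neuron stand-in network, the input bounds, and the output property are all our own illustration; the actual harnesses are in the benchmark itself. A verifier treats __VERIFIER_nondet_float() as an arbitrary value and must prove the assertion for every admissible input.

```c
/* Hypothetical harness sketch in the SV-COMP style. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);  /* provided by the verifier */
extern void __VERIFIER_assume(int cond);

/* Toy stand-in for the network under test: sigmoid(w*x + b). */
static float network_forward(float x) {
    const float w = 0.5f, b = 0.1f;
    return 1.0f / (1.0f + expf(-(w * x + b)));
}

int main(void) {
    float x = __VERIFIER_nondet_float();
    __VERIFIER_assume(x >= -1.0f && x <= 1.0f);  /* restrict the input domain */
    float y = network_forward(x);
    assert(y > 0.0f && y < 1.0f);                /* the safety property */
    return 0;
}
```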
The Components of NeuroCodeBench
Math Library
Neural network implementations rely heavily on floating-point operations, many of which come from the standard C math library (math.h). Functions critical for neural networks include square roots, exponentials, logarithms, and various trigonometric functions.
We created a set of tests to check whether software verification tools correctly handle calls to these math functions. The tests focus on the range of output values, the consistency of the functions' behavior, and the reliability of their derivatives.
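The sketch below conveys the flavor of such a test; the chosen function and properties are our own illustration, not the paper's exact ones. It asks a verifier to prove an output-range fact and a monotonicity fact about expf for all finite inputs.

```c
/* Illustrative maths-library property check. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);
extern void __VERIFIER_assume(int cond);

int main(void) {
    float a = __VERIFIER_nondet_float();
    float b = __VERIFIER_nondet_float();
    __VERIFIER_assume(isfinite(a) && isfinite(b) && a <= b);

    /* Output range: expf never returns a negative value. */
    assert(expf(a) >= 0.0f);

    /* Monotonicity: expf preserves the ordering of its inputs
     * (holds for faithfully rounded implementations). */
    assert(expf(a) <= expf(b));
    return 0;
}
```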
Activation Functions
Activation functions play a pivotal role in introducing non-linearities into neural networks. The benchmark includes several popular activation functions, such as ReLU (Rectified Linear Unit), softmax, and tanh, each of which depends on particular mathematical operations. The goal is to ensure these functions are implemented accurately, since an error in an activation function propagates through the whole network.
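As an illustration, here is a minimal activation-function check in the same harness style (again, the property is our own example): a hand-written ReLU must be non-negative and agree with the math.h reference fmaxf.

```c
/* Illustrative activation-function property check. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);
extern void __VERIFIER_assume(int cond);

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

int main(void) {
    float x = __VERIFIER_nondet_float();
    __VERIFIER_assume(!isnan(x));
    float y = relu(x);
    assert(y >= 0.0f);            /* ReLU output is never negative */
    assert(y == fmaxf(x, 0.0f));  /* agrees with the math.h reference */
    return 0;
}
```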
Error-Correcting Networks
Certain recurrent neural networks can act as error-correcting decoders: they take in data, identify which parts may be corrupted, and repair them. Our benchmarks use Hopfield networks with well-defined behavior, focusing on whether they reconstruct a stored pattern when given partially corrupted inputs.
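The self-contained sketch below shows the idea at toy scale (the stored pattern and network size are our own choices): Hebbian weights store one bipolar pattern, one element is flipped, and a single synchronous update must repair it.

```c
/* Illustrative Hopfield error-correction check. */
#include <assert.h>

#define N 8

static const int pattern[N] = {1, -1, 1, 1, -1, -1, 1, -1};

static int sign(float v) { return v >= 0.0f ? 1 : -1; }

int main(void) {
    float W[N][N];
    /* Hebbian rule: W[i][j] = p[i] * p[j], no self-connections. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            W[i][j] = (i == j) ? 0.0f : (float)(pattern[i] * pattern[j]);

    int state[N];
    for (int i = 0; i < N; i++) state[i] = pattern[i];
    state[3] = -state[3];  /* corrupt one element */

    /* One synchronous update step. */
    int next[N];
    for (int i = 0; i < N; i++) {
        float h = 0.0f;
        for (int j = 0; j < N; j++) h += W[i][j] * (float)state[j];
        next[i] = sign(h);
    }

    /* The network should reconstruct the stored pattern. */
    for (int i = 0; i < N; i++) assert(next[i] == pattern[i]);
    return 0;
}
```

With a single stored pattern, each neuron's input field points in the direction of the pattern with magnitude at least N - 3, so one update step provably removes a single flipped bit.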
Transfer Function Networks
In engineering applications, neural networks are often used to approximate the behavior of electrical components. Our benchmark mimics this process by defining a hypothetical component with an oscillating transfer function and training several neural networks to approximate it. Verification then checks that the networks' outputs stay within a given error bound of the reference function.
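A hedged sketch of such a property follows; the reference function, the stand-in "network" (a cubic polynomial approximating sinf), and the tolerance are all illustrative. The assertion bounds the approximation error over a restricted input domain.

```c
/* Illustrative transfer-function approximation check. */
#include <assert.h>
#include <math.h>

extern float __VERIFIER_nondet_float(void);
extern void __VERIFIER_assume(int cond);

/* Stand-in for a small trained network: Taylor cubic for sin(t). */
static float approx(float t) { return t - t * t * t / 6.0f; }

int main(void) {
    float t = __VERIFIER_nondet_float();
    __VERIFIER_assume(t >= -0.5f && t <= 0.5f);
    float err = fabsf(approx(t) - sinf(t));
    assert(err <= 1.0e-3f);  /* output must track the reference closely */
    return 0;
}
```

On [-0.5, 0.5] the Taylor remainder is at most 0.5^5/120, roughly 2.6e-4, so the 1e-3 bound holds with room to spare for rounding error.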
VNN-COMP Networks
The International Verification of Neural Networks Competition (VNN-COMP) provides a variety of benchmarks, but at a higher level of abstraction. To obtain code-level tests, we translated a selection of these networks from their original format into plain C, ensuring that their safety properties were preserved in the process.
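To illustrate what such translated code looks like, the sketch below shows a single fully-connected ReLU layer written in plain C; the layer sizes and weights here are made up, whereas the real benchmarks carry the trained parameters of each network in the same spirit.

```c
/* Illustrative dense + ReLU layer in plain C. */
#include <stdio.h>

#define IN 3
#define OUT 2

static const float W[OUT][IN] = {{0.2f, -0.5f,  0.1f},
                                 {0.7f,  0.3f, -0.4f}};
static const float b[OUT] = {0.05f, -0.1f};

static void dense_relu(const float *x, float *y) {
    for (int i = 0; i < OUT; i++) {
        float acc = b[i];
        for (int j = 0; j < IN; j++) acc += W[i][j] * x[j];
        y[i] = acc > 0.0f ? acc : 0.0f;  /* ReLU activation */
    }
}

int main(void) {
    const float x[IN] = {1.0f, 0.5f, -1.0f};
    float y[OUT];
    dense_relu(x, y);
    printf("%f %f\n", y[0], y[1]);
    return 0;
}
```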
Preliminary Evaluation of NeuroCodeBench
As part of assessing the difficulty of NeuroCodeBench, we ran several state-of-the-art verification tools on the benchmark. Within the allotted time budget, these tools struggled to produce correct verdicts across many of the safety properties. A major factor impacting their performance is incomplete support for the standard math functions, which was evident in many outputs. In addition, some tools timed out on the more complex neural networks, indicating that the larger instances pose a real challenge.
Conclusions and Future Directions
NeuroCodeBench serves as a rigorous benchmark for verifying neural network code written in plain C. Our initial findings indicate that current verification tools struggle to produce correct verdicts for many of the safety properties tested, and that better support for the math library functions commonly used in neural networks is clearly needed. As a next step, we plan to expand NeuroCodeBench and contribute it to the broader verification community, encouraging further work on the challenges of verifying neural network code. Through ongoing collaboration and dedicated effort, we aim to improve the state of verification tools and deliver safer, more reliable neural networks for critical applications.
Title: NeuroCodeBench: a plain C neural network benchmark for software verification
Abstract: Safety-critical systems with neural network components require strong guarantees. While existing neural network verification techniques have shown great progress towards this goal, they cannot prove the absence of software faults in the network implementation. This paper presents NeuroCodeBench - a verification benchmark for neural network code written in plain C. It contains 32 neural networks with 607 safety properties divided into 6 categories: maths library, activation functions, error-correcting networks, transfer function approximation, probability density estimation and reinforcement learning. Our preliminary evaluation shows that state-of-the-art software verifiers struggle to provide correct verdicts, due to their incomplete support of the standard C mathematical library and the complexity of larger neural networks.
Authors: Edoardo Manino, Rafael Sá Menezes, Fedor Shmarov, Lucas C. Cordeiro
Last Update: 2023-09-07 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2309.03617
Source PDF: https://arxiv.org/pdf/2309.03617
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.