Introducing InternLM-Law: A Model for Legal Queries
InternLM-Law enhances responses to diverse Chinese legal questions with advanced training.
― 7 min read
Table of Contents
- Building a Dataset
- Importance of Large Language Models
- Model Performance
- Our Contributions
- Related Work in Legal AI
- Training Process of InternLM-Law
- Data Sources for Training
- Processing Legal Data
- Processing Legal NLP Task Data
- Legal Consultation Data Processing
- Processing Legal Regulations
- High-Quality Legal Data Processing
- Data Synthesis and Resampling
- Comparing Our Model
- Objective and Subjective Evaluation
- Long Context Evaluation
- Effectiveness of Training Strategies
- Conclusion
- Original Source
- Reference Links
Large language models have shown they can do many things, but they have trouble with legal questions because the law is complex and requires specialized knowledge. This article introduces InternLM-Law, a model created to handle a wide range of questions connected to Chinese law, from basic textbook exercises to complicated real-life legal issues.
Building a Dataset
To create this model, we put together a large dataset with over a million legal queries. We developed a system to filter and process this data to make sure it covers a wide range of topics and is of high quality. Our training used a new two-step method: first, we trained the model on both legal and general content to give it broad knowledge, then we fine-tuned it on high-quality legal data to help it produce better-structured responses.
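As a rough sketch of what this two-step split could look like in code, the snippet below builds a stage-one mixture of legal and general samples and a stage-two mixture restricted to high-quality legal samples. The sample records and the `quality` flag are assumptions for illustration, not details taken from the paper.

```python
import random

def build_stage_mixtures(legal, general, seed=0):
    """Return (stage-1 mix, stage-2 mix) from lists of sample dicts.

    Stage 1 mixes legal and general samples to keep broad capabilities;
    stage 2 keeps only legal samples marked as high quality. The "quality"
    field is an assumed annotation, not a detail from the paper.
    """
    stage1 = legal + general
    random.Random(seed).shuffle(stage1)
    stage2 = [ex for ex in legal if ex.get("quality") == "high"]
    return stage1, stage2

# Tiny illustrative samples; the real corpus has over a million legal queries.
legal_samples = [
    {"q": "What is the limitation period for a contract claim?", "quality": "high"},
    {"q": "Can I terminate a lease early?", "quality": "low"},
]
general_samples = [{"q": "Explain binary search."}]

stage1, stage2 = build_stage_mixtures(legal_samples, general_samples)
print(len(stage1), "stage-1 samples;", len(stage2), "stage-2 samples")
```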
InternLM-Law performed better than leading models, such as GPT-4, on many legal tasks. We plan to release InternLM-Law and our dataset to support future research on applying language models in the legal domain.
Importance of Large Language Models
Large language models are becoming an important area of study in Natural Language Processing, attracting attention because they can be applied across many different fields. Researchers are already using these models in areas like medicine, coding, and mathematics, where they can help with specific problems and respond in natural language. In law, earlier studies created models focused on specific tasks, but these often provided only limited legal advice and relied on older base models that were less effective.
There is still a strong need for a large model focused on the Chinese legal domain, which is what we aim to address with InternLM-Law.
Model Performance
Our model, called InternLM-Law-7B, received high scores across different legal tasks when evaluated, performing better than GPT-4 and other large general-purpose models. We created a comprehensive training dataset from various public legal datasets on the internet; it includes question-and-answer pairs as well as other kinds of legal information.
While building the model, we realized that legal data alone was not enough to make it effective on legal tasks. We added general data to help the model apply its broader skills to legal issues, and we used a two-step training method to help it learn important legal regulations and improve its response style.
Our Contributions
The main contributions of our work are:
- We built InternLM-Law, a large language model made for the Chinese legal field. It handles many kinds of legal tasks and sets a new state of the art on the LawBench benchmark.
- We invested a lot of time in building our dataset and training our model. The dataset has over 1 million samples, and we used effective processing techniques to ensure its quality.
- We used a two-step training pipeline, first training on both legal and general tasks, and then focusing on high-quality legal data.
Related Work in Legal AI
Legal Artificial Intelligence has been a topic in Natural Language Processing for a long time. Most previous studies focused on creating specialized tools for a single task, an approach that struggles to cope with the complexity of the legal system. More recently, researchers have started building large language models that can handle a variety of legal tasks.
A few existing models already target the legal domain specifically. For instance, SaulLM-7B is designed for legal text understanding, while models like Lawyer-LLaMA improve their consulting abilities through focused training on legal datasets. However, many of these models do not perform well across a variety of tasks, and that is the gap our approach with InternLM-Law aims to fill.
Training Process of InternLM-Law
We used InternLM2-Chat as the base for our model. The training had two stages. First, we trained on a mix of legal tasks and other general tasks; this phase gave the model a broader view of legal topics while preserving its general abilities. Next, we refined the model with focused legal training to strengthen its legal knowledge, response structure, and accuracy in answering questions.
The full training run took about 8 hours on powerful hardware, and we extended the maximum input length so the model can handle long legal texts. We carefully set the learning rate for each stage and trained both stages to completion.
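To make the two-stage schedule concrete, here is a minimal sketch of how the stages might be described as configuration. The hyperparameter values, file names, and the `run_sft` placeholder are illustrative assumptions; the paper's exact settings are not reproduced here.

```python
# Every value below is an illustrative assumption, not the paper's setting.
STAGES = [
    {
        "name": "stage1_broad",
        "data": ["legal_mix.jsonl", "general_mix.jsonl"],  # legal + general data
        "max_seq_len": 32768,   # longer inputs for lengthy legal texts (assumed)
        "learning_rate": 2e-5,  # assumed
        "epochs": 1,
    },
    {
        "name": "stage2_legal_refine",
        "data": ["high_quality_legal.jsonl"],  # curated legal data only
        "max_seq_len": 32768,
        "learning_rate": 1e-5,  # smaller step for the refinement stage (assumed)
        "epochs": 1,
    },
]

def run_sft(base_model: str, stage: dict) -> str:
    """Placeholder for a real supervised fine-tuning call."""
    print(f"fine-tuning {base_model} on {stage['data']} ({stage['name']})")
    return f"{base_model}+{stage['name']}"

model = "internlm2-chat-7b"
for stage in STAGES:
    model = run_sft(model, stage)
```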
Data Sources for Training
Our dataset had two parts: legal and general data. The legal data aimed to cover a wide range of legal knowledge, split into categories like legal education materials, consultation records, and updated legal regulations. We sourced our legal data from various competitions and public legal databases.
To gather legal consultation data, we collected millions of records from online sources. These records contained numerous real-world legal issues where individuals sought help from legal practitioners. To ensure privacy, we anonymized all sensitive information.
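The paper does not spell out its anonymization procedure, but as one plausible sketch, pattern-based scrubbing could mask obvious identifiers such as phone numbers, ID numbers, and email addresses in consultation records. The patterns below are assumptions, not the authors' actual pipeline.

```python
import re

# Assumed patterns for common identifiers; a real pipeline would also handle
# names, addresses, case numbers, and other personal details.
PATTERNS = [
    (re.compile(r"\d{17}[\dXx]"), "[ID_NUMBER]"),         # 18-digit ID card numbers
    (re.compile(r"1[3-9]\d{9}"), "[PHONE]"),              # 11-digit mobile numbers
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),  # email addresses
]

def anonymize(text: str) -> str:
    """Replace sensitive spans with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

print(anonymize("Please call 13812345678; ID number 110101199003074578."))
```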
The general data included a broad selection of topics such as everyday conversations, mathematical problems, and code generation, all processed to maintain quality and helpfulness.
Processing Legal Data
We developed a detailed plan to process our legal data, aiming to improve its quality. Since online legal consultations often included short and less detailed responses, we created a semi-automated method to expand and enhance these responses. We also observed that the data distribution was imbalanced, so we focused on crucial areas like laws and regulations to improve the legal dataset's quality.
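As a hedged sketch of what semi-automated expansion might look like, short consultation answers could be routed through an LLM prompt that asks for a fuller response and then queued for human review. The `call_llm` stub, the length threshold, and the prompt wording are hypothetical.

```python
MIN_CHARS = 80  # assumed threshold below which an answer counts as "too short"

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM is used for expansion."""
    return "[expanded draft answer awaiting human review]"  # stub for illustration

def expand_short_answers(records):
    """Expand terse consultation answers and flag them for human review."""
    for rec in records:
        if len(rec["answer"]) < MIN_CHARS:
            prompt = (
                "Expand this short legal consultation answer into a complete, "
                "well-structured response that cites the relevant statutes.\n"
                f"Question: {rec['question']}\nOriginal answer: {rec['answer']}"
            )
            rec["answer"] = call_llm(prompt)
            rec["needs_review"] = True  # semi-automated: a human checks the output
    return records

demo = expand_short_answers(
    [{"question": "Can my employer withhold my final wages?", "answer": "No."}]
)
print(demo[0]["needs_review"])  # -> True
```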
Processing Legal NLP Task Data
When dealing with legal tasks, we categorized them into different types using existing legal benchmarks. We then generated diverse and relevant instructions for each task to create a well-structured legal dataset.
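To illustrate what instruction generation for different legal task types might involve, the sketch below maps a few assumed task categories to alternative instruction templates; the categories and wording are illustrative, since the paper describes this step only at a high level.

```python
import random

# Assumed task categories and instruction templates; varying the phrasing for
# the same task helps the model generalize across instruction styles.
TEMPLATES = {
    "charge_prediction": [
        "Read the case facts and predict the charge: {facts}",
        "Based on the following facts, which offense applies? {facts}",
    ],
    "judgment_summarization": [
        "Summarize the key points of this judgment: {document}",
        "Write a concise summary of the following court decision: {document}",
    ],
    "element_extraction": [
        "Extract the legal elements mentioned in this passage: {document}",
    ],
}

def build_instruction(task: str, **fields) -> str:
    """Pick a random template for the task and fill in its fields."""
    return random.choice(TEMPLATES[task]).format(**fields)

print(build_instruction("charge_prediction", facts="The defendant took ..."))
```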
Legal Consultation Data Processing
Our legal consultation dataset included various legal scenarios. We recognized that many of these contained unnecessary information that could harm data quality. To ensure reliability, we employed filtering methods to refine the dataset, discarding overly brief or unclear responses and maintaining quality throughout.
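A minimal sketch of this kind of filtering, assuming simple length and clarity heuristics (the threshold and the "unclear" markers are illustrative, not the authors' exact rules):

```python
MIN_ANSWER_CHARS = 50  # assumed: anything shorter counts as "overly brief"
UNCLEAR_MARKERS = ("see above", "as mentioned before")  # assumed "unclear" cues

def keep(record: dict) -> bool:
    """Return True if a consultation record passes the quality filter."""
    answer = record["answer"].strip()
    if len(answer) < MIN_ANSWER_CHARS:
        return False
    return not any(marker in answer.lower() for marker in UNCLEAR_MARKERS)

records = [
    {"question": "Is a verbal contract binding?", "answer": "Yes."},
    {"question": "Is a verbal contract binding?",
     "answer": "Generally yes, as long as both parties agreed on the essential "
               "terms, although proving those terms without a writing is harder."},
]
print([keep(r) for r in records])  # -> [False, True]
```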
Processing Legal Regulations
For legal regulations, we turned pure text data into question-and-answer pairs for training. By transforming titles of laws or regulations into questions, we helped the model retain relevant legal knowledge effectively.
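A small sketch of how a regulation's title and articles could be turned into question-and-answer pairs, assuming a simple title-as-question transformation; the template wording and example data are illustrative:

```python
def regulation_to_qa(title: str, articles: list[str]) -> list[dict]:
    """Turn a regulation's title and article texts into question-answer pairs."""
    pairs = []
    for i, text in enumerate(articles, start=1):
        question = f"What does Article {i} of the {title} provide?"
        pairs.append({"question": question, "answer": text})
    return pairs

pairs = regulation_to_qa(
    "Civil Code of the People's Republic of China",
    ["(text of the first article) ...", "(text of the second article) ..."],
)
print(pairs[0]["question"])
```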
High-Quality Legal Data Processing
To make our model's legal knowledge more precise, we used GPT-4 to semi-automate the generation of high-quality Q&A datasets. We manually checked and adjusted the generated content for accuracy.
Data Synthesis and Resampling
Since responses written by humans can differ in style and detail, we created additional data using GPT-4, refining it with human feedback. We sampled critical legal content, focusing on frequently occurring legal issues to bring clarity and improve accuracy in the model's responses.
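As a hedged sketch of resampling toward frequently occurring legal issues, one could count topic labels and draw samples in proportion to their frequency. The topic labels and weighting scheme below are assumptions:

```python
import random
from collections import Counter

def resample_by_topic_frequency(records, k, seed=0):
    """Draw k samples, weighting each record by how common its topic is."""
    counts = Counter(r["topic"] for r in records)
    weights = [counts[r["topic"]] for r in records]
    return random.Random(seed).choices(records, weights=weights, k=k)

records = [
    {"topic": "labor dispute", "question": "Unpaid overtime ..."},
    {"topic": "labor dispute", "question": "Wrongful termination ..."},
    {"topic": "labor dispute", "question": "Severance pay ..."},
    {"topic": "maritime law", "question": "Cargo damage claim ..."},
]
sampled = resample_by_topic_frequency(records, k=3)
print(Counter(r["topic"] for r in sampled))
```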
Comparing Our Model
We compared InternLM-Law with other leading models, both general and legal-specific, including the high-performing GPT-4. The evaluation showed that our model surpassed the others, particularly in legal tasks on the LawBench benchmark, which tests the model's memorization, comprehension, and application of legal knowledge.
Objective and Subjective Evaluation
Alongside the benchmark evaluation, we assessed how our model performed on subjective legal questions, reflecting real-world legal consultations. Our model achieved an impressive win rate against GPT-4 in legal consultation tasks.
Long Context Evaluation
Handling long legal documents is often necessary. We tested our model's ability to understand and answer questions based on lengthy legal judgments. Other models struggled with this type of task, while InternLM-Law effectively processed long texts and successfully answered related questions.
Effectiveness of Training Strategies
We explored how using general datasets during training impacted both legal and general tasks. Our findings indicated that including general data not only preserved the model's general capabilities but also enhanced its legal skills.
Conclusion
InternLM-Law is a significant advancement in the Chinese legal domain, outperforming existing models while providing a robust framework for future legal AI applications. Despite its success, the model still faces challenges, such as occasional inaccuracies, emphasizing the need for further improvement in handling complex legal reasoning tasks.
Original Source
Title: InternLM-Law: An Open Source Chinese Legal Large Language Model
Abstract: While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., legal exercises in textbooks) to analyzing complex real-world legal situations. We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries, and implement a data filtering and processing pipeline to ensure its diversity and quality. Our training approach involves a novel two-stage process: initially fine-tuning LLMs on both legal-specific and general-purpose content to equip the models with broad knowledge, followed by exclusive fine-tuning on high-quality legal data to enhance structured output generation. InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks. We make InternLM-Law and our dataset publicly available to facilitate future research in applying LLMs within the legal domain.
Authors: Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge
Last Update: 2024-06-21 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2406.14887
Source PDF: https://arxiv.org/pdf/2406.14887
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.
Reference Links
- https://github.com/InternLM/InternLM-Law
- https://huggingface.co/lyogavin/Anima33B
- https://cail.cipsc.org.cn/
- https://flk.npc.gov.cn/
- https://huggingface.co/Qwen/Qwen1.5-72B
- https://qwenlm.github.io/blog/qwen1.5/
- https://jecqa.thunlp.org/
- https://cail.cipsc.org.cn/task_summit.html?raceID=2
- https://laic.cjbdi.com/
- https://aistudio.baidu.com/datasetdetail/181754
- https://github.com/liuhuanyong/CrimeKgAssitant
- https://cail.cipsc.org.cn/task_summit.html?raceID=1
- https://github.com/china-ai-law-challenge/CAIL2021/tree/main/xxcq
- https://cail.cipsc.org.cn/task_summit.html?raceID=4
- https://cail.cipsc.org.cn/task_summit.html?raceID=5
- https://github.com/thunlp/LEVEN
- https://github.com/china-ai-law-challenge/cail2018
- https://github.com/LiuHC0428/LAW-GPT
- https://www.66law.cn/
- https://aclanthology.org/2020.emnlp-main.56.pdf
- https://github.com/thulawtech/leec