MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8
Abstract
A new benchmark MIGRATION-BENCH evaluates large language models on Java code migration tasks, providing a dataset and framework for repository-level migration assessment.
With the rapid advancement of powerful large language models (LLMs) in recent years, a wide range of software engineering tasks can now be addressed using LLMs, significantly enhancing productivity and scalability. Numerous benchmark datasets have been developed to evaluate the coding capabilities of these models, but they primarily focus on problem-solving and issue-resolution tasks. In contrast, we introduce a new coding benchmark, MIGRATION-BENCH, with a distinct focus: code migration. MIGRATION-BENCH aims to serve as a comprehensive benchmark for migration from Java 8 to the latest long-term support (LTS) versions (Java 17, 21). It includes a full dataset and a selected subset, with 5,102 and 300 repositories respectively. The selected subset is a representative sample curated for complexity and difficulty, offering a versatile resource to support research in the field of code migration. Additionally, we provide a comprehensive evaluation framework to facilitate rigorous and standardized assessment of LLMs on this challenging task. We further propose SD-Feedback and demonstrate that LLMs can effectively tackle repository-level code migration to Java 17. For the selected subset with Claude-3.5-Sonnet-v2, SD-Feedback achieves success rates (pass@1) of 62.33% and 27.33% for minimal and maximal migration respectively. The benchmark dataset and source code are available at: https://huggingface.co/collections/AmazonScience and https://github.com/amazon-science/self_debug respectively.
Community
We introduce the 🤗 MigrationBench dataset, a benchmark tailored for repository-level code migration, specifically targeting migration from Java 8 to 17 or other long-term support versions.
1. Dataset
MigrationBench comprises a large-scale collection of GitHub repositories, organized into three subsets:
- 🤗 AmazonScience/migration-bench-java-full contains 5,102 repos
  - Each repo has a test directory or at least one test case
- 🤗 AmazonScience/migration-bench-java-selected has 300 repos
  - A curated subset of 🤗 migration-bench-java-full
- 🤗 AmazonScience/migration-bench-java-utg has 4,814 repos
  - The unit test generation (utg) dataset, disjoint with 🤗 migration-bench-java-full
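As a rough illustration of how the three subsets listed above might be pulled from the Hugging Face Hub, the sketch below uses `load_dataset` from the `datasets` library. The Hub IDs and repository counts come from the list above; the helper name `load_subset` and the `SUBSETS` table are hypothetical, and the split and column names are not stated here, so the returned object should be inspected rather than assumed.

```python
from typing import Dict, Tuple

# Hub IDs and repository counts as given in the dataset list.
SUBSETS: Dict[str, Tuple[str, int]] = {
    "full": ("AmazonScience/migration-bench-java-full", 5102),
    "selected": ("AmazonScience/migration-bench-java-selected", 300),
    "utg": ("AmazonScience/migration-bench-java-utg", 4814),
}


def load_subset(name: str):
    """Download one subset from the Hugging Face Hub (hypothetical helper).

    Assumption: the dataset loads with its default configuration. Split and
    column names are not documented here, so inspect the returned object.
    """
    # Imported lazily so the table above is usable without the library.
    from datasets import load_dataset  # requires `pip install datasets`

    hub_id, _expected_repos = SUBSETS[name]
    return load_dataset(hub_id)
```

Calling `print(load_subset("selected"))` should list the available splits and columns, which can then be compared against the expected count of 300 repositories.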
2. Evaluation Framework
To enable standardized and rigorous evaluation of LLM performance on this complex task, we provide a comprehensive open-source evaluation framework, available at: https://github.com/amazon-science/MigrationBench.
3. Baseline: Code Migration with LLMs
Inspired by "Teaching Large Language Models to Self-Debug", we introduce SD-Feedback and demonstrate that LLMs can effectively tackle repository-level code migration from Java 8 to 17.
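To make the self-debugging idea concrete, here is a minimal sketch of a generic build-and-repair loop: build the repository, hand any errors back to a model, and retry within a fixed budget. This is not the SD-Feedback algorithm itself, whose details are not given here. The `build` and `propose_fix` callables, the `max_rounds` budget, and the function name are all hypothetical placeholders.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, not the API of the released framework:
#   build(repo_dir) -> (succeeded, error_messages)
#   propose_fix(repo_dir, errors) -> None, expected to edit files in place
BuildFn = Callable[[str], Tuple[bool, List[str]]]
FixFn = Callable[[str, List[str]], None]


def migrate_with_feedback(
    repo_dir: str,
    build: BuildFn,
    propose_fix: FixFn,
    max_rounds: int = 5,
) -> bool:
    """Build, feed errors back to the model, and retry until success or budget."""
    for _ in range(max_rounds):
        ok, errors = build(repo_dir)
        if ok:
            return True
        # Pass the build errors to the model so it can attempt a repair.
        propose_fix(repo_dir, errors)
    # One final build to check whether the last repair worked.
    ok, _ = build(repo_dir)
    return ok
```

In practice `build` might wrap a Maven invocation under the target JDK and `propose_fix` might call an LLM, but both are left abstract so the loop can be exercised with stubs.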
On the selected subset using Claude-3.5-Sonnet-v2, SD-Feedback achieves success rates (pass@1) of 62.33% and 27.33% for minimal and maximal migration respectively.
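As a quick sanity check on these percentages, the sketch below assumes one migration attempt per repository, in which case pass@1 reduces to the fraction of the 300 selected repositories that migrate successfully. The repository counts it derives are inferred from the reported percentages, not figures stated in the source, and the function names are illustrative.

```python
def success_rate(num_success: int, num_repos: int) -> float:
    """Percentage of repositories migrated successfully on a single attempt."""
    return 100.0 * num_success / num_repos


def implied_successes(rate_percent: float, num_repos: int) -> int:
    """Invert a reported percentage back to a whole number of repositories."""
    return round(rate_percent / 100.0 * num_repos)


NUM_SELECTED = 300  # size of the selected subset

# Inferred counts (assumption: each repository counts as exactly one attempt).
minimal = implied_successes(62.33, NUM_SELECTED)
maximal = implied_successes(27.33, NUM_SELECTED)

print(minimal, round(success_rate(minimal, NUM_SELECTED), 2))
print(maximal, round(success_rate(maximal, NUM_SELECTED), 2))
```

Under that assumption, the reported rates correspond to 187 and 82 of the 300 repositories for minimal and maximal migration respectively.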