arxiv:2505.09569

MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8

Published on May 14 · Submitted by sliuxl on May 21

Abstract

AI-generated summary: A new benchmark, MIGRATION-BENCH, evaluates large language models on Java code migration tasks, providing a dataset and framework for repository-level migration assessment.

With the rapid advancement of powerful large language models (LLMs) in recent years, a wide range of software engineering tasks can now be addressed using LLMs, significantly enhancing productivity and scalability. Numerous benchmark datasets have been developed to evaluate the coding capabilities of these models, but they primarily focus on problem-solving and issue-resolution tasks. In contrast, we introduce a new coding benchmark, MIGRATION-BENCH, with a distinct focus: code migration. MIGRATION-BENCH aims to serve as a comprehensive benchmark for migration from Java 8 to the latest long-term support (LTS) versions (Java 17 and 21). It includes a full dataset of 5,102 repositories and a selected subset of 300 repositories. The selected subset is representative, curated for complexity and difficulty, and offers a versatile resource to support research in code migration. Additionally, we provide a comprehensive evaluation framework to facilitate rigorous and standardized assessment of LLMs on this challenging task. We further propose SD-Feedback and demonstrate that LLMs can effectively tackle repository-level code migration to Java 17. On the selected subset with Claude-3.5-Sonnet-v2, SD-Feedback achieves success rates (pass@1) of 62.33% and 27.00% for minimal and maximal migration, respectively. The benchmark dataset and source code are available at https://huggingface.co/collections/AmazonScience and https://github.com/amazon-science/self_debug, respectively.

Community

Paper author · Paper submitter

We introduce the 🤗 MigrationBench dataset, a benchmark tailored for repository-level code migration, specifically targeting migration from Java 8 to Java 17 or other long-term support (LTS) versions.

1. Dataset

MigrationBench comprises a large-scale collection of GitHub repositories, organized into three subsets:

  1. 🤗 AmazonScience/migration-bench-java-full contains 5,102 repos
    • Each repo has a test directory or at least one test case
  2. 🤗 AmazonScience/migration-bench-java-selected contains 300 repos, a representative subset of the full dataset curated for complexity and difficulty
  3. 🤗 AmazonScience/migration-bench-java-utg contains 4,814 repos
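The three subsets above are published as Hugging Face datasets. The following is a minimal sketch of fetching one with the `datasets` library. It assumes `datasets` is installed and network access is available, and it does not rely on any particular split or column names, since none are stated here. The helper `matches_listed_size` is hypothetical and assumes one row per repository.

```python
from typing import Dict

# Dataset IDs and repository counts as listed in the subset list above.
MIGRATION_BENCH_SUBSETS: Dict[str, int] = {
    "AmazonScience/migration-bench-java-full": 5102,
    "AmazonScience/migration-bench-java-selected": 300,
    "AmazonScience/migration-bench-java-utg": 4814,
}


def load_subset(dataset_id: str):
    """Download one MigrationBench subset (requires network access)."""
    if dataset_id not in MIGRATION_BENCH_SUBSETS:
        raise ValueError(f"unknown subset: {dataset_id}")
    # Imported lazily so the rest of this module works without `datasets`.
    from datasets import load_dataset
    # No split is passed, so this returns a DatasetDict keyed by whatever
    # splits the dataset defines.
    return load_dataset(dataset_id)


def matches_listed_size(dataset_id: str, splits) -> bool:
    """Hypothetical sanity check: total rows across splits equals the listed
    repo count, ASSUMING one row per repository."""
    total = sum(len(rows) for rows in splits.values())
    return total == MIGRATION_BENCH_SUBSETS[dataset_id]
```

For example, `matches_listed_size(ds_id, load_subset(ds_id))` can be called after a download. The check is only meaningful if the assumed one-row-per-repository layout holds.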

2. Evaluation Framework

To enable standardized and rigorous evaluation of LLM performance on this complex task, we provide a comprehensive open-source evaluation framework, available at: https://github.com/amazon-science/MigrationBench.

3. Baseline: Code Migration with LLMs

Inspired by "Teaching Large Language Models to Self-Debug", we introduce SD-Feedback and demonstrate that LLMs can effectively tackle repository-level code migration from Java 8 to Java 17.
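SD-Feedback is only named here, not specified. To illustrate the general self-debugging idea it draws on, here is a generic build-and-repair loop. This is not the authors' implementation. `build` and `propose_fix` are hypothetical callbacks that would stand in for, e.g., building and testing the repository under Java 17 and prompting an LLM with the failure log. `max_rounds` is an assumed repair budget.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


@dataclass
class BuildResult:
    ok: bool   # True if the (hypothetical) build and tests succeeded
    log: str   # failure output to feed back to the model


def self_debug_loop(
    state: State,
    build: Callable[[State], BuildResult],
    propose_fix: Callable[[State, str], State],
    max_rounds: int = 5,
) -> Tuple[State, bool, int]:
    """Alternate build and fix until the build passes or max_rounds fixes are spent.

    Returns (final_state, success, number_of_fix_rounds_used).
    """
    for rounds in range(max_rounds + 1):
        result = build(state)
        if result.ok:
            return state, True, rounds
        if rounds == max_rounds:
            break
        # Feed the failure log back so the next attempt can target the error.
        state = propose_fix(state, result.log)
    return state, False, max_rounds
```

Passing the build and the model in as callbacks keeps the loop itself independent of any particular JDK toolchain or LLM API.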

On the selected subset using Claude-3.5-Sonnet-v2, SD-Feedback achieves success rates (pass@1) of 62.33% and 27.33% for minimal and maximal migration, respectively.
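The reported success rates are pass@1 scores. The exact sampling protocol is not stated here, so the sketch below uses the standard unbiased pass@k estimator and averages it over repositories. With a single attempt per repository, pass@1 reduces to the fraction of repositories migrated successfully. The counts in the usage note are back-computed from the percentages for illustration only and are not reported figures.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k attempts, drawn
    without replacement from n attempts with c successes, succeeds."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(per_repo: Iterable[Tuple[int, int]]) -> float:
    """Mean pass@1 over repos; each entry is (n_attempts, n_successes)."""
    scores = [pass_at_k(n, c, 1) for n, c in per_repo]
    return sum(scores) / len(scores)
```

With one attempt on each of the 300 selected repositories, 187 successes give about 62.33%, while 81 and 82 successes give 27.00% and 27.33%.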

