linyongver's picture
Update README.md
3c86889 verified
metadata
license: mit

This is the formalizer for tranlating infromal math problem into formal statement in Lean 4. We use the data pair in Lean workbook to train a Qwen-2.5-32B-Coder.

Here is an example code to use this formalizer,

import re
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
import os
import json

def statement_translation_inference(informal_statement):
    return F"""
I want you to translate a informal statment to formal statement in  lean 4, the informal statement of the problem is:

{informal_statement}

The output is
"""

model_name = "Goedel-LM/Goedel-Formalizer-32B-LeanWorkbookAnnotated"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LLM(model=model_name, seed=1, trust_remote_code=True, swap_space=8, tensor_parallel_size=2, max_model_len=4096)


sampling_params = SamplingParams(
    temperature=1.0,
    max_tokens=2048,
    top_p=0.95,
    n=args.n,
)

data_list = [{
  "informal_statement": "Consider the terms of an arithmetic sequence: $-\frac{1}{3}, y+2, 4y, \ldots$. Solve for $y$."
  }]

model_inputs = [statement_translation_inference(idata["informal_statement"], idata["informal_proof"]) for idata in data_list]

model_outputs = model.generate(
    model_inputs,
    sampling_params,
    use_tqdm=True,
)

Citation

@misc{lin2025goedelproverfrontiermodelopensource,
      title={Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving}, 
      author={Yong Lin and Shange Tang and Bohan Lyu and Jiayun Wu and Hongzhou Lin and Kaiyu Yang and Jia Li and Mengzhou Xia and Danqi Chen and Sanjeev Arora and Chi Jin},
      year={2025},
      eprint={2502.07640},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.07640}, 
}