---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
  - reasoning
  - reasoning-datasets-competition
datasets:
  - davanstrien/natural-reasoning-classifier
language:
  - en
metrics:
  - mse
  - mae
  - spearman
widget:
  - text: >-
      The debate on artificial intelligence's role in society has become
      increasingly polarized. Some argue that AI will lead to widespread
      unemployment and concentration of power, while others contend it will create
      new jobs and democratize access to knowledge. These viewpoints reflect
      different assumptions about technological development, economic systems, and
      human adaptability.
---

# ModernBERT Reasoning Complexity Regressor

<img src="https://cdn-uploads.huggingface.co/production/uploads/60107b385ac3e86b3ea4fc34/vqCMlr4g95ysSAZ2eAn7D.png" alt="ModernBERT-based Reasoning Complexity Regressor" width=500px>

## Model Description

This model predicts the reasoning complexity level (0-4) that a given web text suggests. It is fine-tuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [davanstrien/natural-reasoning-classifier](https://huggingface.co/datasets/davanstrien/natural-reasoning-classifier) dataset. It is intended for use in a pipeline that identifies text likely to be useful for generating reasoning data.

### Reasoning Complexity Scale

The scale has five levels:

- **0: Minimal Reasoning** - Simple factual content requiring only recall
- **1: Basic Reasoning** - Straightforward connections or single-step logical processes
- **2: Intermediate Reasoning** - Integration of multiple factors or perspectives
- **3: Advanced Reasoning** - Sophisticated analysis across multiple dimensions
- **4: Expert Reasoning** - Theoretical frameworks and novel conceptual synthesis

## Performance

The model achieves the following results on the evaluation set:

- MSE: 0.2034
- MAE: 0.2578
- Spearman Correlation: 0.6963
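
For reference, the three metrics can be computed from gold labels and predictions as in the sketch below. It is a dependency-free illustration, not the evaluation code used for this model; the `y_true` and `y_pred` values are made-up toy data, and the helper names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rt, rp = average_ranks(y_true), average_ranks(y_pred)
    mt, mp = sum(rt) / len(rt), sum(rp) / len(rp)
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_t * var_p) ** 0.5


# Toy data for illustration only (not taken from the evaluation set)
y_true = [0, 1, 2, 3, 4]
y_pred = [0.2, 0.9, 2.4, 2.8, 3.7]

print(f"MSE: {mse(y_true, y_pred):.4f}")
print(f"MAE: {mae(y_true, y_pred):.4f}")
print(f"Spearman: {spearman(y_true, y_pred):.4f}")
```

Spearman correlation depends only on rank order, so it measures whether the model orders texts by complexity correctly even when the absolute scores are offset.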

## Intended Uses

This model can be used to:

- Filter and classify educational content by reasoning complexity
- Identify complex reasoning problems across diverse domains
- Serve as a first-stage filter in a reasoning dataset creation pipeline
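
The first-stage filter use case can be sketched as follows. `score_fn` stands in for any callable that returns the model's raw score for a text, for example a thin wrapper around the pipeline from the usage examples. The function name, the `min_score` threshold of 2.5, and the stand-in scorer are illustrative assumptions, not recommended values.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_complexity(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    min_score: float = 2.5,  # assumed threshold; tune on your own data
) -> List[Tuple[str, float]]:
    """Keep texts whose predicted score is at least `min_score`."""
    kept = []
    for text in texts:
        score = score_fn(text)
        if score >= min_score:
            kept.append((text, score))
    # Highest-scoring candidates first
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Stand-in scorer so the sketch runs without downloading the model.
# In practice, replace it with a call to the regressor.
fake_scores = {"simple fact": 0.3, "policy trade-off": 2.9, "theory synthesis": 3.6}
candidates = filter_by_complexity(list(fake_scores), fake_scores.get)
print(candidates)
```

Filtering on the raw score rather than the rounded level keeps the threshold adjustable in fine steps, which may matter when trading recall against downstream generation cost.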

## Limitations

- Predictions are influenced by the original dataset's domain distribution
- Reasoning complexity is subjective and context-dependent

## Training

The model was fine-tuned using a regression objective with the following settings:

- Learning rate: 5e-05
- Batch size: 16
- Optimizer: AdamW
- Schedule: Linear
- Epochs: 10
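
These settings map onto the Hugging Face `Trainer` roughly as in the sketch below. It is a reconstruction from the list above, not the original training script: the output directory is a placeholder, treating the batch size as per-device is an assumption, and dataset loading, tokenization, and the metric function are omitted.

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

# A single output with problem_type="regression" makes the model use an MSE loss;
# the training labels must then be floats
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=1,
    problem_type="regression",
)

training_args = TrainingArguments(
    output_dir="reasoning-regressor",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=16,  # assumption: 16 is the per-device size
    num_train_epochs=10,
    lr_scheduler_type="linear",
    optim="adamw_torch",
)
```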

## Usage Examples

### Using the pipeline API

```python
from transformers import pipeline
pipe = pipeline("text-classification", model="davanstrien/ModernBERT-based-Reasoning-Required")

def predict_reasoning_level(text, pipe):
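    # Get the raw regression output; function_to_apply="none" keeps the
    # pipeline from applying a sigmoid or softmax to the single logit
    result = pipe(text, function_to_apply="none")
    score = result[0]['score']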

    # Round to nearest integer (optional)
    rounded_score = round(score)

    # Clip to valid range (0-4)
    rounded_score = max(0, min(4, rounded_score))

    # Map the level to the names used in the reasoning complexity scale (optional)
    reasoning_labels = {
        0: "Minimal reasoning",
        1: "Basic reasoning",
        2: "Intermediate reasoning",
        3: "Advanced reasoning",
        4: "Expert reasoning"
    }

    return {
        "raw_score": score,
        "reasoning_level": rounded_score,
        "interpretation": reasoning_labels[rounded_score]
    }

# Usage
text = "This argument uses multiple sources and evaluates competing perspectives before reaching a conclusion."
result = predict_reasoning_level(text, pipe)
print(f"Raw score: {result['raw_score']:.2f}")
print(f"Reasoning level: {result['reasoning_level']}")
print(f"Interpretation: {result['interpretation']}")
```

### Using the model directly

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "davanstrien/ModernBERT-based-Reasoning-Required"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare text
text = "The debate on artificial intelligence's role in society has become increasingly polarized."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# Get regression score
complexity_score = outputs.logits.item()
print(f"Reasoning Complexity: {complexity_score:.2f}/4.00")
```