Nora Petrova committed
Commit 20e666e · 1 Parent(s): 99c7281

Add project to new space
Dockerfile ADDED
@@ -0,0 +1,19 @@
+ FROM node:20.11.0-slim
+
+ WORKDIR /app
+
+ # Copy the application code
+ COPY --chown=user leaderboard-app/ ./
+
+ RUN npm install
+
+ # Build the app
+ RUN npm run build
+
+ # Expose the port the app will run on
+ # HF Spaces uses port 7860 by default
+ EXPOSE 7860
+
+ # Start the app with the correct port
+ ENV PORT=7860
+ CMD ["npm", "start"]
README.md CHANGED
@@ -1,11 +1,13 @@
  ---
- title: User Experience Leaderboard
- emoji: 📚
- colorFrom: indigo
- colorTo: blue
+ title: UX Leaderboard
+ emoji: 🥇
+ colorFrom: blue
+ colorTo: cyan
  sdk: docker
- pinned: false
- short_description: User Experience of LLMs Leaderoard
+ pinned: true
+ short_description: Leaderboard of LLMs based on detailed human feedback
+ tags:
+ - leaderboard
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
leaderboard-app/.gitignore ADDED
@@ -0,0 +1,41 @@
+ # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+
+ # dependencies
+ /node_modules
+ /.pnp
+ .pnp.*
+ .yarn/*
+ !.yarn/patches
+ !.yarn/plugins
+ !.yarn/releases
+ !.yarn/versions
+
+ # testing
+ /coverage
+
+ # next.js
+ /.next/
+ /out/
+
+ # production
+ /build
+
+ # misc
+ .DS_Store
+ *.pem
+
+ # debug
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
+ .pnpm-debug.log*
+
+ # env files (can opt-in for committing if needed)
+ .env*
+
+ # vercel
+ .vercel
+
+ # typescript
+ *.tsbuildinfo
+ next-env.d.ts
leaderboard-app/README.md ADDED
@@ -0,0 +1,113 @@
+ # LLM Comparison Leaderboard
+
+ An interactive dashboard for comparing the performance of state-of-the-art large language models across various tasks and metrics.
+
+ ## Features
+
+ - Overall model rankings with comprehensive scoring
+ - Task-specific performance analysis
+ - Metric breakdowns across different dimensions
+ - User satisfaction and experience metrics
+ - Interactive visualizations using Recharts
+ - Responsive design for all device sizes
+
+ ## Getting Started
+
+ ### Prerequisites
+
+ - Node.js 16.8 or later
+ - Python 3.8 or later (for data processing)
+ - Python packages: pandas, numpy
+
+ ### Installation
+
+ 1. Clone the repository:
+
+ ```bash
+ git clone https://github.com/yourusername/llm-comparison-leaderboard.git
+ cd llm-comparison-leaderboard
+ ```
+
+ 2. Install dependencies:
+
+ ```bash
+ npm install
+ ```
+
+ 3. Install Python dependencies (if you plan to process data):
+
+ ```bash
+ pip install pandas numpy
+ ```
+
+ ### Using Sample Data
+
+ The repository includes a sample JSON file with placeholder data in `public/llm_comparison_data.json`. You can start the development server right away to see the dashboard with this data:
+
+ ```bash
+ npm run dev
+ ```
+
+ Visit [http://localhost:3000](http://localhost:3000) to see the dashboard.
+
+ ### Processing Your Own Data
+
+ If you have your own data, follow these steps:
+
+ 1. Place your CSV data file in the `data` directory:
+
+ ```bash
+ mkdir -p data
+ cp /path/to/your/pilot_data_n20.csv data/
+ ```
+
+ 2. Run the data processing script:
+
+ ```bash
+ npm run process-data
+ ```
+
+ This will:
+ - Process the CSV data using the Python script
+ - Generate a JSON file in the `public` directory
+ - Format the data for the dashboard
+
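The `npm run process-data` step above is handled by `scripts/process_data.js`, a Node wrapper around the Python processor (both files are listed under Project Structure below, but neither script is included in this commit). A minimal, hypothetical sketch of such a wrapper follows; the argument order, file paths, and the `python` executable name are assumptions for illustration, not the committed implementation:

```js
// scripts/process_data.js — illustrative sketch only, not the actual script in this repo.
// Runs the Python processor over the CSV and writes the dashboard JSON into public/.
const { spawnSync } = require('child_process');
const path = require('path');

const inputCsv = path.join('data', 'pilot_data_n20.csv');            // assumed input location
const outputJson = path.join('public', 'llm_comparison_data.json');  // assumed output location

const result = spawnSync('python', [
  path.join('scripts', 'process_data.py'),
  inputCsv,
  outputJson,
], { stdio: 'inherit' });

// Propagate the Python exit code so `npm run process-data` fails loudly on errors.
process.exit(result.status ?? 1);
```

Under this sketch, `package.json` would map the npm script to the wrapper, e.g. `"process-data": "node scripts/process_data.js"` — again an assumption about the repository's configuration.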
+ 3. Start the development server:
+
+ ```bash
+ npm run dev
+ ```
+
+ ## Project Structure
+
+ - `app/` - Next.js App Router components
+   - `page.js` - Main page component that loads data and renders the dashboard
+   - `layout.js` - Layout component with metadata and global styles
+   - `globals.css` - Global styles including Tailwind CSS
+ - `components/` - React components
+   - `LLMComparisonDashboard.jsx` - The main dashboard component
+ - `public/` - Static files
+   - `llm_comparison_data.json` - Processed data for the dashboard
+ - `lib/` - Utility functions
+   - `utils.js` - Helper functions for data processing
+ - `scripts/` - Data processing scripts
+   - `process_data.js` - Node.js script for running the Python processor
+   - `process_data.py` - Python script for data processing
+
+ ## Building for Production
+
+ To build the application for production:
+
+ ```bash
+ npm run build
+ ```
+
+ To start the production server:
+
+ ```bash
+ npm run start
+ ```
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
leaderboard-app/app/favicon.ico ADDED
leaderboard-app/app/globals.css ADDED
@@ -0,0 +1,29 @@
+ @import "tailwindcss";
+
+ :root {
+   --background: #ffffff;
+   --foreground: #171717;
+ }
+
+ @theme inline {
+   --color-background: var(--background);
+   --color-foreground: var(--foreground);
+   --font-sans: var(--font-geist-sans);
+   --font-mono: var(--font-geist-mono);
+ }
+
+ /* Force light theme regardless of color scheme preference */
+ /* Disable dark mode
+ @media (prefers-color-scheme: dark) {
+   :root {
+     --background: #0a0a0a;
+     --foreground: #ededed;
+   }
+ }
+ */
+
+ body {
+   background: var(--background);
+   color: var(--foreground);
+   font-family: Arial, Helvetica, sans-serif;
+ }
leaderboard-app/app/layout.js ADDED
@@ -0,0 +1,19 @@
+ import { Inter } from 'next/font/google';
+ import './globals.css';
+
+ const inter = Inter({ subsets: ['latin'] });
+
+ export const metadata = {
+   title: 'LLM Comparison Leaderboard',
+   description: 'Interactive leaderboard comparing performance of state-of-the-art large language models across various tasks and metrics.',
+ };
+
+ export default function RootLayout({ children }) {
+   return (
+     <html lang="en">
+       <body className={`${inter.className} bg-gray-50`}>
+         {children}
+       </body>
+     </html>
+   );
+ }
leaderboard-app/app/page.js ADDED
@@ -0,0 +1,84 @@
+ 'use client';
+
+ import { useState, useEffect } from 'react';
+ import dynamic from 'next/dynamic';
+ import { prepareDataForVisualization } from '../lib/utils';
+
+ // Dynamically import the dashboard component with SSR disabled
+ // This is important because recharts needs to be rendered on the client side
+ const LLMComparisonDashboard = dynamic(
+   () => import('../components/LLMComparisonDashboard'),
+   { ssr: false }
+ );
+
+ export default function Home() {
+   const [data, setData] = useState(null);
+   const [loading, setLoading] = useState(true);
+   const [error, setError] = useState(null);
+
+   useEffect(() => {
+     async function fetchData() {
+       try {
+         setLoading(true);
+
+         // Fetch the data from the JSON file in the public directory
+         const response = await fetch('/leaderboard_data.json');
+
+         if (!response.ok) {
+           throw new Error(`Failed to fetch data: ${response.status} ${response.statusText}`);
+         }
+
+         const jsonData = await response.json();
+
+         // Process the data for visualization
+         const processedData = prepareDataForVisualization(jsonData);
+
+         setData(processedData);
+         setLoading(false);
+       } catch (err) {
+         console.error('Error loading data:', err);
+         setError(err.message || 'Failed to load data');
+         setLoading(false);
+       }
+     }
+
+     fetchData();
+   }, []);
+
+   if (loading) {
+     return (
+       <div className="flex items-center justify-center min-h-screen">
+         <div className="text-center">
+           <div className="animate-spin rounded-full h-12 w-12 border-b-2 border-blue-500 mx-auto mb-4"></div>
+           <p className="text-lg text-gray-600">Loading LLM comparison data...</p>
+         </div>
+       </div>
+     );
+   }
+
+   if (error) {
+     return (
+       <div className="flex items-center justify-center min-h-screen">
+         <div className="text-center max-w-md p-6 bg-red-50 rounded-lg border border-red-200">
+           <svg xmlns="http://www.w3.org/2000/svg" className="h-12 w-12 text-red-500 mx-auto mb-4" fill="none" viewBox="0 0 24 24" stroke="currentColor">
+             <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 8v4m0 4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
+           </svg>
+           <h2 className="text-xl font-bold text-red-700 mb-2">Error Loading Data</h2>
+           <p className="text-gray-600">{error}</p>
+           <button
+             onClick={() => window.location.reload()}
+             className="mt-4 px-4 py-2 bg-blue-500 text-white rounded hover:bg-blue-600 transition-colors"
+           >
+             Try Again
+           </button>
+         </div>
+       </div>
+     );
+   }
+
+   return (
+     <main className="min-h-screen p-4">
+       {data && <LLMComparisonDashboard data={data} />}
+     </main>
+   );
+ }
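The page imports `prepareDataForVisualization` from `lib/utils`, which is not part of this commit. For orientation, here is a minimal, hypothetical sketch of a helper with the shape `page.js` expects: a pure function from the fetched JSON to the object handed to the dashboard. The top-level keys and defaults mirror the props and fallbacks visible in `DemographicAnalysis.jsx` further down, but the real data layout may differ:

```js
// lib/utils.js — hypothetical sketch only; the committed helper may look quite different.
// Takes the parsed leaderboard JSON and returns the object the dashboard consumes.
export function prepareDataForVisualization(jsonData) {
  if (!jsonData || typeof jsonData !== 'object') {
    throw new Error('Leaderboard data is missing or malformed');
  }

  return {
    // Pass known sections through with safe defaults so components can
    // destructure without null checks (key names assumed from component props).
    rawData: jsonData.rawData ?? { demographicOptions: {}, mrpDemographics: {} },
    modelsMeta: jsonData.modelsMeta ?? [],
    metricsData: jsonData.metricsData ?? { highLevelCategories: {}, lowLevelMetrics: {} },
    equityAnalysis: jsonData.equityAnalysis ?? { all_equity_gaps: [], universal_issues: [] },
  };
}
```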
leaderboard-app/components/About.jsx ADDED
@@ -0,0 +1,741 @@
1
+ "use client";
2
+
3
+ import React, { useState } from "react";
4
+ import {
5
+ ChevronDown,
6
+ ChevronUp,
7
+ Info,
8
+ Book,
9
+ Calculator,
10
+ BarChart,
11
+ UserCheck,
12
+ CheckCircle,
13
+ MessageCircle,
14
+ Brain,
15
+ SlidersHorizontal,
16
+ Shield,
17
+ Smile,
18
+ Globe,
19
+ } from "lucide-react";
20
+
21
+ const AboutTab = () => {
22
+ // Task list for easier management
23
+ const tasksUsed = [
24
+ "Following Up on Job Application: Drafting a professional follow-up email",
25
+ "Planning Weekly Meals: Creating a meal plan accommodating dietary restrictions",
26
+ "Creating Travel Itinerary: Planning a European city break",
27
+ "Understanding Complex Topic: Learning about day trading concepts",
28
+ "Generating Creative Ideas: Brainstorming unique birthday gift ideas",
29
+ "Making Decisions Between Options: Comparing tech products for purchase",
30
+ ];
31
+
32
+ // State for collapsible sections
33
+ const [openSections, setOpenSections] = useState({
34
+ introduction: true,
35
+ methodology: true,
36
+ metricsCalculation: true,
37
+ metricsExplained: true,
38
+ });
39
+
40
+ // State for active metric tab
41
+ const [activeMetricTab, setActiveMetricTab] = useState("helpfulness");
42
+
43
+ // Toggle section visibility
44
+ const toggleSection = (section) => {
45
+ setOpenSections({
46
+ ...openSections,
47
+ [section]: !openSections[section],
48
+ });
49
+ };
50
+
51
+ // Metrics data
52
+ const metricsData = [
53
+ {
54
+ id: "helpfulness",
55
+ title: "Helpfulness",
56
+ icon: <CheckCircle size={18} />,
57
+ color: "bg-green-500",
58
+ description:
59
+ "Evaluates how well the model provides useful, practical assistance that addresses the user's needs and helps them accomplish their goals.",
60
+ metrics: [
61
+ {
62
+ name: "Effectiveness",
63
+ description:
64
+ "How effectively did the model help you accomplish your specific goal?",
65
+ },
66
+ {
67
+ name: "Comprehensiveness",
68
+ description:
69
+ "How comprehensive was the model's response in addressing all aspects of your request?",
70
+ },
71
+ {
72
+ name: "Usefulness",
73
+ description:
74
+ "How useful were the model's suggestions or solutions for your needs?",
75
+ },
76
+ ],
77
+ },
78
+ {
79
+ id: "communication",
80
+ title: "Communication",
81
+ icon: <MessageCircle size={18} />,
82
+ color: "bg-blue-500",
83
+ description:
84
+ "Assesses the clarity, coherence, and appropriateness of the model's writing style, including tone and language choices.",
85
+ metrics: [
86
+ {
87
+ name: "Tone and Language Style",
88
+ description:
89
+ "How well did the model match its tone and language style to the context of your interaction?",
90
+ },
91
+ {
92
+ name: "Conversation Flow",
93
+ description:
94
+ "How natural and conversational were the model's responses?",
95
+ },
96
+ {
97
+ name: "Detail and Technical Language",
98
+ description:
99
+ "How appropriate was the level of detail and technical language for your needs?",
100
+ },
101
+ ],
102
+ },
103
+ {
104
+ id: "understanding",
105
+ title: "Understanding",
106
+ icon: <Brain size={18} />,
107
+ color: "bg-purple-500",
108
+ description:
109
+ "Measures how well the model comprehends the user's requests, including implicit needs and contextual information.",
110
+ metrics: [
111
+ {
112
+ name: "Accuracy",
113
+ description:
114
+ "How accurately did the model interpret your initial request?",
115
+ },
116
+ {
117
+ name: "Context Memory",
118
+ description:
119
+ "How well did the model maintain context throughout the conversation?",
120
+ },
121
+ {
122
+ name: "Intuitiveness",
123
+ description:
124
+ "How well did the model pick up on implicit aspects of your request without requiring explicit explanation?",
125
+ },
126
+ ],
127
+ },
128
+ {
129
+ id: "adaptiveness",
130
+ title: "Adaptiveness",
131
+ icon: <SlidersHorizontal size={18} />,
132
+ color: "bg-amber-500",
133
+ description:
134
+ "Measures how well the model adjusts to different user needs, contexts, and feedback throughout a conversation.",
135
+ metrics: [
136
+ {
137
+ name: "Flexibility",
138
+ description:
139
+ "How effectively did the model adjust its responses based on your feedback?",
140
+ },
141
+ {
142
+ name: "Clarity",
143
+ description:
144
+ "How well did the model clarify ambiguities or misunderstandings?",
145
+ },
146
+ {
147
+ name: "Conversation Building",
148
+ description:
149
+ "How well did the model build upon previous exchanges in the conversation?",
150
+ },
151
+ ],
152
+ },
153
+ {
154
+ id: "trustworthiness",
155
+ title: "Trustworthiness",
156
+ icon: <Shield size={18} />,
157
+ color: "bg-red-500",
158
+ description:
159
+ "Evaluates transparency, citations, acknowledgment of limitations, and overall user confidence in the model's responses.",
160
+ metrics: [
161
+ {
162
+ name: "Consistency",
163
+ description:
164
+ "How consistent were the model's responses across similar questions?",
165
+ },
166
+ {
167
+ name: "Confidence",
168
+ description:
169
+ "How confident were you in the accuracy of the model's information?",
170
+ },
171
+ {
172
+ name: "Transparency",
173
+ description:
174
+ "How transparent was the model about its limitations or uncertainties?",
175
+ },
176
+ ],
177
+ },
178
+ {
179
+ id: "personality",
180
+ title: "Personality",
181
+ icon: <Smile size={18} />,
182
+ color: "bg-pink-500",
183
+ description:
184
+ "Assesses consistency and definition of the model's persona, and alignment with expectations of honesty, empathy, and fairness.",
185
+ metrics: [
186
+ {
187
+ name: "Personality Consistency",
188
+ description: "How consistent was the LLM's personality?",
189
+ },
190
+ {
191
+ name: "Distinct Personality",
192
+ description: "How well-defined was the LLM's personality?",
193
+ },
194
+ {
195
+ name: "Honesty Empathy Fairness",
196
+ description:
197
+ "How much did the LLM respond in a way that aligned with your expectations of honesty, empathy, or fairness?",
198
+ },
199
+ ],
200
+ },
201
+ {
202
+ id: "background",
203
+ title: "Background and Culture",
204
+ icon: <Globe size={18} />,
205
+ color: "bg-teal-500",
206
+ description:
207
+ "Evaluates cultural sensitivity, alignment, relevance, and freedom from bias.",
208
+ metrics: [
209
+ {
210
+ name: "Ethical Alignment",
211
+ description:
212
+ "How aligned with your culture, viewpoint, or values was the LLM?",
213
+ },
214
+ {
215
+ name: "Cultural Awareness",
216
+ description:
217
+ "How well did the LLM recognize when your cultural perspective was relevant?",
218
+ },
219
+ {
220
+ name: "Bias and Stereotypes",
221
+ description:
222
+ "How free from stereotypes or bias was the LLM's response?",
223
+ },
224
+ ],
225
+ },
226
+ ];
227
+
228
+ // Section header component
229
+ const SectionHeader = ({ title, icon, section }) => (
230
+ <div
231
+ className="px-4 py-3 bg-gray-50 border-b flex justify-between items-center cursor-pointer"
232
+ onClick={() => toggleSection(section)}
233
+ >
234
+ <div className="flex items-center gap-2">
235
+ {icon}
236
+ <h3 className="font-semibold text-gray-800">{title}</h3>
237
+ </div>
238
+ {openSections[section] ? (
239
+ <ChevronUp size={16} />
240
+ ) : (
241
+ <ChevronDown size={16} />
242
+ )}
243
+ </div>
244
+ );
245
+
246
+ return (
247
+ <div className="space-y-6">
248
+ {/* Introduction */}
249
+ <div className="border rounded-lg overflow-hidden shadow-sm">
250
+ <SectionHeader
251
+ title="About HUMAINE"
252
+ icon={<Info size={18} />}
253
+ section="introduction"
254
+ />
255
+ {openSections.introduction && (
256
+ <div className="p-4 bg-gradient-to-r from-white to-blue-50">
257
+ <div className="flex flex-col md:flex-row gap-6">
258
+ <div className="md:w-2/3">
259
+ <p className="mb-4">
260
+ <strong>HUMAINE</strong> (Human Understanding and Measurement
261
+ of AI Natural Engagement) is an evaluation benchmark that
262
+ measures language model performance through actual user
263
+ experience. While many benchmarks focus on technical
264
+ capabilities, this evaluation captures how users perceive and
265
+ rate different LLMs across common, everyday use cases.
266
+ </p>
267
+ <p className="mb-4">
268
+ This study collected ratings from 514 participants
269
+ demographically representative of the US population. Each
270
+ participant completed real-world tasks with different LLMs and
271
+ provided structured feedback on various aspects of their
272
+ experience.
273
+ </p>
274
+ <p>
275
+ The evaluation framework includes 7 high-level categories and
276
+ 21 specific low-level metrics that measure aspects like
277
+ helpfulness, communication quality, understanding,
278
+ adaptiveness, trustworthiness, personality, and cultural
279
+ awareness, alongside demographic equity analysis.
280
+ </p>
281
+ </div>
282
+ <div className="md:w-1/3 bg-white p-4 rounded-lg border shadow-sm">
283
+ <h4 className="font-medium text-gray-700 mb-2 border-b pb-1">
284
+ Tasks Evaluated
285
+ </h4>
286
+ <ul className="list-disc pl-5 space-y-2 text-sm">
287
+ {tasksUsed.map((task, index) => (
288
+ <li key={index} className="text-gray-700">
289
+ {task}
290
+ </li>
291
+ ))}
292
+ </ul>
293
+ </div>
294
+ </div>
295
+ </div>
296
+ )}
297
+ </div>
298
+
299
+ {/* Methodology */}
300
+ <div className="border rounded-lg overflow-hidden shadow-sm">
301
+ <SectionHeader
302
+ title="Methodology"
303
+ icon={<Book size={18} />}
304
+ section="methodology"
305
+ />
306
+ {openSections.methodology && (
307
+ <div className="p-4">
308
+ <div className="grid md:grid-cols-1 gap-4">
309
+ {/* Study Design */}
310
+ <div className="border rounded-lg p-4 bg-gray-50 hover:shadow-md transition-shadow">
311
+ <h4 className="text-lg font-medium mb-2 flex items-center gap-2 text-gray-800">
312
+ <span className="w-8 h-8 rounded-full bg-blue-500 flex items-center justify-center text-white">
313
+ 1
314
+ </span>
315
+ Study Design
316
+ </h4>
317
+ <ul className="list-disc pl-5 space-y-1 text-sm">
318
+ <li>
319
+ <strong>Participants:</strong> 514 individuals representing
320
+ US demographics (stratified by age, sex, ethnicity,
321
+ political affiliation).
322
+ </li>
323
+ <li>
324
+ <strong>Task Design:</strong> Six everyday tasks spanning
325
+ creative, practical, and analytical use cases.
326
+ </li>
327
+ <li>
328
+ <strong>Process:</strong> Each participant completed all six
329
+ tasks, each with a different LLM. The assignment of tasks to
330
+ models and the order of tasks were fully randomized.
331
+ </li>
332
+ <li>
333
+ <strong>Models Evaluated:</strong> Latest o1, GPT-4o, Claude
334
+ 3.7 (extended thinking), Gemini 2 Flash, Llama 3.1 405B,
335
+ Deepseek R1.
336
+ </li>
337
+ <li>
338
+ <strong>Model Access:</strong> All models were accessed via
339
+ openrouter.ai with temperature=1, min_tokens=50,
340
+ max_tokens=5,000.
341
+ </li>
342
+ <li>
343
+ <strong>Conversations:</strong> Participants were required
344
+ to exchange at least 4 messages with the models and they
345
+ could exchange more if they wished (not capped).
346
+ </li>
347
+ </ul>
348
+ </div>
349
+ {/* Evaluation Framework */}
350
+ <div className="border rounded-lg p-4 bg-gray-50 hover:shadow-md transition-shadow">
351
+ <h4 className="text-lg font-medium mb-2 flex items-center gap-2 text-gray-800">
352
+ <span className="w-8 h-8 rounded-full bg-blue-500 flex items-center justify-center text-white">
353
+ 2
354
+ </span>
355
+ Evaluation Framework
356
+ </h4>
357
+ <p className="mb-2 text-sm">
358
+ Our approach captures multiple aspects of user experience:
359
+ </p>
360
+ <ul className="list-disc pl-5 space-y-1 text-sm">
361
+ <li>
362
+ <strong>Multi-Dimensional Metrics:</strong> Performance is
363
+ evaluated across 7 high-level categories (rated 1-7) and 21
364
+ specific low-level metrics (rated 1-5).
365
+ </li>
366
+ <li>
367
+ <strong>Demographic Analysis:</strong> We assess performance
368
+ consistency across different demographic groups through
369
+ equity assessment.
370
+ </li>
371
+ <li>
372
+ <strong>Scale Normalization:</strong> All ratings are
373
+ converted to a 0-100 scale for easier comparison.
374
+ </li>
375
+ </ul>
376
+ </div>
377
+
378
+ {/* Data Analysis & Weighting */}
379
+ <div className="border rounded-lg p-4 bg-gray-50 hover:shadow-md transition-shadow">
380
+ <h4 className="text-lg font-medium mb-2 flex items-center gap-2 text-gray-800">
381
+ <span className="w-8 h-8 rounded-full bg-blue-500 flex items-center justify-center text-white">
382
+ 3
383
+ </span>
384
+ Data Analysis & Weighting
385
+ </h4>
386
+ <ul className="list-disc pl-5 space-y-1 text-sm">
387
+ <li>
388
+ <strong>MRP Methodology:</strong> Data is processed through
389
+ multilevel regression with poststratification to create
390
+ results weighted to be highly representative of the US
391
+ population.
392
+ </li>
393
+ <li>
394
+ <strong>Robust Estimation:</strong> All model estimations
395
+ were parametrically bootstrapped (N = 1000) to ensure that
396
+ any uncertainty in the estimates was accounted for.
397
+ </li>
398
+ <li>
399
+ <strong>National Level Comparisons:</strong> For the Overall
400
+ Rankings and Metrics Breakdown tabs, we use the
401
+ national-level estimates derived from MRP.
402
+ </li>
403
+ <li>
404
+ <strong>Task-Level Comparisons:</strong> For task-specific
405
+ comparisons (Task Performance tab), we use the raw
406
+ (unweighted) data due to sample size constraints.
407
+ </li>
408
+ </ul>
409
+ </div>
410
+
411
+ {/* Demographic Equity Assessment */}
412
+ <div className="border rounded-lg p-4 bg-gray-50 hover:shadow-md transition-shadow">
413
+ <h4 className="text-lg font-medium mb-2 flex items-center gap-2 text-gray-800">
414
+ <span className="w-8 h-8 rounded-full bg-blue-500 flex items-center justify-center text-white">
415
+ 4
416
+ </span>
417
+ Demographic Equity Assessment
418
+ </h4>
419
+ <p className="mb-2 text-sm">
420
+ The equity assessment evaluates performance consistency across
421
+ demographic groups using a standardized approach:
422
+ </p>
423
+ <div className="bg-white rounded p-3 border mb-2">
424
+ <p className="text-xs mb-2">
425
+ The <strong>Equity Gap</strong> is the score difference
426
+ between the highest and lowest scoring demographic groups
427
+ for a specific metric. For example, if a model scores 85
428
+ with users age 18-29 but 65 with users age 60+ on
429
+ helpfulness, the equity gap would be 20 points.
430
+ </p>
431
+ <p className="text-xs mb-2">
432
+ We evaluate equity gaps using both{" "}
433
+ <strong>Effect Size</strong> and{" "}
434
+ <strong>Statistical Significance</strong> to identify
435
+ meaningful performance differences:
436
+ </p>
437
+ <div className="text-xs mt-2 space-y-2">
438
+ <div>
439
+ <p className="font-medium text-gray-700">
440
+ Effect Size Calculation:
441
+ </p>
442
+ <p className="text-gray-600 ml-2">
443
+ We normalize each gap by dividing it by the category's
444
+ standard deviation:
445
+ <br />
446
+ <span className="font-mono bg-gray-100 px-1">
447
+ Effect Size = (Max Score - Min Score) / Category
448
+ Standard Deviation
449
+ </span>
450
+ </p>
451
+ <p className="text-gray-600 ml-2 mt-1">
452
+ Category Standard Deviation is calculated from all
453
+ demographic MRP scores within that specific category.
454
+ </p>
455
+ </div>
456
+
457
+ <div>
458
+ <p className="font-medium text-gray-700">
459
+ Effect Size Classification:
460
+ </p>
461
+ <div className="grid grid-cols-2 gap-x-3 gap-y-2 mt-1">
462
+ <div className="flex items-center gap-1">
463
+ <div className="w-3 h-3 rounded-full bg-red-100"></div>
464
+ <div>
465
+ <span className="font-medium text-gray-700">
466
+ Large
467
+ </span>
468
+ <p className="text-gray-500">Effect Size ≥ 0.8</p>
469
+ </div>
470
+ </div>
471
+ <div className="flex items-center gap-1">
472
+ <div className="w-3 h-3 rounded-full bg-yellow-100"></div>
473
+ <div>
474
+ <span className="font-medium text-gray-700">
475
+ Medium
476
+ </span>
477
+ <p className="text-gray-500">Effect Size 0.5-0.8</p>
478
+ </div>
479
+ </div>
480
+ <div className="flex items-center gap-1">
481
+ <div className="w-3 h-3 rounded-full bg-blue-100"></div>
482
+ <div>
483
+ <span className="font-medium text-gray-700">
484
+ Small
485
+ </span>
486
+ <p className="text-gray-500">Effect Size 0.2-0.5</p>
487
+ </div>
488
+ </div>
489
+ <div className="flex items-center gap-1">
490
+ <div className="w-3 h-3 rounded-full bg-green-100"></div>
491
+ <div>
492
+ <span className="font-medium text-gray-700">
493
+ Negligible
494
+ </span>
495
+ <p className="text-gray-500">
496
+ Effect Size &lt; 0.2
497
+ </p>
498
+ </div>
499
+ </div>
500
+ </div>
501
+ </div>
502
+
503
+ <div>
504
+ <p className="font-medium text-gray-700">
505
+ Statistical Significance:
506
+ </p>
507
+ <p className="text-gray-600 ml-2">
508
+ We use p-values to determine if gaps are statistically
509
+ significant (p &lt; 0.05). To account for the large
510
+ number of tests performed, p-values were adjusted using
511
+ the Benjamini-Hochberg (FDR) method. Significance
512
+ reported reflects this correction (q &lt; 0.05).
513
+ </p>
514
+ </div>
515
+
516
+ <div>
517
+ <p className="font-medium text-gray-700">
518
+ Equity Concerns:
519
+ </p>
520
+ <p className="text-gray-600 ml-2">
521
+ A gap is flagged as an equity concern when it has both:
522
+ <br />
523
+ 1. Large Effect Size (≥ 0.8)
524
+ <br />
525
+ 2. Statistical Significance (p &lt; 0.05)
526
+ </p>
527
+ </div>
528
+ </div>
529
+ <p className="text-xs text-gray-600 mt-2">
530
+ <strong>Note:</strong> This methodology allows us to
531
+ identify meaningful performance differences across
532
+ demographic groups while accounting for both the magnitude
533
+ of the gap (effect size) and its statistical reliability
534
+ (significance).
535
+ </p>
536
+ </div>
537
+ </div>
538
+ </div>
539
+ </div>
540
+ )}
541
+ </div>
542
+
543
+ {/* Metrics Calculation */}
544
+ <div className="border rounded-lg overflow-hidden shadow-sm">
545
+ <SectionHeader
546
+ title="Metrics Calculation"
547
+ icon={<Calculator size={18} />}
548
+ section="metricsCalculation"
549
+ />
550
+ {openSections.metricsCalculation && (
551
+ <div className="p-4">
552
+ <p className="text-sm mb-4">
553
+ This section explains how the metrics in the Overview page's
554
+ ranking table are calculated.
555
+ </p>
556
+
557
+ <div className="grid md:grid-cols-2 lg:grid-cols-3 gap-3">
558
+ <div className="border rounded p-3 hover:shadow-md transition-shadow">
559
+ <h4 className="text-sm font-medium text-gray-800 mb-1 flex items-center gap-1">
560
+ <div className="w-4 h-4 rounded-full bg-blue-500"></div>
561
+ Overall Score
562
+ </h4>
563
+ <p className="text-xs text-gray-600">
564
+ Average score across high-level categories at the national
565
+ level (0-100). This represents overall model performance
566
+ across all evaluation dimensions.
567
+ </p>
568
+ </div>
569
+
570
+ <div className="border rounded p-3 hover:shadow-md transition-shadow">
571
+ <h4 className="text-sm font-medium text-gray-800 mb-1 flex items-center gap-1">
572
+ <div className="w-4 h-4 rounded-full bg-blue-500"></div>
573
+ Overall SD
574
+ </h4>
575
+ <p className="text-xs text-gray-600">
576
+ Standard Deviation across high-level categories (lower = more
577
+ consistent). Measures how consistently a model performs across
578
+ different capability areas.
579
+ </p>
580
+ </div>
581
+
582
+ <div className="border rounded p-3 hover:shadow-md transition-shadow">
583
+ <h4 className="text-sm font-medium text-gray-800 mb-1 flex items-center gap-1">
584
+ <div className="w-4 h-4 rounded-full bg-blue-500"></div>
585
+ Max Equity Gap
586
+ </h4>
587
+ <p className="text-xs text-gray-600">
588
+ Largest demographic score difference (hover for details).
589
+ Shows the maximum difference in scores between any two
590
+ demographic groups, with indicators for effect size and
591
+ statistical significance.
592
+ </p>
593
+ </div>
594
+
595
+ <div className="border rounded p-3 hover:shadow-md transition-shadow">
596
+ <h4 className="text-sm font-medium text-gray-800 mb-1 flex items-center gap-1">
597
+ <div className="w-4 h-4 rounded-full bg-blue-500"></div>
598
+ Max Gap Area
599
+ </h4>
600
+ <p className="text-xs text-gray-600">
601
+ Factor and Category where the Max Equity Gap occurs.
602
+ Identifies which demographic factor (e.g., Age, Gender) and
603
+ which category (e.g., Helpfulness, Understanding) shows the
604
+ largest performance difference.
605
+ </p>
606
+ </div>
607
+
608
+ <div className="border rounded p-3 hover:shadow-md transition-shadow">
609
+ <h4 className="text-sm font-medium text-gray-800 mb-1 flex items-center gap-1">
610
+ <div className="w-4 h-4 rounded-full bg-blue-500"></div>
611
+ Equity Concerns
612
+ </h4>
613
+ <p className="text-xs text-gray-600">
614
+ Percentage of demographic gaps flagged as equity concerns
615
+ (lower is better). An equity concern is defined as a gap with
616
+ both large effect size (≥0.8) and statistical significance.
617
+ </p>
618
+ </div>
619
+
620
+ <div className="border rounded p-3 hover:shadow-md transition-shadow">
621
+ <h4 className="text-sm font-medium text-gray-800 mb-1 flex items-center gap-1">
622
+ <div className="w-4 h-4 rounded-full bg-blue-500"></div>
623
+ User Retention
624
+ </h4>
625
+ <p className="text-xs text-gray-600">
626
+ Percentage of participants who said they would use the model
627
+ again. This is based on the "Repeat Usage" question and
628
+ indicates user satisfaction and likelihood to continue using
629
+ the model.
630
+ </p>
631
+ </div>
632
+ </div>
633
+
634
+ <div className="mt-4 bg-blue-50 border-l-4 border-blue-400 p-3 rounded">
635
+ <p className="text-xs text-blue-800">
636
+ <strong>Note:</strong> All scores shown in the dashboard are
637
+ based on MRP-adjusted (Multilevel Regression with
638
+ Poststratification) estimates to ensure they are representative
639
+ of the US population. The only exception is the Task Performance
640
+ tab, which uses raw scores due to sample size constraints at the
641
+ task level.
642
+ </p>
643
+ </div>
644
+ </div>
645
+ )}
646
+ </div>
647
+
648
+ {/* Metrics Explained */}
649
+ <div className="border rounded-lg overflow-hidden shadow-sm">
650
+ <SectionHeader
651
+ title="Metrics Explained"
652
+ icon={<BarChart size={18} />}
653
+ section="metricsExplained"
654
+ />
655
+ {openSections.metricsExplained && (
656
+ <div className="p-4">
657
+ <p className="mb-4 text-sm">
658
+ Our evaluation uses 7 high-level categories (rated on a 1-7 Likert
659
+ scale) and 21 low-level metrics (rated on a 1-5 scale) to
660
+ comprehensively assess LLM performance from a user experience
661
+ perspective.
662
+ </p>
663
+
664
+ {/* Metric selector tabs */}
665
+ <div className="flex flex-wrap gap-1 mb-4 border-b">
666
+ {metricsData.map((metric) => (
667
+ <button
668
+ key={metric.id}
669
+ className={`px-3 py-2 text-sm rounded-t-lg flex items-center gap-1 ${
670
+ activeMetricTab === metric.id
671
+ ? "bg-gray-100 font-medium border-t border-l border-r"
672
+ : "bg-white hover:bg-gray-50"
673
+ }`}
674
+ onClick={() => setActiveMetricTab(metric.id)}
675
+ >
676
+ <span
677
+ className={`w-2 h-2 rounded-full ${metric.color}`}
678
+ ></span>
679
+ {metric.title}
680
+ </button>
681
+ ))}
682
+ </div>
683
+
684
+ {/* Active metric content */}
685
+ {metricsData.map(
686
+ (metric) =>
687
+ activeMetricTab === metric.id && (
688
+ <div
689
+ key={metric.id}
690
+ className="border rounded-lg overflow-hidden"
691
+ >
692
+ <div className="px-4 py-3 bg-gray-50 border-b flex items-center gap-2">
693
+ <div className={`rounded-full`}>
694
+ {React.cloneElement(metric.icon, {
695
+ className: `text-gray-700 w-5 h-5`,
696
+ })}
697
+ </div>
698
+ <h4 className="font-medium text-gray-800">
699
+ {metric.title}{" "}
700
+ <span className="text-sm font-normal text-gray-600">
701
+ (1-7 scale)
702
+ </span>
703
+ </h4>
704
+ </div>
705
+ <div className="p-4">
706
+ <p className="text-sm mb-4">{metric.description}</p>
707
+
708
+ {metric.metrics.length > 0 && (
709
+ <>
710
+ <h5 className="text-sm font-medium mb-3 text-gray-700">
711
+ Specific Metrics (1-5 scale)
712
+ </h5>
713
+ <div className="grid md:grid-cols-3 gap-3">
714
+ {metric.metrics.map((subMetric, idx) => (
715
+ <div
716
+ key={idx}
717
+ className="border rounded p-3 hover:shadow-sm transition-shadow"
718
+ >
719
+ <p className="text-sm font-medium">
720
+ {subMetric.name}
721
+ </p>
722
+ <p className="text-xs text-gray-600 mt-1">
723
+ {subMetric.description}
724
+ </p>
725
+ </div>
726
+ ))}
727
+ </div>
728
+ </>
729
+ )}
730
+ </div>
731
+ </div>
732
+ )
733
+ )}
734
+ </div>
735
+ )}
736
+ </div>
737
+ </div>
738
+ );
739
+ };
740
+
741
+ export default AboutTab;
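The equity-gap methodology that `AboutTab` documents reduces to a small calculation: the gap between the highest- and lowest-scoring demographic groups, its normalized effect size, and a concern flag when the gap is both large and statistically significant. Below is a standalone sketch of that classification, using only the formula and thresholds stated above; the function and field names are illustrative, not taken from the app's actual utilities:

```js
// Hypothetical helper illustrating the equity-gap rules described in AboutTab.
// effectSize = (maxScore - minScore) / categoryStandardDeviation
function classifyEquityGap({ maxScore, minScore, categoryStdDev, pValue }) {
  const gap = maxScore - minScore;
  const effectSize = categoryStdDev > 0 ? gap / categoryStdDev : 0;

  let effectSizeClass;
  if (effectSize >= 0.8) effectSizeClass = 'Large';
  else if (effectSize >= 0.5) effectSizeClass = 'Medium';
  else if (effectSize >= 0.2) effectSizeClass = 'Small';
  else effectSizeClass = 'Negligible';

  const isSignificant = pValue < 0.05;

  return {
    gap,
    effectSize,
    effectSizeClass,
    isSignificant,
    // Flagged as an equity concern only when the gap is both Large and significant.
    isEquityConcern: effectSizeClass === 'Large' && isSignificant,
  };
}

// Worked example from the text: 85 vs 65 is a 20-point gap; with an assumed
// category standard deviation of 10, the effect size is 2.0 ("Large").
console.log(classifyEquityGap({ maxScore: 85, minScore: 65, categoryStdDev: 10, pValue: 0.01 }));
```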
leaderboard-app/components/DemographicAnalysis.jsx ADDED
@@ -0,0 +1,925 @@
1
+ // components/DemographicAnalysis.jsx - Complete Updated File
2
+
3
+ "use client";
4
+
5
+ import React, { useState, useMemo, useEffect, useRef } from "react";
6
+ import {
7
+ BarChart,
8
+ Bar,
9
+ XAxis,
10
+ YAxis,
11
+ CartesianGrid,
12
+ Tooltip as RechartsTooltip,
13
+ Legend,
14
+ ResponsiveContainer,
15
+ Cell,
16
+ LabelList,
17
+ } from "recharts";
18
+ import {
19
+ getSignificanceIndicator,
20
+ formatDisplayKey,
21
+ getMetricTooltip,
22
+ } from "../lib/utils"; // Adjust path as needed
23
+ import { Tooltip } from "./Tooltip"; // Your custom Tooltip component
24
+
25
+ // Helper component for info tooltips with fixed positioning
26
+ const InfoTooltip = ({ text }) => {
27
+ const [isVisible, setIsVisible] = useState(false);
28
+ const [position, setPosition] = useState({ top: 0, left: 0 });
29
+ const buttonRef = useRef(null);
30
+
31
+ // Update position when tooltip becomes visible
32
+ useEffect(() => {
33
+ if (isVisible && buttonRef.current) {
34
+ const rect = buttonRef.current.getBoundingClientRect();
35
+ setPosition({
36
+ top: rect.top - 10, // Position above the icon with a small gap
37
+ left: rect.left + 12, // Center with the icon
38
+ });
39
+ }
40
+ }, [isVisible]);
41
+
42
+ return (
43
+ <div className="relative inline-block ml-1 align-middle">
44
+ <button
45
+ ref={buttonRef}
46
+ className="text-gray-400 hover:text-gray-600 focus:outline-none"
47
+ onMouseEnter={() => setIsVisible(true)}
48
+ onMouseLeave={() => setIsVisible(false)}
49
+ onClick={(e) => {
50
+ e.stopPropagation();
51
+ setIsVisible(!isVisible);
52
+ }}
53
+ aria-label="Info"
54
+ >
55
+ <svg
56
+ xmlns="http://www.w3.org/2000/svg"
57
+ className="h-4 w-4"
58
+ viewBox="0 0 20 20"
59
+ fill="currentColor"
60
+ >
61
+ <path
62
+ fillRule="evenodd"
63
+ d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-7-4a1 1 0 11-2 0 1 1 0 012 0zM9 9a1 1 0 000 2v3a1 1 0 001 1h1a1 1 0 100-2v-3a1 1 0 00-1-1H9z"
64
+ clipRule="evenodd"
65
+ />
66
+ </svg>
67
+ </button>
68
+ {isVisible && (
69
+ <div
70
+ className="fixed p-2 bg-white border-1 rounded shadow-xl text-xs text-gray-700 whitespace-pre-wrap"
71
+ style={{
72
+ top: `${position.top}px`,
73
+ left: `${position.left}px`,
74
+ zIndex: 9999,
75
+ maxWidth: "250px",
76
+ transform: "translate(-50%, -100%)",
77
+ }}
78
+ >
79
+ {text}
80
+ </div>
81
+ )}
82
+ </div>
83
+ );
84
+ };
85
+
86
+ // Custom tooltip for DEMOGRAPHIC chart (shows scores per model for a level)
87
+ const CustomDemographicTooltip = ({ active, payload, label }) => {
88
+ if (active && payload && payload.length) {
89
+ const sortedPayload = [...payload].sort(
90
+ (a, b) => (b.value || 0) - (a.value || 0)
91
+ );
92
+ return (
93
+ <div className="bg-white p-3 border rounded shadow-lg max-w-xs">
94
+ <p className="font-medium text-sm mb-1">{label}</p>
95
+ {sortedPayload.map((entry, index) => (
96
+ <div key={`item-${index}`} className="flex items-center mt-1">
97
+ <div
98
+ className="w-3 h-3 mr-2 rounded-full flex-shrink-0"
99
+ style={{
100
+ backgroundColor:
101
+ entry.payload[`${entry.dataKey}_color`] ||
102
+ entry.color ||
103
+ "#999",
104
+ }}
105
+ ></div>
106
+ <span className="text-xs flex-grow pr-2">{entry.name}: </span>
107
+ <span className="text-xs font-medium ml-1 whitespace-nowrap">
108
+ {typeof entry.value === "number" ? entry.value.toFixed(1) : "N/A"}
109
+ </span>
110
+ </div>
111
+ ))}
112
+ </div>
113
+ );
114
+ }
115
+ return null;
116
+ };
117
+
118
+ // Custom tooltip for EQUITY GAP chart - UPDATED
119
+ const EquityGapTooltip = ({ active, payload }) => {
120
+ if (active && payload && payload.length > 0) {
121
+ const data = payload[0].payload; // data here IS an item from equityGapChartData (derived from all_equity_gaps)
122
+
123
+ if (!data || typeof data !== "object") return null;
124
+
125
+ // Get significance indicator parts
126
+ const significanceInfo = getSignificanceIndicator(
127
+ data.is_statistically_significant,
128
+ data.p_value
129
+ );
130
+ const ciLower = data.gap_confidence_interval_95_lower;
131
+ const ciUpper = data.gap_confidence_interval_95_upper;
132
+
133
+ return (
134
+ <div className="bg-white p-3 border rounded shadow-lg text-xs max-w-xs">
135
+ <p className="font-medium text-sm mb-2">{data.model}</p>
136
+ <div className="space-y-1">
137
+ <div className="flex justify-between">
138
+ <span className="font-semibold">Equity Gap:</span>
139
+ {/* 'gap' key is used in chart data */}
140
+ <span>{data.gap?.toFixed(1) ?? "N/A"} pts</span>
141
+ </div>
142
+ {data.effect_size !== undefined && data.effect_size !== null && (
143
+ <div className="flex justify-between">
144
+ <span className="font-semibold">Effect Size:</span>
145
+ <span>
146
+ {data.effect_size?.toFixed(2) ?? "N/A"} (
147
+ {data.effect_size_class || "N/A"})
148
+ </span>
149
+ </div>
150
+ )}
151
+ {/* Show Significance */}
152
+ <div className="flex justify-between items-center">
153
+ <span className="font-semibold">Significance:</span>
154
+ <span className={`flex items-center ${significanceInfo.className}`}>
155
+ {significanceInfo.tooltip.replace(/Statistically /g, "")}{" "}
156
+ {/* Shorten text */}
157
+ <span className="ml-1 font-bold">{significanceInfo.symbol}</span>
158
+ </span>
159
+ </div>
160
+ {/* Show Confidence Interval */}
161
+ <div className="flex justify-between">
162
+ <span className="font-semibold">95% CI:</span>
163
+ <span>
164
+ {typeof ciLower === "number" && typeof ciUpper === "number"
165
+ ? `[${ciLower.toFixed(1)}, ${ciUpper.toFixed(1)}]`
166
+ : "N/A"}
167
+ </span>
168
+ </div>
169
+ {/* Show Concern Flag */}
170
+ {data.is_equity_concern !== undefined && (
171
+ <div className="flex justify-between">
172
+ <span className="font-semibold">Concern Flag:</span>
173
+ <span
174
+ className={
175
+ data.is_equity_concern
176
+ ? "font-bold text-red-600"
177
+ : "text-gray-600"
178
+ }
179
+ >
180
+ {data.is_equity_concern ? "Yes" : "No"}
181
+ </span>
182
+ </div>
183
+ )}
184
+ {/* Show Min/Max Groups */}
185
+ <div className="flex justify-between">
186
+ <span className="font-semibold">Lowest Group:</span>
187
+ <span>
188
+ {data.min_level || "N/A"} ({data.min_score?.toFixed(1) ?? "-"})
189
+ </span>
190
+ </div>
191
+ <div className="flex justify-between">
192
+ <span className="font-semibold">Highest Group:</span>
193
+ <span>
194
+ {data.max_level || "N/A"} ({data.max_score?.toFixed(1) ?? "-"})
195
+ </span>
196
+ </div>
197
+ </div>
198
+ </div>
199
+ );
200
+ }
201
+ return null;
202
+ };
203
+
204
+ // New helper functions for styling consistency
205
+
206
+ // New helper function to get badge color for effect size
207
+ const getEffectSizeBadgeStyle = (effectSizeClass) => {
208
+ switch (effectSizeClass) {
209
+ case "Large":
210
+ return "bg-red-100 text-red-800";
211
+ case "Medium":
212
+ return "bg-yellow-100 text-yellow-800";
213
+ case "Small":
214
+ return "bg-blue-100 text-blue-800";
215
+ case "Negligible":
216
+ return "bg-green-100 text-green-800";
217
+ default:
218
+ return "bg-gray-100 text-gray-800";
219
+ }
220
+ };
221
+
222
+ // New helper function to get badge color for significance
223
+ const getSignificanceBadgeStyle = (isSignificant) => {
224
+ if (isSignificant === null || isSignificant === undefined)
225
+ return "bg-gray-100 text-gray-800";
226
+ return isSignificant
227
+ ? "bg-blue-100 text-blue-800"
228
+ : "bg-gray-100 text-gray-600";
229
+ };
230
+
231
+ // New helper function to get badge color for concern
232
+ const getConcernBadgeStyle = (isConcern) => {
233
+ if (isConcern === null || isConcern === undefined)
234
+ return "bg-gray-100 text-gray-800";
235
+ return isConcern ? "bg-red-100 text-red-800" : "bg-green-100 text-green-800";
236
+ };
237
+
238
+ // New helper function to format p-value
239
+ const formatPValue = (pValue) => {
240
+ if (pValue === null || pValue === undefined) return "N/A";
241
+ return `p=${pValue.toFixed(3)}` + (pValue < 0.05 ? " < 0.05" : " ≥ 0.05");
242
+ };
243
+
244
+ // New helper function to create effect size tooltip content
245
+ const getEffectSizeTooltip = (effectSize) => {
246
+ return `Effect Size: ${effectSize.toFixed(2)}
247
+
248
+ Calculation: Normalized Effect Size = (Max Score - Min Score) / Category Standard Deviation
249
+
250
+ Category Standard Deviation: The standard deviation of all demographic scores within this specific category.
251
+
252
+ Thresholds:
253
+ • ≥ 0.8: "Large"
254
+ • ≥ 0.5 and < 0.8: "Medium"
255
+ • ≥ 0.2 and < 0.5: "Small"
256
+ • < 0.2: "Negligible"`;
257
+ };
258
+
259
+ // Main component
260
+ const DemographicAnalysis = ({
261
+ rawData = { demographicOptions: {}, mrpDemographics: {} }, // Expect camelCase keys here, snake_case inside mrpDemographics
262
+ modelsMeta = [], // Expect camelCase keys
263
+ metricsData = { highLevelCategories: {}, lowLevelMetrics: {} }, // Expect Title Case keys, contains internalMetricKey
264
+ equityAnalysis = { all_equity_gaps: [], universal_issues: [] }, // Expect snake_case keys
265
+ }) => {
266
+ // Use Title Case metric keys for state and dropdowns
267
+ const highLevelMetricDisplayKeys = Object.keys(
268
+ metricsData?.highLevelCategories || {}
269
+ ).sort();
270
+ const lowLevelMetricDisplayKeys = Object.keys(
271
+ metricsData?.lowLevelMetrics || {}
272
+ ).sort();
273
+
274
+ const [selectedDemographicFactor, setSelectedDemographicFactor] =
275
+ useState(null);
276
+ const [selectedMetricDisplayKey, setSelectedMetricDisplayKey] =
277
+ useState(null); // State holds Title Case
278
+ const [metricLevel, setMetricLevel] = useState("high");
279
+
280
+ const currentMetricDisplayKeys = useMemo(
281
+ () =>
282
+ metricLevel === "high"
283
+ ? highLevelMetricDisplayKeys
284
+ : lowLevelMetricDisplayKeys,
285
+ [metricLevel, highLevelMetricDisplayKeys, lowLevelMetricDisplayKeys]
286
+ );
287
+
288
+ const getModelColor = (modelName) =>
289
+ modelsMeta.find((m) => m.model === modelName)?.color || "#999999";
290
+
291
+ // Set default factor
292
+ useEffect(() => {
293
+ const factors = Object.keys(rawData.demographicOptions || {});
294
+ if (!selectedDemographicFactor && factors.length > 0) {
295
+ const defaultFactor = factors.includes("Age") ? "Age" : factors.sort()[0];
296
+ setSelectedDemographicFactor(defaultFactor);
297
+ }
298
+ }, [rawData.demographicOptions, selectedDemographicFactor]);
299
+
300
+ // Set default metric when list available
301
+ useEffect(() => {
302
+ if (!selectedMetricDisplayKey && currentMetricDisplayKeys.length > 0) {
303
+ // Default logic might need adjustment if "Overall" isn't a key
304
+ const defaultMetric = currentMetricDisplayKeys.includes("Overall Score")
305
+ ? "Overall Score"
306
+ : currentMetricDisplayKeys[0];
307
+ setSelectedMetricDisplayKey(defaultMetric);
308
+ } else if (
309
+ selectedMetricDisplayKey &&
310
+ !currentMetricDisplayKeys.includes(selectedMetricDisplayKey)
311
+ ) {
312
+ setSelectedMetricDisplayKey(
313
+ currentMetricDisplayKeys.length > 0 ? currentMetricDisplayKeys[0] : null
314
+ );
315
+ }
316
+ }, [currentMetricDisplayKeys, selectedMetricDisplayKey, metricLevel]);
317
+
318
+ // Get the internal snake_case key for filtering equity gaps
319
+ const internalMetricKey = useMemo(() => {
320
+ if (!selectedMetricDisplayKey) return null;
321
+ const allMetrics = {
322
+ ...(metricsData?.highLevelCategories || {}),
323
+ ...(metricsData?.lowLevelMetrics || {}),
324
+ };
325
+ // Look up using Title Case display key
326
+ return allMetrics[selectedMetricDisplayKey]?.internalMetricKey ?? null;
327
+ }, [selectedMetricDisplayKey, metricsData]);
328
+
329
+ // Filter equity gaps based on internal key and factor
330
+ const filteredEquityGaps = useMemo(() => {
331
+ // Use internalMetricKey (snake_case) and selectedDemographicFactor
332
+ if (
333
+ !internalMetricKey ||
334
+ !selectedDemographicFactor ||
335
+ !equityAnalysis?.all_equity_gaps ||
336
+ !Array.isArray(equityAnalysis.all_equity_gaps)
337
+ ) {
338
+ return [];
339
+ }
340
+ // Filter all_equity_gaps (which has snake_case keys)
341
+ return equityAnalysis.all_equity_gaps.filter(
342
+ (gap) =>
343
+ gap.category === internalMetricKey &&
344
+ gap.demographic_factor === selectedDemographicFactor
345
+ );
346
+ }, [
347
+ internalMetricKey,
348
+ selectedDemographicFactor,
349
+ equityAnalysis?.all_equity_gaps,
350
+ ]);
351
+
352
+ // Prepare data for Equity Gap Chart - uses snake_case keys from filteredEquityGaps
353
+ const equityGapChartData = useMemo(() => {
354
+ return filteredEquityGaps
355
+ .map((gap) => ({
356
+ // Pass all original snake_case keys needed by tooltip/table
357
+ // These keys match the fields expected by EquityGapTooltip
358
+ model: gap.model,
359
+ gap: gap.score_range ?? 0, // Rename score_range to gap for chart dataKey
360
+ score_range: gap.score_range,
361
+ effect_size: gap.effect_size,
362
+ effect_size_class: gap.effect_size_class,
363
+ is_statistically_significant: gap.is_statistically_significant,
364
+ p_value: gap.p_value,
365
+ gap_confidence_interval_95_lower: gap.gap_confidence_interval_95_lower,
366
+ gap_confidence_interval_95_upper: gap.gap_confidence_interval_95_upper,
367
+ is_equity_concern: gap.is_equity_concern,
368
+ min_level: gap.min_level,
369
+ min_score: gap.min_score,
370
+ max_level: gap.max_level,
371
+ max_score: gap.max_score,
372
+
373
+ // Add derived properties
374
+ color: getModelColor(gap.model),
375
+ }))
376
+ .sort((a, b) => (a.gap ?? 0) - (b.gap ?? 0)) // Sort by gap size ascending
377
+ .map((item, index) => ({ ...item, rank: index + 1 })); // Add rank based on gap size
378
+ }, [filteredEquityGaps]); // Depend only on filteredEquityGaps
379
+
380
+ // Prepare data for Demographic Breakdown Chart
381
+ const demographicChartData = useMemo(() => {
382
+ // selectedMetricDisplayKey is Title Case, matching keys in mrpDemographics
383
+ if (
384
+ !selectedDemographicFactor ||
385
+ !selectedMetricDisplayKey ||
386
+ !rawData.mrpDemographics
387
+ )
388
+ return [];
389
+ const metricKeyInData = selectedMetricDisplayKey; // Use Title Case key
390
+ const levels = rawData.demographicOptions[selectedDemographicFactor] || [];
391
+ if (levels.length === 0) return [];
392
+
393
+ const chartData = levels.map((level) => {
394
+ const entry = { level };
395
+ modelsMeta.forEach((model) => {
396
+ // Access mrpDemographics using Title Case metric key
397
+ const score =
398
+ rawData.mrpDemographics[model.model]?.[selectedDemographicFactor]?.[
399
+ level
400
+ ]?.[metricKeyInData];
401
+ entry[model.model] =
402
+ score !== undefined && score !== null && score !== "N/A"
403
+ ? parseFloat(score)
404
+ : null;
405
+ entry[`${model.model}_color`] = model.color;
406
+ });
407
+ return entry;
408
+ });
409
+ return chartData.sort((a, b) => {
410
+ if (a.level === "N/A") return 1;
411
+ if (b.level === "N/A") return -1;
412
+ return a.level.localeCompare(b.level);
413
+ });
414
+ }, [
415
+ selectedDemographicFactor,
416
+ selectedMetricDisplayKey,
417
+ rawData.mrpDemographics,
418
+ rawData.demographicOptions,
419
+ modelsMeta,
420
+ ]);
421
+
422
+ const modelsWithDemoData = useMemo(
423
+ () =>
424
+ modelsMeta
425
+ .map((m) => m.model)
426
+ .filter((modelName) =>
427
+ demographicChartData.some(
428
+ (d) => d[modelName] !== null && d[modelName] !== undefined
429
+ )
430
+ ),
431
+ [modelsMeta, demographicChartData]
432
+ );
433
+
434
+ return (
435
+ <div>
436
+ {/* Controls Panel */}
437
+ <div className="border rounded-lg overflow-hidden mb-6 shadow-sm">
438
+ <div className="px-4 py-3 bg-gray-50 border-b">
439
+ <h3 className="font-semibold text-gray-800">
440
+ Demographic Analysis Controls
441
+ </h3>
442
+ </div>
443
+ <div className="p-4 grid grid-cols-1 md:grid-cols-3 gap-4">
444
+ {/* Factor Selector */}
445
+ <div>
446
+ <label
447
+ htmlFor="factorSelect"
448
+ className="block text-sm font-medium text-gray-700 mb-1"
449
+ >
450
+ Demographic Factor
451
+ </label>
452
+ <select
453
+ id="factorSelect"
454
+ className="w-full border rounded-md px-3 py-2 bg-white shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
455
+ value={selectedDemographicFactor || ""}
456
+ onChange={(e) => setSelectedDemographicFactor(e.target.value)}
457
+ >
458
+ <option value="" disabled>
459
+ Select factor
460
+ </option>
461
+ {Object.keys(rawData.demographicOptions || {})
462
+ .sort()
463
+ .map((factor) => (
464
+ <option key={factor} value={factor}>
465
+ {formatDisplayKey(factor)}
466
+ </option>
467
+ ))}
468
+ </select>
469
+ </div>
470
+ {/* Level Toggle */}
471
+ <div>
472
+ <label className="block text-sm font-medium text-gray-700 mb-1">
473
+ Metric Level
474
+ </label>
475
+ <div className="flex">
476
+ <button
477
+ className={`px-3 py-2 text-sm font-medium border ${
478
+ metricLevel === "high"
479
+ ? "bg-blue-100 text-blue-800 border-blue-300"
480
+ : "bg-white text-gray-700 border-gray-300 hover:bg-gray-50"
481
+ } rounded-l-md flex-1`}
482
+ onClick={() => setMetricLevel("high")}
483
+ >
484
+ High-Level
485
+ </button>
486
+ <button
487
+ className={`px-3 py-2 text-sm font-medium border-t border-b border-r ${
488
+ metricLevel === "low"
489
+ ? "bg-blue-100 text-blue-800 border-blue-300"
490
+ : "bg-white text-gray-700 border-gray-300 hover:bg-gray-50"
491
+ } rounded-r-md flex-1`}
492
+ onClick={() => setMetricLevel("low")}
493
+ >
494
+ Low-Level
495
+ </button>
496
+ </div>
497
+ </div>
498
+ {/* Metric Selector - Uses Title Case keys */}
499
+ <div>
500
+ <label
501
+ htmlFor="metricSelect"
502
+ className="block text-sm font-medium text-gray-700 mb-1"
503
+ >
504
+ <Tooltip content={getMetricTooltip(selectedMetricDisplayKey)}>
505
+ <span>
506
+ {metricLevel === "high"
507
+ ? "High-Level Category"
508
+ : "Low-Level Metric"}
509
+ </span>
510
+ </Tooltip>
511
+ </label>
512
+ <select
513
+ id="metricSelect"
514
+ className="w-full border rounded-md px-3 py-2 bg-white shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
515
+ value={selectedMetricDisplayKey || ""}
516
+ onChange={(e) => setSelectedMetricDisplayKey(e.target.value)}
517
+ disabled={currentMetricDisplayKeys.length === 0}
518
+ >
519
+ <option value="" disabled>
520
+ Select metric
521
+ </option>
522
+ {/* Iterate through Title Case keys */}
523
+ {currentMetricDisplayKeys.map((displayKey) => (
524
+ <option key={displayKey} value={displayKey}>
525
+ {displayKey}
526
+ </option>
527
+ ))}
528
+ </select>
529
+ {!selectedMetricDisplayKey &&
530
+ currentMetricDisplayKeys.length > 0 && (
531
+ <p className="mt-1 text-xs text-gray-500">
532
+ Select a metric to view analysis.
533
+ </p>
534
+ )}
535
+ {currentMetricDisplayKeys.length === 0 && (
536
+ <p className="mt-1 text-xs text-amber-600">
537
+ No {metricLevel} metrics available.
538
+ </p>
539
+ )}
540
+ </div>
541
+ </div>
542
+ </div>
543
+
544
+ {/* Demographic Breakdown Chart */}
545
+ <div className="border rounded-lg overflow-hidden mb-6 shadow-sm">
546
+ <div className="px-4 py-3 bg-gray-50 border-b">
547
+ <h3 className="font-semibold text-gray-800">
548
+ {selectedMetricDisplayKey || "Metric"} Scores across{" "}
549
+ {formatDisplayKey(selectedDemographicFactor) || "Groups"}
550
+ <InfoTooltip
551
+ text={`Shows the average score (0-100) for each model within each subgroup of ${formatDisplayKey(
552
+ selectedDemographicFactor
553
+ )}. Higher scores are better.`}
554
+ />
555
+ </h3>
556
+ </div>
557
+ <div className="p-4">
558
+ {demographicChartData.length > 0 && modelsWithDemoData.length > 0 ? (
559
+ <div className="h-80">
560
+ <ResponsiveContainer width="100%" height="100%">
561
+ <BarChart
562
+ data={demographicChartData}
563
+ margin={{ top: 5, right: 5, left: 0, bottom: 60 }}
564
+ >
565
+ <CartesianGrid strokeDasharray="3 3" vertical={false} />
566
+ <XAxis
567
+ dataKey="level"
568
+ angle={-45}
569
+ textAnchor="end"
570
+ tick={{ fontSize: 11 }}
571
+ interval={0}
572
+ height={70}
573
+ />
574
+ <YAxis domain={[0, 100]} tick={{ fontSize: 11 }} width={40} />
575
+ <RechartsTooltip
576
+ content={<CustomDemographicTooltip />}
577
+ wrapperStyle={{ zIndex: 10 }}
578
+ />
579
+ <Legend
580
+ layout="horizontal"
581
+ verticalAlign="bottom"
582
+ align="center"
583
+ wrapperStyle={{ paddingTop: 30 }}
584
+ iconSize={10}
585
+ />
586
+ {modelsWithDemoData.map((modelName) => (
587
+ <Bar
588
+ key={modelName}
589
+ dataKey={modelName}
590
+ name={modelName}
591
+ fill={getModelColor(modelName)}
592
+ />
593
+ ))}
594
+ </BarChart>
595
+ </ResponsiveContainer>
596
+ </div>
597
+ ) : (
598
+ <div className="flex items-center justify-center h-60 bg-gray-50 rounded">
599
+ <div className="text-center p-4">
600
+ <svg
601
+ xmlns="http://www.w3.org/2000/svg"
602
+ className="h-10 w-10 mx-auto text-gray-400 mb-3"
603
+ fill="none"
604
+ viewBox="0 0 24 24"
605
+ stroke="currentColor"
606
+ >
607
+ <path
608
+ strokeLinecap="round"
609
+ strokeLinejoin="round"
610
+ strokeWidth={2}
611
+ d="M9 17v-2m3 2v-4m3 4v-6m2 10H7a2 2 0 01-2-2V7a2 2 0 012-2h2l2-3h6l2 3h2a2 2 0 012 2v10a2 2 0 01-2 2h-1"
612
+ />
613
+ </svg>
614
+ <h3 className="text-lg font-medium text-gray-900 mb-1">
615
+ No Data Available
616
+ </h3>
617
+ <p className="text-sm text-gray-600">
618
+ {!selectedDemographicFactor
619
+ ? "Please select a demographic factor."
620
+ : !selectedMetricDisplayKey
621
+ ? "Please select a metric."
622
+ : "No score data found."}
623
+ </p>
624
+ </div>
625
+ </div>
626
+ )}
627
+ </div>
628
+ </div>
629
+
630
+ {/* Equity Gap Comparison Chart */}
631
+ <div className="border rounded-lg overflow-hidden mb-6 shadow-sm">
632
+ <div className="px-4 py-3 bg-gray-50 border-b">
633
+ <h3 className="font-semibold text-gray-800">
634
+ Equity Gap Comparison for {selectedMetricDisplayKey || "Metric"}
635
+ <InfoTooltip
636
+ text={`Compares the maximum score difference observed between ${formatDisplayKey(
637
+ selectedDemographicFactor
638
+ )} groups for each model. Lower gaps indicate better equity.`}
639
+ />
640
+ </h3>
641
+ </div>
642
+ <div className="p-4">
643
+ {equityGapChartData.length > 0 ? (
644
+ <div className="h-72">
645
+ <ResponsiveContainer width="100%" height="100%">
646
+ <BarChart
647
+ data={equityGapChartData}
648
+ margin={{ top: 5, right: 30, left: 5, bottom: 5 }}
649
+ layout="vertical"
650
+ >
651
+ <CartesianGrid
652
+ strokeDasharray="3 3"
653
+ horizontal={true}
654
+ vertical={false}
655
+ />
656
+ <XAxis
657
+ type="number"
658
+ dataKey="gap"
659
+ domain={[0, "auto"]}
660
+ tick={{ fontSize: 11 }}
661
+ allowDecimals={false}
662
+ />
663
+ <YAxis
664
+ dataKey="model"
665
+ type="category"
666
+ width={130}
667
+ tick={{ fontSize: 11 }}
668
+ />
669
+ <RechartsTooltip
670
+ content={<EquityGapTooltip />}
671
+ wrapperStyle={{ zIndex: 10 }}
672
+ />
673
+ <Bar
674
+ dataKey="gap"
675
+ name="Equity Gap"
676
+ barSize={20}
677
+ radius={[0, 4, 4, 0]}
678
+ >
679
+ {equityGapChartData.map((entry, index) => (
680
+ <Cell
681
+ key={`cell-${index}`}
682
+ fill={entry.color}
683
+ fillOpacity={0.8}
684
+ />
685
+ ))}
686
+ <LabelList
687
+ dataKey="gap"
688
+ position="right"
689
+ formatter={(value) => value?.toFixed(1) ?? ""}
690
+ style={{ fontSize: 11, fill: "#6b7280" }}
691
+ />
692
+ </Bar>
693
+ </BarChart>
694
+ </ResponsiveContainer>
695
+ </div>
696
+ ) : (
697
+ <div className="flex items-center justify-center h-60 bg-gray-50 rounded">
698
+ <div className="text-center p-4">
699
+ <svg
700
+ xmlns="http://www.w3.org/2000/svg"
701
+ className="h-10 w-10 mx-auto text-gray-400 mb-3"
702
+ fill="none"
703
+ viewBox="0 0 24 24"
704
+ stroke="currentColor"
705
+ >
706
+ <path
707
+ strokeLinecap="round"
708
+ strokeLinejoin="round"
709
+ strokeWidth={2}
710
+ d="M9 17v-2m3 2v-4m3 4v-6m2 10H7a2 2 0 01-2-2V7a2 2 0 012-2h2l2-3h6l2 3h2a2 2 0 012 2v10a2 2 0 01-2 2h-1"
711
+ />
712
+ </svg>
713
+ <h3 className="text-lg font-medium text-gray-900 mb-1">
714
+ No Equity Gap Data
715
+ </h3>
716
+ <p className="text-sm text-gray-600">
717
+ {!selectedDemographicFactor
718
+ ? "Please select a demographic factor."
719
+ : !selectedMetricDisplayKey
720
+ ? "Please select a metric."
721
+ : "No equity gaps found."}
722
+ </p>
723
+ </div>
724
+ </div>
725
+ )}
726
+ {equityGapChartData.length > 0 && (
727
+ <p className="mt-3 text-xs text-gray-500">
728
+ Chart ranks models by equity gap size (lower is better).
729
+ </p>
730
+ )}
731
+ </div>
732
+ </div>
733
+
734
+ {/* Equity Gap Details Table - IMPROVED */}
735
+ {equityGapChartData.length > 0 && (
736
+ <div className="border rounded-lg overflow-hidden mb-6 shadow-sm">
737
+ <div className="px-4 py-3 bg-gray-50 border-b">
738
+ <h3 className="font-semibold text-gray-800">
739
+ Detailed Equity Gaps: {selectedMetricDisplayKey || "Metric"} by{" "}
740
+ {formatDisplayKey(selectedDemographicFactor) || "Factor"}
741
+ </h3>
742
+ </div>
743
+ <div className="p-4 overflow-x-auto">
744
+ <table className="min-w-full divide-y divide-gray-200">
745
+ <thead className="bg-gray-50">
746
+ <tr>
747
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
748
+ Rank
749
+ </th>
750
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
751
+ Model
752
+ </th>
753
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
754
+ Equity Gap
755
+ </th>
756
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
757
+ Effect Size
758
+ </th>
759
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
760
+ Significance
761
+ </th>
762
+ <th className="px-3 py-2 text-center text-xs font-medium text-gray-500 uppercase tracking-wider">
763
+ Concern?
764
+ </th>
765
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
766
+ Lowest Group (Score)
767
+ </th>
768
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
769
+ Highest Group (Score)
770
+ </th>
771
+ </tr>
772
+ </thead>
773
+ <tbody className="bg-white divide-y divide-gray-200">
774
+ {equityGapChartData.map((gap) => {
775
+ const minScoreDisplay =
776
+ typeof gap.min_score === "number"
777
+ ? gap.min_score.toFixed(1)
778
+ : "-";
779
+ const maxScoreDisplay =
780
+ typeof gap.max_score === "number"
781
+ ? gap.max_score.toFixed(1)
782
+ : "-";
783
+
784
+ return (
785
+ <tr
786
+ key={gap.model}
787
+ className={`hover:bg-gray-50 ${
788
+ gap.is_equity_concern ? "bg-red-50" : ""
789
+ }`}
790
+ >
791
+ <td className="px-3 py-2 whitespace-nowrap text-sm text-gray-500">
792
+ {gap.rank}
793
+ </td>
794
+ <td className="px-3 py-2 whitespace-nowrap">
795
+ <div className="flex items-center">
796
+ <div
797
+ className="w-3 h-3 rounded-full mr-2 flex-shrink-0"
798
+ style={{ backgroundColor: gap.color }}
799
+ ></div>
800
+ <span className="text-sm font-medium text-gray-900">
801
+ {gap.model}
802
+ </span>
803
+ </div>
804
+ </td>
805
+ <td className="px-3 py-2 whitespace-nowrap text-sm font-medium">
806
+ {/* Equity Gap as plain text */}
807
+ {gap.gap !== undefined && gap.gap !== null
808
+ ? gap.gap.toFixed(1)
809
+ : "N/A"}
810
+ </td>
811
+ <td className="px-3 py-2 whitespace-nowrap text-sm">
812
+ {gap.effect_size !== undefined &&
813
+ gap.effect_size !== null ? (
814
+ <div className="flex items-center">
815
+ <span
816
+ className={`px-2 py-0.5 rounded-full text-xs font-medium ${getEffectSizeBadgeStyle(
817
+ gap.effect_size_class
818
+ )}`}
819
+ >
820
+ {gap.effect_size_class || "N/A"}
821
+ </span>
822
+ <InfoTooltip
823
+ text={getEffectSizeTooltip(gap.effect_size)}
824
+ />
825
+ </div>
826
+ ) : (
827
+ <span className="text-gray-500">N/A</span>
828
+ )}
829
+ </td>
830
+ <td className="px-3 py-2 whitespace-nowrap text-sm">
831
+ <div className="flex flex-col">
832
+ <div className="flex items-center">
833
+ <span
834
+ className={`px-2 py-0.5 rounded-full text-xs font-medium ${getSignificanceBadgeStyle(
835
+ gap.is_statistically_significant
836
+ )}`}
837
+ >
838
+ {gap.is_statistically_significant ? (
839
+ <span>Significant ✔</span>
840
+ ) : (
841
+ <span>Not Significant ✘</span>
842
+ )}
843
+ </span>
844
+ </div>
845
+ <div className="text-xs text-gray-500 mt-1">
846
+ {gap.p_value !== undefined && gap.p_value !== null
847
+ ? formatPValue(gap.p_value)
848
+ : ""}
849
+ </div>
850
+ </div>
851
+ </td>
852
+ <td className="px-3 py-2 whitespace-nowrap text-sm text-center">
853
+ <span
854
+ className={`inline-block px-2 py-0.5 rounded-full text-xs font-medium ${getConcernBadgeStyle(
855
+ gap.is_equity_concern
856
+ )}`}
857
+ >
858
+ {gap.is_equity_concern ? "Yes" : "No"}
859
+ </span>
860
+ </td>
861
+ <td className="px-3 py-2 whitespace-nowrap text-sm">
862
+ {gap.min_level ? (
863
+ <div className="flex flex-col">
864
+ <span className="font-medium">{gap.min_level}</span>
865
+ <span className="text-gray-500">
866
+ {minScoreDisplay}
867
+ </span>
868
+ </div>
869
+ ) : (
870
+ <span className="text-gray-500">-</span>
871
+ )}
872
+ </td>
873
+ <td className="px-3 py-2 whitespace-nowrap text-sm">
874
+ {gap.max_level ? (
875
+ <div className="flex flex-col">
876
+ <span className="font-medium">{gap.max_level}</span>
877
+ <span className="text-gray-500">
878
+ {maxScoreDisplay}
879
+ </span>
880
+ </div>
881
+ ) : (
882
+ <span className="text-gray-500">-</span>
883
+ )}
884
+ </td>
885
+ </tr>
886
+ );
887
+ })}
888
+ </tbody>
889
+ </table>
890
+ </div>
891
+ {/* Table Footer/Explanation - IMPROVED */}
892
+ <div className="px-4 pb-4 pt-2 text-xs text-gray-600">
893
+ <div className="space-y-1">
894
+ <p>
895
+ <span className="font-semibold">Rank:</span> Based on lowest
896
+ Equity Gap value for this metric/factor
897
+ </p>
898
+ <p>
899
+ <span className="font-semibold">Equity Gap:</span> Score
900
+ difference (0-100 points) between highest and lowest scoring
901
+ groups
902
+ </p>
903
+ <p>
904
+ <span className="font-semibold">Effect Size:</span> Gap
905
+ magnitude relative to score variation (hover for details)
906
+ </p>
907
+ <p>
908
+ <span className="font-semibold">Significance:</span> Whether the
909
+ gap is statistically significant after adjusting for multiple
910
+ tests (Benjamini-Hochberg FDR correction, q&lt;0.05)
911
+ </p>
912
+ <p>
913
+ <span className="font-semibold">Concern?:</span> 'Yes' flags
914
+ potential equity concerns (Large Effect Size AND Statistically
915
+ Significant)
916
+ </p>
917
+ </div>
918
+ </div>
919
+ </div>
920
+ )}
921
+ </div>
922
+ );
923
+ };
924
+
925
+ export default DemographicAnalysis;
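
For context on the table logic above: the component consumes pre-computed `equityGapChartData` rows, so the row shape matters more than its source. Below is a minimal sketch, assuming per-group scores are already available, of how one such row could be assembled to match the footer's definitions of the gap and the concern flag; every name in it is illustrative and not part of this commit.

```js
// Illustrative sketch only — not a helper from this commit. It mirrors the fields the
// Detailed Equity Gaps table reads (gap, min/max level and score, concern flag).
function buildEquityGapRow(model, groupScores, effectSizeClass, isSignificant) {
  // groupScores: { [demographicLevel]: score on a 0-100 scale }
  const scored = Object.entries(groupScores).filter(
    ([, score]) => typeof score === "number"
  );
  if (scored.length < 2) return null; // need at least two groups to form a gap

  const sorted = [...scored].sort((a, b) => a[1] - b[1]);
  const [minLevel, minScore] = sorted[0];
  const [maxLevel, maxScore] = sorted[sorted.length - 1];

  return {
    model,
    gap: maxScore - minScore, // "Equity Gap": highest minus lowest group score
    min_level: minLevel,
    min_score: minScore,
    max_level: maxLevel,
    max_score: maxScore,
    effect_size_class: effectSizeClass,
    is_statistically_significant: isSignificant,
    // Footer definition: a concern is a Large effect that is also statistically
    // significant after multiple-testing correction.
    is_equity_concern: effectSizeClass === "Large" && isSignificant,
  };
}
```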
leaderboard-app/components/LLMComparisonDashboard.jsx ADDED
@@ -0,0 +1,639 @@
1
+ // components/LLMComparisonDashboard.jsx
2
+
3
+ "use client";
4
+
5
+ import React, { useState, useMemo } from "react";
6
+ import {
7
+ getScoreBadgeColor,
8
+ formatDisplayKey, // Use this for displaying snake_case keys nicely
9
+ getMetricTooltip,
10
+ getEquityIndicatorStyle, // Use this for Max Equity Gap status
11
+ } from "../lib/utils"; // Adjust path as needed
12
+ import TaskPerformance from "./TaskPerformance";
13
+ import DemographicAnalysis from "./DemographicAnalysis";
14
+ import MetricsBreakdown from "./MetricsBreakdown";
15
+ import About from "./About";
16
+ import { Tooltip } from "./Tooltip"; // Assuming this is your Tooltip component
17
+
18
+ // Helper component for info tooltips (assuming it exists and works)
19
+ const InfoTooltip = ({ text }) => {
20
+ const [isVisible, setIsVisible] = useState(false);
21
+ return (
22
+ <div className="relative inline-block ml-1 align-middle">
23
+ <button
24
+ className="text-gray-400 hover:text-gray-600 focus:outline-none"
25
+ onMouseEnter={() => setIsVisible(true)}
26
+ onMouseLeave={() => setIsVisible(false)}
27
+ onClick={(e) => {
28
+ e.stopPropagation();
29
+ setIsVisible(!isVisible);
30
+ }}
31
+ aria-label="Info"
32
+ >
33
+ <svg
34
+ xmlns="http://www.w3.org/2000/svg"
35
+ className="h-4 w-4"
36
+ viewBox="0 0 20 20"
37
+ fill="currentColor"
38
+ >
39
+ <path
40
+ fillRule="evenodd"
41
+ d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-7-4a1 1 0 11-2 0 1 1 0 012 0zM9 9a1 1 0 000 2v3a1 1 0 001 1h1a1 1 0 100-2v-3a1 1 0 00-1-1H9z"
42
+ clipRule="evenodd"
43
+ />
44
+ </svg>
45
+ </button>
46
+ {isVisible && (
47
+ <div className="absolute z-10 w-64 p-2 bg-white border rounded shadow-lg text-xs text-gray-700 -translate-x-1/2 left-1/2 mt-1 normal-case">
48
+ {text}
49
+ </div>
50
+ )}
51
+ </div>
52
+ );
53
+ };
54
+
55
+ // Main dashboard component
56
+ const LLMComparisonDashboard = ({ data: processedData }) => {
57
+ const [activeTab, setActiveTab] = useState("overview");
58
+ const [topPerformersView, setTopPerformersView] = useState("high-level");
59
+
60
+ // Destructure data - top-level keys are camelCase
61
+ // Nested rawData and equityAnalysis retain original snake_case keys
62
+ const {
63
+ models: rankedModels = [], // This is overallRankingProcessed with camelCase keys
64
+ metricsData = { highLevelCategories: {}, lowLevelMetrics: {} }, // Title Case keys inside
65
+ radarData = [],
66
+ overviewCardData = {}, // camelCase keys expected inside
67
+ rawData = {
68
+ // camelCase keys for objects, snake_case keys inside those objects
69
+ taskLevelPerformance: {},
70
+ mrpDemographics: {},
71
+ demographicOptions: {},
72
+ availableMetrics: [], // Title Case
73
+ tasks: [],
74
+ taskCategories: {},
75
+ taskMetrics: [], // Title Case
76
+ taskMetricsSnake: [], // snake_case
77
+ taskCategoryMap: {},
78
+ },
79
+ bestPerCategory = {}, // Title Case keys
80
+ bestPerMetric = {}, // Title Case keys
81
+ equityAnalysis = {
82
+ // Original snake_case keys
83
+ all_equity_gaps: [],
84
+ model_max_effect_gaps: {},
85
+ universal_issues: [],
86
+ assessment_method: {},
87
+ demographic_variation_stats: {},
88
+ },
89
+ metadata = {}, // Original keys
90
+ } = processedData || {};
91
+
92
+ // NEW: Helper function to get color for Max Equity Gap bubble
93
+ const getEquityGapBadgeColor = (model) => {
94
+ const isConcern = model.maxEffectConcernFlag;
95
+ const isSignificant = model.maxEffectSignificant;
96
+ const effectSizeClass = model.maxEffectSizeClass;
97
+ const isLargeEffect = effectSizeClass === "Large";
98
+
99
+ if (isConcern && isSignificant && isLargeEffect) {
100
+ return "bg-red-100 text-red-800"; // Equity Concern
101
+ }
102
+ if (isLargeEffect) {
103
+ return "bg-yellow-100 text-yellow-800"; // Large Effect
104
+ }
105
+ if (isSignificant) {
106
+ return "bg-blue-100 text-blue-800"; // Significant
107
+ }
108
+ return "bg-gray-100 text-gray-800"; // No concern
109
+ };
110
+
111
+ // UPDATED: Render cell for Max Equity Gap column with bubble design
112
+ const renderMaxEquityGapCell = (model) => {
113
+ // model object has camelCase keys
114
+ const gapValue = model.maxEffectGap;
115
+ const isConcern = model.maxEffectConcernFlag;
116
+ const significanceStatus = model.maxEffectSignificant;
117
+ const pValue = model.maxEffectPValue;
118
+ const effectSizeClass = model.maxEffectSizeClass;
119
+ const isLargeEffect = effectSizeClass === "Large";
120
+ // Access nested details using original snake_case keys
121
+ const gapDetails = model.maxEffectGapDetails || {};
122
+ const ciLower = gapDetails.gap_confidence_interval_95_lower;
123
+ const ciUpper = gapDetails.gap_confidence_interval_95_upper;
124
+
125
+ const displayValue =
126
+ typeof gapValue === "number" ? gapValue.toFixed(1) : "N/A";
127
+ if (displayValue === "N/A")
128
+ return <span className="text-xs text-gray-500">N/A</span>;
129
+
130
+ const indicator = getEquityIndicatorStyle(
131
+ isConcern,
132
+ isLargeEffect,
133
+ significanceStatus,
134
+ pValue,
135
+ effectSizeClass
136
+ );
137
+ let fullTooltipContent = indicator.tooltip;
138
+ if (typeof ciLower === "number" && typeof ciUpper === "number") {
139
+ fullTooltipContent += `\n95% CI: [${ciLower.toFixed(
140
+ 1
141
+ )}, ${ciUpper.toFixed(1)}]`;
142
+ } else {
143
+ fullTooltipContent += `\n95% CI: N/A`;
144
+ }
145
+
146
+ return (
147
+ <Tooltip
148
+ content={
149
+ <div className="whitespace-pre-line">{fullTooltipContent}</div>
150
+ }
151
+ >
152
+ <span
153
+ className={`px-2 py-0.5 rounded-full text-xs font-medium ${getEquityGapBadgeColor(
154
+ model
155
+ )}`}
156
+ >
157
+ {displayValue}
158
+ </span>
159
+ </Tooltip>
160
+ );
161
+ };
162
+
163
+ // NEW: Helper for equity concerns percentage badge color
164
+ const getEquityConcernBadgeColor = (percentage) => {
165
+ if (percentage === null || percentage === undefined)
166
+ return "bg-gray-100 text-gray-800";
167
+ if (percentage === 0) return "bg-green-100 text-green-800";
168
+ if (percentage <= 2.5) return "bg-blue-100 text-blue-800";
169
+ if (percentage <= 5) return "bg-yellow-100 text-yellow-800";
170
+ return "bg-red-100 text-red-800";
171
+ };
172
+
173
+ return (
174
+ <div className="max-w-7xl mx-auto p-4 bg-white">
175
+ {/* Header */}
176
+ <div className="relative mb-6 overflow-hidden">
177
+ <div className="absolute inset-0 bg-gradient-to-br from-blue-50 to-sky-50 opacity-70"></div>
178
+ <div className="relative max-w-5xl mx-auto px-6 py-6">
179
+ <div className="text-center">
180
+ <h1 className="text-4xl font-bold mb-2 tracking-tight text-blue-700">
181
+ Prolific's AI User Experience Leaderboard
182
+ </h1>
183
+
184
+ <p className="text-gray-600 max-w-4xl mx-auto">
185
+ A benchmark assessing how well language models handle real-world
186
+ tasks based on user experiences.
187
+ </p>
188
+ </div>
189
+ </div>
190
+ </div>
191
+ {/* Tab Buttons */}
192
+ <div className="flex flex-wrap mb-6 border-b">
193
+ {[
194
+ "overview",
195
+ "metrics-breakdown",
196
+ "task-performance",
197
+ "demographic-analysis",
198
+ "about",
199
+ ].map((tab) => (
200
+ <button
201
+ key={tab}
202
+ className={`px-4 py-2 font-medium capitalize ${
203
+ activeTab === tab
204
+ ? "text-blue-600 border-b-2 border-blue-600"
205
+ : "text-gray-500 hover:text-gray-700"
206
+ }`}
207
+ onClick={() => setActiveTab(tab)}
208
+ >
209
+ {" "}
210
+ {tab.replace("-", " ")}{" "}
211
+ </button>
212
+ ))}
213
+ </div>
214
+ {/* Overview Tab */}
215
+ {activeTab === "overview" && (
216
+ <div>
217
+ {/* Overall Rankings Card */}
218
+ <div className="mb-6 border rounded-lg overflow-hidden shadow-sm">
219
+ <div className="px-4 py-3 bg-gray-50 border-b">
220
+ <h2 className="text-xl font-semibold text-gray-800">
221
+ Overall Model Rankings
222
+ </h2>
223
+ </div>
224
+ <div className="p-4">
225
+ <div className="overflow-x-auto">
226
+ <table className="w-full min-w-[850px] table-auto divide-y divide-gray-200">
227
+ <thead>
228
+ <tr className="bg-gray-50">
229
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider w-12">
230
+ Rank
231
+ </th>
232
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider w-48">
233
+ Model
234
+ </th>
235
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider w-28">
236
+ <span>Overall Score</span>
237
+ </th>
238
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider w-24">
239
+ <span>Overall SD</span>
240
+ </th>
241
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider w-32">
242
+ <span>Max Equity Gap</span>
243
+ </th>
244
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider w-40">
245
+ <span>Max Gap Area</span>
246
+ </th>
247
+ <th className="px-3 py-2 text-center text-xs font-medium text-gray-500 uppercase tracking-wider w-36">
248
+ <span>Equity Concerns</span>
249
+ </th>
250
+ <th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider w-32">
251
+ <span>User Retention</span>
252
+ </th>
253
+ </tr>
254
+ </thead>
255
+ <tbody className="divide-y divide-gray-200">
256
+ {/* Use camelCase model object from rankedModels */}
257
+ {rankedModels.map((model) => (
258
+ <tr key={model.model} className="hover:bg-gray-50">
259
+ <td className="px-3 py-3 text-sm font-medium text-gray-900">
260
+ {model.rank}
261
+ </td>
262
+ <td className="px-3 py-3">
263
+ <div className="flex items-center">
264
+ <div
265
+ className="w-3 h-3 rounded-full mr-2 flex-shrink-0"
266
+ style={{ backgroundColor: model.color }}
267
+ ></div>
268
+ <span className="text-sm font-medium text-gray-900">
269
+ {model.model}
270
+ </span>
271
+ </div>
272
+ </td>
273
+ <td className="px-3 py-3 text-sm font-semibold text-gray-800">
274
+ {model.overallScore !== null
275
+ ? model.overallScore.toFixed(1)
276
+ : "N/A"}
277
+ </td>
278
+ <td className="px-3 py-3 text-sm text-gray-600">
279
+ {model.stdDevAcrossCats !== "N/A" &&
280
+ model.stdDevAcrossCats !== null
281
+ ? `± ${Number(model.stdDevAcrossCats).toFixed(1)}`
282
+ : "N/A"}
283
+ </td>
284
+ <td className="px-3 py-3 text-sm">
285
+ {renderMaxEquityGapCell(model)}
286
+ </td>
287
+ <td className="px-3 py-3">
288
+ {model.maxEffectFactor &&
289
+ model.maxEffectFactor !== "N/A" ? (
290
+ <div className="flex flex-col">
291
+ <span className="text-xs font-medium text-gray-900">
292
+ {formatDisplayKey(model.maxEffectFactor)}
293
+ </span>
294
+ <span className="text-xs text-gray-500">
295
+ {formatDisplayKey(model.maxEffectCategory)}
296
+ </span>
297
+ </div>
298
+ ) : (
299
+ <span className="text-xs text-gray-500">N/A</span>
300
+ )}
301
+ </td>
302
+ <td className="px-3 py-3 text-sm text-center">
303
+ {model.equityConcernPercentage !== null ? (
304
+ <span>
305
+ {model.equityConcernPercentage.toFixed(1)}%
306
+ </span>
307
+ ) : (
308
+ <span className="text-xs text-gray-500">N/A</span>
309
+ )}
310
+ </td>
311
+ <td className="px-3 py-3 text-sm">
312
+ {model.repeatUsageScore !== null ? (
313
+ <span
314
+ className={`px-2 py-0.5 rounded-full text-xs font-medium ${getScoreBadgeColor(
315
+ model.repeatUsageScore
316
+ )}`}
317
+ >
318
+ {model.repeatUsageScore.toFixed(1)}%
319
+ </span>
320
+ ) : (
321
+ <span className="text-xs text-gray-500">N/A</span>
322
+ )}
323
+ </td>
324
+ </tr>
325
+ ))}
326
+ </tbody>
327
+ </table>
328
+ </div>
329
+ {/* UPDATED: Vertical list for column descriptions with detailed info */}
330
+ <div className="mt-4 pt-3 border-t border-gray-200 text-xs text-gray-600">
331
+ {/* Column descriptions in vertical list */}
332
+ <div className="mb-2">
333
+ <div>
334
+ <span className="font-semibold">Overall Score:</span> Avg.
335
+ score across high-level categories
336
+ </div>
337
+ <div>
338
+ <span className="font-semibold">Overall SD:</span> Standard
339
+ deviation across high-level categories (lower = more
340
+ consistent)
341
+ </div>
342
+ <div>
343
+ <span className="font-semibold">Max Equity Gap:</span>{" "}
344
+ Largest demographic score difference (hover for details on
345
+ significance and effect size)
346
+ </div>
347
+ <div>
348
+ <span className="font-semibold">Max Gap Area:</span>{" "}
349
+ Demographic group and Category where the Max Equity Gap
350
+ occurs
351
+ </div>
352
+ <div>
353
+ <span className="font-semibold">Equity Concerns:</span>{" "}
354
+ Percentage of demographic gaps flagged as concerns (large
355
+ effect & statistically significant)
356
+ </div>
357
+ <div>
358
+ <span className="font-semibold">User Retention:</span>{" "}
359
+ Percentage of participants who said they would use the model
360
+ again
361
+ </div>
362
+ </div>
363
+
364
+ {/* Color key on a single line */}
365
+ <div className="mt-2 pt-2 border-t border-gray-100 flex flex-wrap items-center gap-x-4 gap-y-2">
366
+ <span className="font-semibold whitespace-nowrap">
367
+ Color Key:
368
+ </span>
369
+ <div className="flex items-center">
370
+ <span className="inline-block w-4 h-4 rounded-full bg-red-100 mr-1"></span>
371
+ <span>
372
+ Equity Concern (Large Effect & Statistically Significant)
373
+ </span>
374
+ </div>
375
+ <div className="flex items-center">
376
+ <span className="inline-block w-4 h-4 rounded-full bg-yellow-100 mr-1"></span>
377
+ <span>Large Effect (Not Statistically Significant)</span>
378
+ </div>
379
+ </div>
380
+ </div>
381
+ </div>
382
+ </div>
383
+
384
+ {/* Top Performers Section */}
385
+ <div className="mb-4 flex items-center">
386
+ <h3 className="font-semibold text-xl mr-4">
387
+ Top Performers by Category
388
+ </h3>
389
+ <div className="flex space-x-1 p-1 bg-gray-200 rounded-lg">
390
+ <button
391
+ className={`px-4 py-1.5 text-sm font-medium rounded-md transition-colors duration-150 ${
392
+ topPerformersView === "high-level"
393
+ ? "bg-white shadow text-blue-600"
394
+ : "text-gray-600 hover:text-gray-800"
395
+ }`}
396
+ onClick={() => setTopPerformersView("high-level")}
397
+ >
398
+ {" "}
399
+ High-Level Categories{" "}
400
+ </button>
401
+ <button
402
+ className={`px-4 py-1.5 text-sm font-medium rounded-md transition-colors duration-150 ${
403
+ topPerformersView === "low-level"
404
+ ? "bg-white shadow text-blue-600"
405
+ : "text-gray-600 hover:text-gray-800"
406
+ }`}
407
+ onClick={() => setTopPerformersView("low-level")}
408
+ >
409
+ {" "}
410
+ Low-Level Metrics{" "}
411
+ </button>
412
+ </div>
413
+ </div>
414
+ {/* Top Performers Tables - Access using Title Case keys */}
415
+ {topPerformersView === "high-level" && (
416
+ <div className="border rounded-lg overflow-hidden shadow-sm mb-6">
417
+ <div className="px-4 py-3 bg-gray-50 border-b">
418
+ <h3 className="font-semibold text-gray-800">
419
+ Top Performers by High-Level Category
420
+ </h3>
421
+ </div>
422
+ <div className="p-4">
423
+ {Object.entries(bestPerCategory || {}).length > 0 ? (
424
+ <table className="min-w-full divide-y divide-gray-200">
425
+ <thead>
426
+ <tr>
427
+ <th
428
+ scope="col"
429
+ className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider"
430
+ >
431
+ Category
432
+ </th>
433
+ <th
434
+ scope="col"
435
+ className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider"
436
+ >
437
+ Best Model
438
+ </th>
439
+ <th
440
+ scope="col"
441
+ className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider"
442
+ >
443
+ Score
444
+ </th>
445
+ </tr>
446
+ </thead>
447
+ <tbody className="bg-white divide-y divide-gray-200">
448
+ {Object.entries(bestPerCategory)
449
+ .sort(([a], [b]) => a.localeCompare(b))
450
+ .map(([catDisplayKey, bestInfo], idx) => (
451
+ <tr
452
+ key={catDisplayKey}
453
+ className={
454
+ idx % 2 === 0 ? "bg-white" : "bg-gray-50"
455
+ }
456
+ >
457
+ <td className="px-3 py-2 font-medium text-sm text-gray-900">
458
+ <Tooltip
459
+ content={getMetricTooltip(catDisplayKey)}
460
+ >
461
+ <span>{catDisplayKey}</span>
462
+ </Tooltip>
463
+ </td>
464
+ <td className="px-3 py-2">
465
+ {bestInfo.model !== "N/A" ? (
466
+ <div className="flex items-center">
467
+ <div
468
+ className="w-3 h-3 rounded-full mr-2 shrink-0"
469
+ style={{ backgroundColor: bestInfo.color }}
470
+ ></div>
471
+ <span className="text-sm">
472
+ {bestInfo.model}
473
+ </span>
474
+ </div>
475
+ ) : (
476
+ <span className="text-sm text-gray-500">
477
+ N/A
478
+ </span>
479
+ )}
480
+ </td>
481
+ <td className="px-3 py-2">
482
+ {bestInfo.score !== null ? (
483
+ <span
484
+ className={`px-2 py-0.5 rounded-full text-xs font-medium ${getScoreBadgeColor(
485
+ bestInfo.score
486
+ )}`}
487
+ >
488
+ {bestInfo.score.toFixed(1)}
489
+ </span>
490
+ ) : (
491
+ <span className="text-sm text-gray-500">
492
+ N/A
493
+ </span>
494
+ )}
495
+ </td>
496
+ </tr>
497
+ ))}
498
+ </tbody>
499
+ </table>
500
+ ) : (
501
+ <p className="text-center text-gray-500 py-4">
502
+ Top performer data not available.
503
+ </p>
504
+ )}
505
+ <p className="text-xs text-gray-500 mt-2">
506
+ Scores based on user ratings, normalized to 0-100.
507
+ </p>
508
+ </div>
509
+ </div>
510
+ )}
511
+ {topPerformersView === "low-level" && (
512
+ <div className="border rounded-lg overflow-hidden shadow-sm mb-6">
513
+ <div className="px-4 py-3 bg-gray-50 border-b">
514
+ <h3 className="font-semibold text-gray-800">
515
+ Top Performers by Low-Level Metric
516
+ </h3>
517
+ </div>
518
+ <div className="p-4">
519
+ {Object.entries(bestPerMetric || {}).length > 0 ? (
520
+ <table className="min-w-full divide-y divide-gray-200">
521
+ <thead>
522
+ <tr>
523
+ <th
524
+ scope="col"
525
+ className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider"
526
+ >
527
+ Metric
528
+ </th>
529
+ <th
530
+ scope="col"
531
+ className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider"
532
+ >
533
+ Best Model
534
+ </th>
535
+ <th
536
+ scope="col"
537
+ className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider"
538
+ >
539
+ Score
540
+ </th>
541
+ </tr>
542
+ </thead>
543
+ <tbody className="bg-white divide-y divide-gray-200">
544
+ {Object.entries(bestPerMetric)
545
+ .sort(([a], [b]) => a.localeCompare(b))
546
+ .map(([metricDisplayKey, bestInfo], idx) => (
547
+ <tr
548
+ key={metricDisplayKey}
549
+ className={
550
+ idx % 2 === 0 ? "bg-white" : "bg-gray-50"
551
+ }
552
+ >
553
+ <td className="px-3 py-2 font-medium text-sm text-gray-900">
554
+ <Tooltip
555
+ content={getMetricTooltip(metricDisplayKey)}
556
+ >
557
+ <span>{metricDisplayKey}</span>
558
+ </Tooltip>
559
+ </td>
560
+ <td className="px-3 py-2">
561
+ {bestInfo.model !== "N/A" ? (
562
+ <div className="flex items-center">
563
+ <div
564
+ className="w-3 h-3 rounded-full mr-2 shrink-0"
565
+ style={{ backgroundColor: bestInfo.color }}
566
+ ></div>
567
+ <span className="text-sm">
568
+ {bestInfo.model}
569
+ </span>
570
+ </div>
571
+ ) : (
572
+ <span className="text-sm text-gray-500">
573
+ N/A
574
+ </span>
575
+ )}
576
+ </td>
577
+ <td className="px-3 py-2">
578
+ {bestInfo.score !== null ? (
579
+ <span
580
+ className={`px-2 py-0.5 rounded-full text-xs font-medium ${getScoreBadgeColor(
581
+ bestInfo.score
582
+ )}`}
583
+ >
584
+ {bestInfo.score.toFixed(1)}
585
+ </span>
586
+ ) : (
587
+ <span className="text-sm text-gray-500">
588
+ N/A
589
+ </span>
590
+ )}
591
+ </td>
592
+ </tr>
593
+ ))}
594
+ </tbody>
595
+ </table>
596
+ ) : (
597
+ <p className="text-center text-gray-500 py-4">
598
+ Low-level metric top performer data not available.
599
+ </p>
600
+ )}
601
+ <p className="text-xs text-gray-500 mt-2">
602
+ Scores based on user ratings, normalized to 0-100.
603
+ </p>
604
+ </div>
605
+ </div>
606
+ )}
607
+ </div>
608
+ )}{" "}
609
+ {/* End Overview Tab */}
610
+ {/* Other Tabs - Pass Correct Props */}
611
+ {activeTab === "metrics-breakdown" && (
612
+ <MetricsBreakdown
613
+ metricsData={metricsData} // Title Case keys inside, plus internalMetricKey
614
+ modelsMeta={rankedModels} // camelCase keys inside
615
+ radarData={radarData}
616
+ />
617
+ )}
618
+ {activeTab === "task-performance" && (
619
+ <TaskPerformance
620
+ rawData={rawData} // Contains camelCase top-level, snake_case nested
621
+ modelsMeta={rankedModels}
622
+ metricsData={metricsData} // Title Case keys inside, plus internalMetricKey
623
+ overviewCardData={overviewCardData}
624
+ />
625
+ )}
626
+ {activeTab === "demographic-analysis" && (
627
+ <DemographicAnalysis
628
+ rawData={rawData} // Contains camelCase top-level, snake_case/Title Case nested
629
+ modelsMeta={rankedModels}
630
+ metricsData={metricsData} // Title Case keys inside, plus internalMetricKey
631
+ equityAnalysis={equityAnalysis} // Original snake_case structure
632
+ />
633
+ )}
634
+ {activeTab === "about" && <About metadata={metadata} />}
635
+ </div>
636
+ );
637
+ };
638
+
639
+ export default LLMComparisonDashboard;
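
The dashboard only needs its processed payload via the `data` prop. A hedged wiring sketch follows; the page location, file name, and import path are assumptions for illustration, not part of this commit.

```jsx
// Hypothetical usage sketch — the JSON path and page name are assumptions.
// LLMComparisonDashboard is a client component, so a page can simply hand it the
// pre-processed payload as `data`.
import LLMComparisonDashboard from "../components/LLMComparisonDashboard";
import leaderboardData from "../data/leaderboardData.json"; // assumed location of the processed payload

export default function LeaderboardPage() {
  return <LLMComparisonDashboard data={leaderboardData} />;
}
```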
leaderboard-app/components/MetricsBreakdown.jsx ADDED
@@ -0,0 +1,447 @@
1
+ // components/MetricsBreakdown.jsx
2
+
3
+ "use client";
4
+
5
+ import React, { useState, useEffect, useMemo } from "react";
6
+ import {
7
+ RadarChart,
8
+ PolarGrid,
9
+ PolarAngleAxis,
10
+ PolarRadiusAxis,
11
+ Radar,
12
+ Tooltip as RechartsTooltip, // Renamed to avoid conflict with local Tooltip
13
+ Legend,
14
+ ResponsiveContainer,
15
+ } from "recharts";
16
+ import { getScoreColor, getMetricTooltip } from "../lib/utils";
17
+ import { Tooltip } from "./Tooltip"; // Your custom Tooltip component for headers etc.
18
+
19
+ // Component receives processed metrics data, model metadata, and category radar data
20
+ const MetricsBreakdown = ({
21
+ metricsData,
22
+ modelsMeta,
23
+ radarData: categoryRadarDataProp, // Already processed radar data for categories
24
+ }) => {
25
+ const [subTab, setSubTab] = useState("categories"); // 'categories' or 'metrics'
26
+ const [selectedModels, setSelectedModels] = useState([]);
27
+
28
+ // console.log("Metrics Data in Breakdown:", metricsData); // For debugging
29
+ // console.log("Models Meta in Breakdown:", modelsMeta);
30
+ // console.log("Category Radar Data Prop:", categoryRadarDataProp);
31
+
32
+ // Extract data from props with defaults
33
+ const { highLevelCategories, lowLevelMetrics } = metricsData || {
34
+ highLevelCategories: {},
35
+ lowLevelMetrics: {},
36
+ };
37
+ // Alias modelsMeta as a local "models" list, defaulting to an empty array
38
+ const models = modelsMeta || [];
39
+
40
+ // Get sorted lists of category and metric names
41
+ const sortedCategoryNames = useMemo(
42
+ () =>
43
+ Object.keys(highLevelCategories || {}).sort((a, b) => a.localeCompare(b)),
44
+ [highLevelCategories]
45
+ );
46
+ const sortedMetricNames = useMemo(
47
+ () => Object.keys(lowLevelMetrics || {}).sort((a, b) => a.localeCompare(b)),
48
+ [lowLevelMetrics]
49
+ );
50
+
51
+ // Initialize selections
52
+ useEffect(() => {
53
+ if (selectedModels.length === 0 && models.length > 0) {
54
+ setSelectedModels(models.map((m) => m.model));
55
+ }
56
+ // eslint-disable-next-line react-hooks/exhaustive-deps
57
+ }, [models]); // Only depends on models changing/loading
58
+
59
+ // --- Memoized data generation functions ---
60
+
61
+ // Radar data for LL Metrics (used when subTab === 'metrics') - CORRECTED ACCESSORS
62
+ const metricRadarData = useMemo(() => {
63
+ if (
64
+ !lowLevelMetrics ||
65
+ models.length === 0 ||
66
+ sortedMetricNames.length === 0
67
+ )
68
+ return [];
69
+ return sortedMetricNames.map((metricName) => {
70
+ const entry = { category: metricName }; // Use metric name as the axis category
71
+ const metricData = lowLevelMetrics[metricName];
72
+ if (metricData) {
73
+ models
74
+ .filter((m) => selectedModels.includes(m.model))
75
+ .forEach((model) => {
76
+ // Use correct camelCase keys
77
+ entry[model.model] =
78
+ Number(metricData.modelScores?.[model.model]?.nationalScore) || 0;
79
+ // Standard deviation per metric is NOT available, so we don't add it here
80
+ });
81
+ }
82
+ return entry;
83
+ });
84
+ }, [lowLevelMetrics, models, selectedModels, sortedMetricNames]);
85
+
86
+ // Custom tooltip (common for both radar charts) - CORRECTED (removed std dev logic)
87
+ const CustomRadarTooltip = ({ active, payload, label }) => {
88
+ if (active && payload && payload.length) {
89
+ return (
90
+ <div className="bg-white p-3 border rounded shadow-lg max-w-xs opacity-95">
91
+ <p className="font-medium mb-1 text-gray-800">{label}</p>
92
+ {/* Get tooltip description for the category/metric itself */}
93
+ <p className="text-xs mb-3 text-gray-600 border-b pb-2">
94
+ {getMetricTooltip(label)}
95
+ </p>
96
+ <div className="space-y-1">
97
+ {payload
98
+ // Sort models by score within tooltip
99
+ .sort((a, b) => (b.value || 0) - (a.value || 0))
100
+ .map((entry) => (
101
+ <div
102
+ key={entry.dataKey} // dataKey is the model name here
103
+ className="flex items-center text-sm"
104
+ >
105
+ <div
106
+ className="w-2.5 h-2.5 rounded-full mr-2 flex-shrink-0"
107
+ style={{ backgroundColor: entry.color || "#8884d8" }}
108
+ ></div>
109
+ <span className="mr-1 truncate flex-grow text-gray-700">
110
+ {entry.name}: {/* name is also the model name */}
111
+ </span>
112
+ <span className="font-medium flex-shrink-0 text-gray-900">
113
+ {/* Ensure value exists and format */}
114
+ {entry.value !== null && entry.value !== undefined
115
+ ? Number(entry.value).toFixed(1)
116
+ : "N/A"}
117
+ {/* Removed standard deviation display */}
118
+ </span>
119
+ </div>
120
+ ))}
121
+ </div>
122
+ </div>
123
+ );
124
+ }
125
+ return null;
126
+ };
127
+
128
+ // Use the radar data passed via prop for categories view, filtered by selected models - CORRECTED (removed std dev logic)
129
+ const filteredCategoryRadarData = useMemo(() => {
130
+ if (!categoryRadarDataProp || models.length === 0) return [];
131
+ // Filter based on selected models, removing std dev keys
132
+ return categoryRadarDataProp.map((item) => {
133
+ const newItem = { category: item.category };
134
+ models
135
+ .filter((m) => selectedModels.includes(m.model))
136
+ .forEach((model) => {
137
+ // We only need the model score itself for the radar data
138
+ newItem[model.model] = item[model.model] ?? 0; // Use nullish coalescing for default
139
+ });
140
+ return newItem;
141
+ });
142
+ }, [categoryRadarDataProp, models, selectedModels]);
143
+
144
+ return (
145
+ <>
146
+ {/* Top Controls: Model Selector & Sub-Tab Pills (No changes needed) */}
147
+ <div className="mb-6 flex flex-col md:flex-row justify-between items-center gap-4">
148
+ {/* Sub-Tab Pills */}
149
+ <div className="flex space-x-1 p-1 bg-gray-200 rounded-lg">
150
+ {" "}
151
+ <button
152
+ aria-pressed={subTab === "categories"}
153
+ className={`px-4 py-1.5 text-sm font-medium rounded-md transition-colors duration-150 ${
154
+ subTab === "categories"
155
+ ? "bg-white shadow text-blue-600"
156
+ : "text-gray-600 hover:text-gray-800"
157
+ }`}
158
+ onClick={() => setSubTab("categories")}
159
+ >
160
+ {" "}
161
+ High-Level Categories{" "}
162
+ </button>{" "}
163
+ <button
164
+ aria-pressed={subTab === "metrics"}
165
+ className={`px-4 py-1.5 text-sm font-medium rounded-md transition-colors duration-150 ${
166
+ subTab === "metrics"
167
+ ? "bg-white shadow text-blue-600"
168
+ : "text-gray-600 hover:text-gray-800"
169
+ }`}
170
+ onClick={() => setSubTab("metrics")}
171
+ >
172
+ {" "}
173
+ Low-Level Metrics{" "}
174
+ </button>{" "}
175
+ </div>
176
+ {/* Model Selector */}
177
+ <div className="flex items-center flex-wrap gap-1">
178
+ {" "}
179
+ <span className="text-sm text-gray-500 mr-2">Models:</span>{" "}
180
+ {models?.map((model) => (
181
+ <button
182
+ key={model.model}
183
+ className={`px-2 py-0.5 text-xs rounded border ${
184
+ selectedModels.includes(model.model)
185
+ ? "bg-sky-100 text-sky-800 border-sky-300 font-medium"
186
+ : "bg-gray-100 text-gray-600 border-gray-300 hover:bg-gray-200"
187
+ }`}
188
+ onClick={() => {
189
+ if (selectedModels.includes(model.model)) {
190
+ if (selectedModels.length > 1) {
191
+ setSelectedModels(
192
+ selectedModels.filter((m) => m !== model.model)
193
+ );
194
+ }
195
+ } else {
196
+ setSelectedModels([...selectedModels, model.model]);
197
+ }
198
+ }}
199
+ >
200
+ {" "}
201
+ {model.model}{" "}
202
+ </button>
203
+ ))}{" "}
204
+ </div>
205
+ </div>
206
+
207
+ {/* Conditional content based on sub-tab */}
208
+ {subTab === "categories" && (
209
+ <div className="space-y-6">
210
+ {/* CATEGORIES VIEW */}
211
+ {/* Summary Table: Models as Rows, Categories as Columns - CORRECTED ACCESSORS */}
212
+ <div className="border rounded-lg overflow-hidden shadow-sm">
213
+ <div className="px-4 py-3 bg-gray-50 border-b">
214
+ <h3 className="font-semibold text-gray-800">
215
+ Category Performance Summary
216
+ </h3>
217
+ </div>
218
+ <div className="p-4 overflow-x-auto">
219
+ {sortedCategoryNames.length > 0 ? (
220
+ <table className="min-w-full divide-y divide-gray-200 border border-gray-200">
221
+ <thead>
222
+ <tr className="bg-gray-100">
223
+ <th
224
+ scope="col"
225
+ className="sticky left-0 bg-gray-100 px-3 py-2 text-left text-xs font-semibold text-gray-600 uppercase tracking-wider z-10"
226
+ >
227
+ Model
228
+ </th>
229
+ {sortedCategoryNames.map((catName) => (
230
+ <th
231
+ key={catName}
232
+ scope="col"
233
+ className="px-3 py-2 text-left text-xs font-semibold text-gray-600 uppercase tracking-wider whitespace-nowrap"
234
+ >
235
+ {catName}
236
+ </th>
237
+ ))}
238
+ </tr>
239
+ </thead>
240
+ <tbody className="bg-white divide-y divide-gray-200">
241
+ {models
242
+ ?.filter((m) => selectedModels.includes(m.model))
243
+ .map((model, idx) => (
244
+ <tr
245
+ key={model.model}
246
+ className={
247
+ idx % 2 === 0
248
+ ? "bg-white hover:bg-gray-50"
249
+ : "bg-gray-50 hover:bg-gray-100"
250
+ }
251
+ >
252
+ <td className="sticky left-0 bg-inherit px-3 py-2 whitespace-nowrap z-10 text-left">
253
+ {" "}
254
+ {/* Keep sticky styles */}
255
+ <div className="flex items-center">
256
+ <div
257
+ className="w-3 h-3 rounded-full mr-2 shrink-0"
258
+ style={{ backgroundColor: model.color }}
259
+ ></div>
260
+ <span className="text-sm font-medium">
261
+ {model.model}
262
+ </span>
263
+ </div>
264
+ </td>
265
+ {sortedCategoryNames.map((catName) => {
266
+ // Use correct camelCase keys
267
+ const scoreData =
268
+ highLevelCategories[catName]?.modelScores?.[
269
+ model.model
270
+ ];
271
+ const score = scoreData?.nationalScore; // Access camelCase key
272
+ const displayScore =
273
+ score !== null && score !== undefined
274
+ ? Number(score).toFixed(1)
275
+ : "N/A";
276
+ return (
277
+ <td
278
+ key={catName}
279
+ className="px-3 py-2 whitespace-nowrap text-center"
280
+ >
281
+ <div
282
+ className={`text-sm ${
283
+ displayScore === "N/A"
284
+ ? "text-gray-400"
285
+ : getScoreColor(score)
286
+ }`}
287
+ >
288
+ {displayScore}
289
+ </div>
290
+ </td>
291
+ );
292
+ })}
293
+ </tr>
294
+ ))}
295
+ </tbody>
296
+ </table>
297
+ ) : (
298
+ <p className="text-center text-gray-500 py-4">
299
+ No category data available.
300
+ </p>
301
+ )}
302
+ </div>
303
+ </div>
304
+
305
+ {/* Radar Chart for Categories (Uses filteredCategoryRadarData) */}
306
+ <div className="border rounded-lg overflow-hidden shadow-sm">
307
+ <div className="px-4 py-3 bg-gray-50 border-b flex justify-between items-center">
308
+ <h3 className="font-semibold text-gray-800">
309
+ Performance Across Categories
310
+ </h3>
311
+ <div className="text-xs text-gray-500">
312
+ National Average Scores
313
+ </div>
314
+ </div>
315
+ <div className="p-4">
316
+ {filteredCategoryRadarData &&
317
+ filteredCategoryRadarData.length > 0 ? (
318
+ <div className="h-96 md:h-[450px]">
319
+ <ResponsiveContainer width="100%" height="100%">
320
+ <RadarChart
321
+ outerRadius="80%"
322
+ data={filteredCategoryRadarData}
323
+ >
324
+ <PolarGrid gridType="polygon" stroke="#e5e7eb" />
325
+ <PolarAngleAxis
326
+ dataKey="category"
327
+ tick={{ fill: "#4b5563", fontSize: 12 }}
328
+ />
329
+ <PolarRadiusAxis
330
+ angle={90}
331
+ domain={[0, 100]}
332
+ axisLine={false}
333
+ tick={{ fill: "#6b7280", fontSize: 10 }}
334
+ />
335
+ {models
336
+ ?.filter((m) => selectedModels.includes(m.model))
337
+ .map((model) => (
338
+ <Radar
339
+ key={model.model}
340
+ name={model.model}
341
+ dataKey={model.model}
342
+ stroke={model.color}
343
+ fill={model.color}
344
+ fillOpacity={0.1}
345
+ strokeWidth={2}
346
+ />
347
+ ))}
348
+ {/* Use the corrected CustomRadarTooltip */}
349
+ <RechartsTooltip content={<CustomRadarTooltip />} />
350
+ <Legend
351
+ iconSize={10}
352
+ wrapperStyle={{ fontSize: "12px", paddingTop: "20px" }}
353
+ />
354
+ </RadarChart>
355
+ </ResponsiveContainer>
356
+ </div>
357
+ ) : (
358
+ <p className="text-center text-gray-500 py-4">
359
+ Radar data not available.
360
+ </p>
361
+ )}
362
+ <p className="text-xs text-gray-500 mt-4">
363
+ This radar chart visualizes how each model performs across
364
+ different high-level evaluation categories. The further out on
365
+ each axis, the better the performance on that category.
366
+ </p>
367
+ </div>
368
+ </div>
369
+ </div>
370
+ )}
371
+
372
+ {subTab === "metrics" && (
373
+ <div className="space-y-6">
374
+ {/* METRICS VIEW */}
375
+ {/* Radar Chart for Metrics (Uses metricRadarData) */}
376
+ <div className="border rounded-lg overflow-hidden shadow-sm">
377
+ <div className="px-4 py-3 bg-gray-50 border-b flex justify-between items-center">
378
+ <h3 className="font-semibold text-gray-800">
379
+ Performance Across All Metrics
380
+ </h3>
381
+ <div className="text-xs text-gray-500">
382
+ National Average Scores
383
+ </div>
384
+ </div>
385
+ <div className="p-4">
386
+ {metricRadarData.length > 0 ? (
387
+ <div className="h-96 md:h-[600px]">
388
+ {" "}
389
+ {/* Increased height */}
390
+ <ResponsiveContainer width="100%" height="100%">
391
+ <RadarChart outerRadius="80%" data={metricRadarData}>
392
+ {" "}
393
+ {/* Use metricRadarData */}
394
+ <PolarGrid gridType="polygon" stroke="#e5e7eb" />
395
+ <PolarAngleAxis
396
+ dataKey="category"
397
+ tick={{ fill: "#4b5563", fontSize: 10 }}
398
+ />{" "}
399
+ {/* Adjusted font size */}
400
+ <PolarRadiusAxis
401
+ angle={90}
402
+ domain={[0, 100]}
403
+ axisLine={false}
404
+ tick={{ fill: "#6b7280", fontSize: 10 }}
405
+ />
406
+ {models
407
+ ?.filter((m) => selectedModels.includes(m.model))
408
+ .map((model) => (
409
+ <Radar
410
+ key={model.model}
411
+ name={model.model}
412
+ dataKey={model.model}
413
+ stroke={model.color}
414
+ fill={model.color}
415
+ fillOpacity={0.1}
416
+ strokeWidth={2}
417
+ />
418
+ ))}
419
+ {/* Use the corrected CustomRadarTooltip */}
420
+ <RechartsTooltip content={<CustomRadarTooltip />} />
421
+ <Legend
422
+ iconSize={10}
423
+ wrapperStyle={{ fontSize: "12px", paddingTop: "20px" }}
424
+ />
425
+ </RadarChart>
426
+ </ResponsiveContainer>
427
+ </div>
428
+ ) : (
429
+ <p className="text-center text-gray-500 py-4">
430
+ Metric data not available for radar chart.
431
+ </p>
432
+ )}
433
+ <p className="text-xs text-gray-500 mt-4">
434
+ This radar chart visualizes how each model performs across
435
+ different low-level metrics. The further out on each axis, the
436
+ better the performance on that metric.
437
+ </p>
438
+ </div>
439
+ </div>
440
+ {/* Optional: Add a table summary for low-level metrics similar to the categories one if desired */}
441
+ </div>
442
+ )}
443
+ </>
444
+ );
445
+ };
446
+
447
+ export default MetricsBreakdown;
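
Both radar charts in this component consume the same shape: one object per axis, where `category` holds the category or metric name and each model name is a numeric key. A small illustrative sketch of that shape is below; the model and category names and values are placeholders, not real results.

```js
// Shape consumed by the RadarChart instances above — one entry per axis, one numeric
// key per selected model. Names and values here are invented for illustration only.
const exampleRadarData = [
  { category: "Helpfulness", "Model A": 72.4, "Model B": 68.1 },
  { category: "Communication", "Model A": 65.0, "Model B": 70.3 },
];
// Each <Radar dataKey={modelName} /> reads its model's key from every entry,
// and PolarAngleAxis uses the shared "category" key for the axis labels.
```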
leaderboard-app/components/TaskPerformance.jsx ADDED
@@ -0,0 +1,756 @@
1
+ // components/TaskPerformance.jsx
2
+
3
+ "use client";
4
+
5
+ import React, { useState, useMemo, useEffect } from "react";
6
+ import {
7
+ BarChart,
8
+ Bar,
9
+ XAxis,
10
+ YAxis,
11
+ CartesianGrid,
12
+ Tooltip as RechartsTooltip,
13
+ ResponsiveContainer,
14
+ Cell,
15
+ } from "recharts";
16
+ import {
17
+ getMetricTooltip,
18
+ getScoreBadgeColor,
19
+ formatDisplayKey,
20
+ camelToTitle,
21
+ } from "../lib/utils"; // Import formatDisplayKey
22
+
23
+ // Helper component for info tooltips
24
+ const InfoTooltip = ({ text }) => {
25
+ /* ... (no change) ... */
26
+ const [isVisible, setIsVisible] = useState(false);
27
+ return (
28
+ <div className="relative inline-block ml-1 align-middle">
29
+ <button
30
+ className="text-gray-400 hover:text-gray-600 focus:outline-none"
31
+ onMouseEnter={() => setIsVisible(true)}
32
+ onMouseLeave={() => setIsVisible(false)}
33
+ onClick={(e) => {
34
+ e.stopPropagation();
35
+ setIsVisible(!isVisible);
36
+ }}
37
+ aria-label="Info"
38
+ >
39
+ <svg
40
+ xmlns="http://www.w3.org/2000/svg"
41
+ className="h-4 w-4"
42
+ viewBox="0 0 20 20"
43
+ fill="currentColor"
44
+ >
45
+ <path
46
+ fillRule="evenodd"
47
+ d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-7-4a1 1 0 11-2 0 1 1 0 012 0zM9 9a1 1 0 000 2v3a1 1 0 001 1h1a1 1 0 100-2v-3a1 1 0 00-1-1H9z"
48
+ clipRule="evenodd"
49
+ />
50
+ </svg>{" "}
51
+ </button>{" "}
52
+ {isVisible && (
53
+ <div className="absolute z-10 w-64 p-2 bg-white border rounded shadow-lg text-xs text-gray-700 -translate-x-1/2 left-1/2 mt-1">
54
+ {text}
55
+ </div>
56
+ )}{" "}
57
+ </div>
58
+ );
59
+ };
60
+
61
+ // Custom tooltip for charts
62
+ const CustomTooltip = ({ active, payload, label }) => {
63
+ /* ... (no change needed) ... */
64
+ if (active && payload && payload.length) {
65
+ const sortedPayload = [...payload].sort(
66
+ (a, b) => (b.value || 0) - (a.value || 0)
67
+ );
68
+ return (
69
+ <div className="bg-white p-3 border rounded shadow-lg max-w-xs">
70
+ <p className="font-medium text-sm">{label}</p>{" "}
71
+ {sortedPayload.map((entry, index) => (
72
+ <div key={`item-${index}`} className="flex items-center mt-1">
73
+ <div
74
+ className="w-3 h-3 mr-2 rounded-full flex-shrink-0"
75
+ style={{
76
+ backgroundColor:
77
+ entry.payload?.color || entry.color || "#8884d8",
78
+ }}
79
+ ></div>{" "}
80
+ <span className="text-xs flex-grow pr-2">{entry.name}: </span>{" "}
81
+ <span className="text-xs font-medium ml-1 whitespace-nowrap">
82
+ {typeof entry.value === "number" ? entry.value.toFixed(1) : "N/A"}
83
+ </span>{" "}
84
+ </div>
85
+ ))}{" "}
86
+ </div>
87
+ );
88
+ }
89
+ return null;
90
+ };
91
+
92
+ // Tab component
93
+ const TabButton = ({ active, onClick, children }) => (
94
+ <button
95
+ aria-pressed={active}
96
+ className={`px-4 py-1.5 text-sm font-medium rounded-md transition-colors duration-150 ${
97
+ active
98
+ ? "bg-white shadow text-blue-600"
99
+ : "text-gray-600 hover:text-gray-800"
100
+ }`}
101
+ onClick={onClick}
102
+ >
103
+ {children}{" "}
104
+ </button>
105
+ );
106
+
107
+ // Main component
108
+ const TaskPerformance = ({
109
+ rawData,
110
+ modelsMeta,
111
+ metricsData, // Expects Title Case keys (e.g., Context Memory) containing internalMetricKey
112
+ overviewCardData,
113
+ }) => {
114
+ const [activeTab, setActiveTab] = useState("top-performers");
115
+
116
+ // *** Use Title Case metric keys from processed metricsData ***
117
+ const highLevelMetricDisplayKeys = useMemo(
118
+ () => Object.keys(metricsData?.highLevelCategories || {}).sort(),
119
+ [metricsData?.highLevelCategories]
120
+ );
121
+ const lowLevelMetricDisplayKeys = useMemo(
122
+ () => Object.keys(metricsData?.lowLevelMetrics || {}).sort(),
123
+ [metricsData?.lowLevelMetrics]
124
+ );
125
+ // **************************************************************
126
+
127
+ // Access original snake_case keys from rawData
128
+ const { taskLevelPerformance = {}, tasks = [] } = rawData || {};
129
+ const { bestModelPerTask = {} } = overviewCardData || {};
130
+ const models = modelsMeta || [];
131
+
132
+ // State for 'Model Performance' tab
133
+ const [selectedTask, setSelectedTask] = useState(
134
+ tasks.length > 0 ? tasks[0] : "all"
135
+ );
136
+ const [selectedMetricType, setSelectedMetricType] = useState("high");
137
+ // *** selectedMetric now stores the Title Case display key ***
138
+ const [selectedMetricDisplayKey, setSelectedMetricDisplayKey] = useState("");
139
+ // ***********************************************************
140
+ const [selectedModels, setSelectedModels] = useState([]);
141
+
142
+ // Determine current metrics list (Title Case display keys)
143
+ const currentMetricDisplayKeysList = useMemo(
144
+ () =>
145
+ selectedMetricType === "high"
146
+ ? highLevelMetricDisplayKeys
147
+ : lowLevelMetricDisplayKeys,
148
+ [selectedMetricType, highLevelMetricDisplayKeys, lowLevelMetricDisplayKeys]
149
+ );
150
+
151
+ // Load models on mount
152
+ useEffect(() => {
153
+ if (models.length > 0 && selectedModels.length === 0) {
154
+ setSelectedModels(models.map((m) => m.model));
155
+ }
156
+ }, [models, selectedModels.length]);
157
+
158
+ // Set default metric display key when the list or type changes
159
+ useEffect(() => {
160
+ if (currentMetricDisplayKeysList.length > 0) {
161
+ if (
162
+ !selectedMetricDisplayKey ||
163
+ !currentMetricDisplayKeysList.includes(selectedMetricDisplayKey)
164
+ ) {
165
+ setSelectedMetricDisplayKey(currentMetricDisplayKeysList[0]); // Set to the first Title Case key
166
+ }
167
+ } else {
168
+ setSelectedMetricDisplayKey("");
169
+ }
170
+ }, [currentMetricDisplayKeysList, selectedMetricDisplayKey]);
171
+
172
+ // Prep chart data - *** UPDATED to use internalMetricKey looked up via selectedMetricDisplayKey ***
173
+ const chartData = useMemo(() => {
174
+ if (
175
+ !taskLevelPerformance ||
176
+ !selectedMetricDisplayKey ||
177
+ selectedModels.length === 0
178
+ )
179
+ return [];
180
+
181
+ // Find the internal snake_case key using the selected Title Case display name
182
+ const allMetricsProcessed = {
183
+ ...(metricsData?.highLevelCategories || {}),
184
+ ...(metricsData?.lowLevelMetrics || {}),
185
+ };
186
+ const metricInfo = allMetricsProcessed[selectedMetricDisplayKey]; // Look up using Title Case key
187
+ const internalMetricKey = metricInfo?.internalMetricKey; // Access the stored snake_case key
188
+
189
+ if (!internalMetricKey) {
190
+ console.warn(
191
+ `Could not find internal key for selected metric: ${selectedMetricDisplayKey}`
192
+ );
193
+ return [];
194
+ }
195
+
196
+ let data = [];
197
+ if (selectedTask === "all") {
198
+ const modelAggregates = {};
199
+ tasks.forEach((task) => {
200
+ if (taskLevelPerformance[task]) {
201
+ Object.entries(taskLevelPerformance[task]).forEach(
202
+ ([model, metrics]) => {
203
+ if (selectedModels.includes(model)) {
204
+ // *** Use the FOUND snake_case internalMetricKey ***
205
+ const score = metrics?.[internalMetricKey];
206
+ if (score !== undefined && score !== null && score !== "N/A") {
207
+ const numScore = parseFloat(score);
208
+ if (!isNaN(numScore)) {
209
+ if (!modelAggregates[model])
210
+ modelAggregates[model] = { sum: 0, count: 0 };
211
+ modelAggregates[model].sum += numScore;
212
+ modelAggregates[model].count++;
213
+ }
214
+ }
215
+ }
216
+ }
217
+ );
218
+ }
219
+ });
220
+ data = Object.entries(modelAggregates).map(([model, aggregates]) => {
221
+ const modelMeta = models.find((m) => m.model === model) || {};
222
+ return {
223
+ model: model,
224
+ score:
225
+ aggregates.count > 0 ? aggregates.sum / aggregates.count : null,
226
+ color: modelMeta.color || "#999999",
227
+ };
228
+ });
229
+ } else if (taskLevelPerformance[selectedTask]) {
230
+ data = Object.entries(taskLevelPerformance[selectedTask])
231
+ .filter(([model, _metrics]) => selectedModels.includes(model))
232
+ .map(([model, metrics]) => {
233
+ // *** Use the FOUND snake_case internalMetricKey ***
234
+ const score = metrics?.[internalMetricKey];
235
+ const modelMeta = models.find((m) => m.model === model) || {};
236
+ return {
237
+ model: model,
238
+ score:
239
+ score !== undefined && score !== null && score !== "N/A"
240
+ ? parseFloat(score)
241
+ : null,
242
+ color: modelMeta.color || "#999999",
243
+ };
244
+ });
245
+ }
246
+
247
+ return data
248
+ .filter((item) => item.score !== null && !isNaN(item.score))
249
+ .sort((a, b) => b.score - a.score);
250
+ // Update dependencies
251
+ }, [
252
+ selectedTask,
253
+ selectedMetricDisplayKey,
254
+ selectedModels,
255
+ taskLevelPerformance,
256
+ models,
257
+ metricsData,
258
+ tasks,
259
+ ]);
260
+
261
+ // Task definitions
262
+ const featuredTasks = useMemo(
263
+ () => [
264
+ /* ... (keep task definitions array) ... */ {
265
+ id: "Generating a Creative Idea",
266
+ title: "Generating Creative Ideas",
267
+ description: "Brainstorming unique birthday gift ideas.",
268
+ icon: (color) => (
269
+ <svg
270
+ style={{ color: color || "#6b7280" }}
271
+ className="h-8 w-8"
272
+ fill="none"
273
+ viewBox="0 0 24 24"
274
+ stroke="currentColor"
275
+ >
276
+ <path
277
+ strokeLinecap="round"
278
+ strokeLinejoin="round"
279
+ strokeWidth={2}
280
+ d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z"
281
+ />
282
+ </svg>
283
+ ),
284
+ },
285
+ {
286
+ id: "Creating a Travel Itinerary",
287
+ title: "Creating Travel Itinerary",
288
+ description: "Planning a European city break.",
289
+ icon: (color) => (
290
+ <svg
291
+ style={{ color: color || "#6b7280" }}
292
+ className="h-8 w-8"
293
+ fill="none"
294
+ viewBox="0 0 24 24"
295
+ stroke="currentColor"
296
+ >
297
+ <path
298
+ strokeLinecap="round"
299
+ strokeLinejoin="round"
300
+ strokeWidth={2}
301
+ d="M17.657 16.657L13.414 20.9a1.998 1.998 0 01-2.827 0l-4.244-4.243a8 8 0 1111.314 0z"
302
+ />
303
+ <path
304
+ strokeLinecap="round"
305
+ strokeLinejoin="round"
306
+ strokeWidth={2}
307
+ d="M15 11a3 3 0 11-6 0 3 3 0 016 0z"
308
+ />
309
+ </svg>
310
+ ),
311
+ },
312
+ {
313
+ id: "Following Up on a Job Application",
314
+ title: "Following Up on Job App",
315
+ description: "Drafting a professional follow-up email.",
316
+ icon: (color) => (
317
+ <svg
318
+ style={{ color: color || "#6b7280" }}
319
+ className="h-8 w-8"
320
+ fill="none"
321
+ viewBox="0 0 24 24"
322
+ stroke="currentColor"
323
+ >
324
+ <path
325
+ strokeLinecap="round"
326
+ strokeLinejoin="round"
327
+ strokeWidth={2}
328
+ d="M3 8l7.89 5.26a2 2 0 002.22 0L21 8M5 19h14a2 2 0 002-2V7a2 2 0 00-2-2H5a2 2 0 00-2 2v10a2 2 0 002 2z"
329
+ />
330
+ </svg>
331
+ ),
332
+ },
333
+ {
334
+ id: "Planning Your Weekly Meals",
335
+ title: "Planning Weekly Meals",
336
+ description: "Creating a meal plan accommodating dietary restrictions.",
337
+ icon: (color) => (
338
+ <svg
339
+ style={{ color: color || "#6b7280" }}
340
+ className="h-8 w-8"
341
+ fill="none"
342
+ viewBox="0 0 24 24"
343
+ stroke="currentColor"
344
+ >
345
+ <path
346
+ strokeLinecap="round"
347
+ strokeLinejoin="round"
348
+ strokeWidth={2}
349
+ d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"
350
+ />
351
+ </svg>
352
+ ),
353
+ },
354
+ {
355
+ id: "Making a Decision Between Options",
356
+ title: "Making a Decision",
357
+ description: "Comparing tech products for purchase.",
358
+ icon: (color) => (
359
+ <svg
360
+ style={{ color: color || "#6b7280" }}
361
+ className="h-8 w-8"
362
+ fill="none"
363
+ viewBox="0 0 24 24"
364
+ stroke="currentColor"
365
+ strokeWidth={2}
366
+ >
367
+ <path
368
+ strokeLinecap="round"
369
+ strokeLinejoin="round"
370
+ d="M14 5l7 7m0 0l-7 7m7-7H3"
371
+ />{" "}
372
+ <path
373
+ strokeLinecap="round"
374
+ strokeLinejoin="round"
375
+ d="M10 19l-7-7m0 0l7-7m-7 7h17"
376
+ />
377
+ </svg>
378
+ ),
379
+ },
380
+ {
381
+ id: "Understanding a Complex Topic",
382
+ title: "Understanding a Complex Topic",
383
+ description: "Learning about day trading concepts.",
384
+ icon: (color) => (
385
+ <svg
386
+ style={{ color: color || "#6b7280" }}
387
+ className="h-8 w-8"
388
+ fill="none"
389
+ viewBox="0 0 24 24"
390
+ stroke="currentColor"
391
+ >
392
+ <path
393
+ strokeLinecap="round"
394
+ strokeLinejoin="round"
395
+ strokeWidth={2}
396
+ d="M12 6.253v13m0-13C10.832 5.477 9.246 5 7.5 5S4.168 5.477 3 6.253v13C4.168 18.477 5.754 18 7.5 18s3.332.477 4.5 1.253m0-13C13.168 5.477 14.754 5 16.5 5c1.747 0 3.332.477 4.5 1.253v13C19.832 18.477 18.247 18 16.5 18c-1.746 0-3.332.477-4.5 1.253"
397
+ />
398
+ </svg>
399
+ ),
400
+ },
401
+ ],
402
+ []
403
+ );
404
+ const tasksToDisplay = useMemo(() => {
405
+ const availableTaskKeys = bestModelPerTask
406
+ ? Object.keys(bestModelPerTask)
407
+ : [];
408
+ return featuredTasks.filter((ft) => availableTaskKeys.includes(ft.id));
409
+ }, [bestModelPerTask, featuredTasks]);
410
+ const taskRankings = useMemo(() => {
411
+ const rankings = {};
412
+ tasksToDisplay.forEach((task) => {
413
+ const taskId = task.id;
414
+ if (!taskLevelPerformance[taskId]) {
415
+ rankings[taskId] = [];
416
+ return;
417
+ }
418
+ const taskScores = models
419
+ .map((modelMeta) => {
420
+ const modelData = taskLevelPerformance[taskId][modelMeta.model];
421
+ if (!modelData) return null;
422
+ const scores = Object.values(modelData)
423
+ .map((s) => parseFloat(s))
424
+ .filter((s) => !isNaN(s));
425
+ if (scores.length === 0) return null;
426
+ const avgScore =
427
+ scores.reduce((sum, score) => sum + score, 0) / scores.length;
428
+ return {
429
+ model: modelMeta.model,
430
+ taskAvgScore: avgScore,
431
+ color: modelMeta.color || "#999999",
432
+ };
433
+ })
434
+ .filter((item) => item !== null)
435
+ .sort((a, b) => b.taskAvgScore - a.taskAvgScore);
436
+ rankings[taskId] = taskScores;
437
+ });
438
+ return rankings;
439
+ }, [tasksToDisplay, taskLevelPerformance, models]);
440
+
441
+ const renderTopPerformersTab = () => (
442
+ <div className="mb-6">
443
+ <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6">
444
+ {tasksToDisplay.length === 0 && (
445
+ <p className="col-span-full text-center text-gray-500 py-8">
446
+ No task performance data available.
447
+ </p>
448
+ )}
449
+ {tasksToDisplay.map((task) => {
450
+ const bestModelInfo = bestModelPerTask?.[task.id];
451
+ const topModelsForTask = taskRankings[task.id] || [];
452
+ if (!bestModelInfo || bestModelInfo.model === "N/A") return null;
453
+ const modelColor = bestModelInfo.color || "#6b7280";
454
+ return (
455
+ <div
456
+ key={task.id}
457
+ className="border rounded-lg overflow-hidden shadow-sm bg-white flex flex-col"
458
+ >
459
+ <div className="px-4 py-2 bg-gray-50 border-b flex items-center flex-shrink-0">
460
+ <h3
461
+ className="font-semibold text-sm flex-grow truncate pr-2"
462
+ title={task.title}
463
+ >
464
+ {task.title}
465
+ </h3>
466
+ <div
467
+ className="ml-1 w-2 h-2 rounded-full flex-shrink-0"
468
+ style={{ backgroundColor: modelColor }}
469
+ aria-hidden="true"
470
+ ></div>
471
+ </div>
472
+ <div className="p-4 flex-grow flex flex-col">
473
+ <div className="flex items-center mb-4 flex-shrink-0">
474
+ <div
475
+ className="p-2 rounded-full flex-shrink-0"
476
+ style={{ backgroundColor: `${modelColor}20` }}
477
+ >
478
+ {task.icon(modelColor)}
479
+ </div>
480
+ <div className="ml-4 overflow-hidden">
481
+ <h4
482
+ className="text-lg font-semibold truncate"
483
+ title={bestModelInfo.model}
484
+ >
485
+ {bestModelInfo.model}
486
+ </h4>
487
+ <p className="text-sm text-gray-600">
488
+ Avg. Score: {bestModelInfo.score?.toFixed(1) ?? "N/A"}
489
+ </p>
490
+ </div>
491
+ </div>
492
+ <div className="mb-4 flex-grow">
493
+ <h5 className="text-sm font-semibold mb-2">Task Ranking</h5>
494
+ {topModelsForTask.length > 0 ? (
495
+ <ol className="space-y-1.5 list-none pl-0">
496
+ {topModelsForTask.map((rankedModel, index) => (
497
+ <li
498
+ key={rankedModel.model}
499
+ className="text-sm flex items-center justify-between"
500
+ >
501
+ <div className="flex items-center truncate mr-2">
502
+ <span className="font-medium w-4 mr-1.5 text-gray-500">
503
+ {index + 1}.
504
+ </span>
505
+ <div
506
+ className="w-2.5 h-2.5 rounded-full mr-1.5 flex-shrink-0"
507
+ style={{ backgroundColor: rankedModel.color }}
508
+ ></div>
509
+ <span
510
+ className="truncate"
511
+ title={rankedModel.model}
512
+ >
513
+ {rankedModel.model}
514
+ </span>
515
+ </div>
516
+ <span
517
+ className={`font-medium flex-shrink-0 px-1.5 py-0.5 text-xs rounded ${getScoreBadgeColor(
518
+ rankedModel.taskAvgScore
519
+ )}`}
520
+ >
521
+ {rankedModel.taskAvgScore?.toFixed(1) ?? "N/A"}
522
+ </span>
523
+ </li>
524
+ ))}
525
+ </ol>
526
+ ) : (
527
+ <p className="text-xs text-gray-500 italic">
528
+ Ranking data not available.
529
+ </p>
530
+ )}
531
+ </div>
532
+ <p className="text-xs text-gray-600 mt-auto pt-2 flex-shrink-0">
533
+ Task Example: {task.description}
534
+ </p>
535
+ </div>
536
+ </div>
537
+ );
538
+ })}
539
+ </div>
540
+ </div>
541
+ );
542
+
543
+ // Render the model performance analysis tab - *** UPDATED SELECTOR & LABELS ***
544
+ const renderModelPerformanceTab = () => (
545
+ <div>
546
+ {/* Controls Panel */}
547
+ <div className="border rounded-lg overflow-hidden mb-6 shadow-sm">
548
+ <div className="px-4 py-3 bg-gray-50 border-b">
549
+ <h3 className="font-semibold text-gray-800">
550
+ Task Analysis Controls
551
+ </h3>
552
+ </div>
553
+ <div className="p-4 flex flex-wrap items-center gap-4">
554
+ {/* Task Selector */}
555
+ <div className="w-full sm:w-auto">
556
+ <label
557
+ htmlFor="taskSelect"
558
+ className="block text-sm font-medium text-gray-700 mb-1"
559
+ >
560
+ Task
561
+ </label>
562
+ <select
563
+ id="taskSelect"
564
+ className="w-full sm:w-64 border rounded-md px-3 py-2 bg-white shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
565
+ value={selectedTask}
566
+ onChange={(e) => setSelectedTask(e.target.value)}
567
+ >
568
+ <option value="all">All Tasks (Average)</option>
569
+ {[...tasks].sort().map((task) => (
570
+ <option key={task} value={task}>
571
+ {task}
572
+ </option>
573
+ ))}
574
+ </select>
575
+ </div>
576
+ {/* Metric Type Selector Pills */}
577
+ <div className="flex flex-col">
578
+ <label className="block text-sm font-medium text-gray-700 mb-1">
579
+ Metric Type
580
+ </label>
581
+ <div className="flex space-x-1 p-1 bg-gray-200 rounded-lg">
582
+ <TabButton
583
+ active={selectedMetricType === "high"}
584
+ onClick={() => setSelectedMetricType("high")}
585
+ >
586
+ High-Level
587
+ </TabButton>
588
+ <TabButton
589
+ active={selectedMetricType === "low"}
590
+ onClick={() => setSelectedMetricType("low")}
591
+ >
592
+ Low-Level
593
+ </TabButton>
594
+ </div>
595
+ </div>
596
+ {/* Metric Selector - VALUE is Title Case key, displays Title Case */}
597
+ <div className="w-full sm:w-auto">
598
+ <label
599
+ htmlFor="metricSelect"
600
+ className="block text-sm font-medium text-gray-700 mb-1"
601
+ >
602
+ {selectedMetricType === "high"
603
+ ? "High-Level Metric"
604
+ : "Low-Level Metric"}
605
+ </label>
606
+ <select
607
+ id="metricSelect"
608
+ className="w-full sm:w-48 border rounded-md px-3 py-2 bg-white shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
609
+ value={selectedMetricDisplayKey} // VALUE is the Title Case key
610
+ onChange={(e) => setSelectedMetricDisplayKey(e.target.value)} // Store Title Case key
611
+ disabled={currentMetricDisplayKeysList.length === 0}
612
+ >
613
+ {currentMetricDisplayKeysList.length === 0 && (
614
+ <option value="">No metrics</option>
615
+ )}
616
+ {/* Iterate through Title Case keys, display Title Case */}
617
+ {currentMetricDisplayKeysList.map((displayKey) => (
618
+ <option key={displayKey} value={displayKey}>
619
+ {displayKey}
620
+ </option>
621
+ ))}
622
+ </select>
623
+ </div>
624
+ </div>
625
+ </div>
626
+
627
+ {/* Chart Visualization */}
628
+ <div className="border rounded-lg overflow-hidden mb-6 shadow-sm">
629
+ {/* Use selectedMetricDisplayKey for title */}
630
+ <div className="px-4 py-3 bg-gray-50 border-b">
631
+ <h3 className="font-semibold text-gray-800">
632
+ {`${selectedMetricDisplayKey || "Selected Metric"} Comparison for `}
633
+ <span className="font-normal">
634
+ {selectedTask === "all"
635
+ ? "All Tasks (Average)"
636
+ : `"${selectedTask}"`}
637
+ </span>
638
+ </h3>
639
+ </div>
640
+ <div className="p-4">
641
+ {chartData.length > 0 ? (
642
+ <div className="h-80">
643
+ <ResponsiveContainer width="100%" height="100%">
644
+ <BarChart
645
+ data={chartData}
646
+ margin={{ top: 5, right: 5, left: 0, bottom: 5 }}
647
+ barCategoryGap="20%"
648
+ >
649
+ <CartesianGrid strokeDasharray="3 3" vertical={false} />
650
+ <XAxis dataKey="model" hide />
651
+ <YAxis domain={[0, 100]} width={30} tick={{ fontSize: 11 }} />
652
+ <RechartsTooltip
653
+ content={<CustomTooltip />}
654
+ wrapperStyle={{ zIndex: 10 }}
655
+ />
656
+ {/* Use Title Case key for Bar name */}
657
+ <Bar
658
+ dataKey="score"
659
+ name={selectedMetricDisplayKey || "Score"}
660
+ radius={[4, 4, 0, 0]}
661
+ >
662
+ {chartData.map((entry, index) => (
663
+ <Cell key={`cell-${index}`} fill={entry.color} />
664
+ ))}
665
+ </Bar>
666
+ </BarChart>
667
+ </ResponsiveContainer>
668
+ <div className="flex flex-wrap justify-center gap-x-4 gap-y-1 mt-4 text-xs">
669
+ {chartData.map((entry) => (
670
+ <div key={entry.model} className="flex items-center">
671
+ <div
672
+ className="w-2.5 h-2.5 rounded-full mr-1.5"
673
+ style={{ backgroundColor: entry.color }}
674
+ ></div>
675
+ <span>{entry.model}</span>
676
+ </div>
677
+ ))}
678
+ </div>
679
+ </div>
680
+ ) : (
681
+ <div className="flex items-center justify-center h-60 bg-gray-50 rounded">
682
+ <div className="text-center p-4">
683
+ <svg
684
+ xmlns="http://www.w3.org/2000/svg"
685
+ className="h-10 w-10 mx-auto text-gray-400 mb-3"
686
+ fill="none"
687
+ viewBox="0 0 24 24"
688
+ stroke="currentColor"
689
+ >
690
+ <path
691
+ strokeLinecap="round"
692
+ strokeLinejoin="round"
693
+ strokeWidth={2}
694
+ d="M9 17v-2m3 2v-4m3 4v-6m2 10H7a2 2 0 01-2-2V7a2 2 0 012-2h2l2-3h6l2 3h2a2 2 0 012 2v10a2 2 0 01-2 2h-1"
695
+ />
696
+ </svg>
697
+ <h3 className="text-lg font-medium text-gray-900 mb-1">
698
+ No Data Available
699
+ </h3>
700
+ <p className="text-sm text-gray-600">
701
+ No data available for the selected task, metric, and models.
702
+ </p>
703
+ </div>
704
+ </div>
705
+ )}
706
+ <div className="mt-15 text-xs text-gray-500">
707
+ {/* Corrected margin-top */}
708
+ {/* Use Title Case key for display and lookup */}
709
+ <p>
710
+ This chart shows{" "}
711
+ <strong>
712
+ {selectedMetricDisplayKey || "the selected metric"}
713
+ </strong>{" "}
714
+ scores (0-100, higher is better) for models on{" "}
715
+ {selectedTask === "all"
716
+ ? "average across all tasks"
717
+ : `the "${selectedTask}" task`}
718
+ .
719
+ {selectedMetricDisplayKey &&
720
+ ` Metric definition: ${getMetricTooltip(
721
+ selectedMetricDisplayKey
722
+ )}`}
723
+ </p>
724
+ </div>
725
+ </div>
726
+ </div>
727
+ </div>
728
+ );
729
+
730
+ // Main return with tabs
731
+ return (
732
+ <div>
733
+ <div className="mb-6 flex flex-col md:flex-row justify-between items-center gap-4">
734
+ <div className="flex space-x-1 p-1 bg-gray-200 rounded-lg">
735
+ <TabButton
736
+ active={activeTab === "top-performers"}
737
+ onClick={() => setActiveTab("top-performers")}
738
+ >
739
+ Top Performing Models by Task
740
+ </TabButton>{" "}
741
+ <TabButton
742
+ active={activeTab === "model-performance"}
743
+ onClick={() => setActiveTab("model-performance")}
744
+ >
745
+ Model Performance Comparison
746
+ </TabButton>{" "}
747
+ </div>{" "}
748
+ </div>
749
+ {activeTab === "top-performers"
750
+ ? renderTopPerformersTab()
751
+ : renderModelPerformanceTab()}
752
+ </div>
753
+ );
754
+ };
755
+
756
+ export default TaskPerformance;
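
For reference, a minimal sketch of how the `TaskPerformance` component above might be mounted in a page. The route, file location, and loading code are assumptions (not part of this diff); only the prop shapes come from `prepareDataForVisualization` in `lib/utils.js`, and `leaderboard_data.json` is the file added under `public/` in this commit.

```jsx
// Hypothetical page wiring TaskPerformance to the processed leaderboard data.
// "@/components/TaskPerformance" is an assumed path; adjust to the actual layout.
"use client";

import { useEffect, useState } from "react";
import TaskPerformance from "@/components/TaskPerformance"; // assumed location
import { prepareDataForVisualization } from "@/lib/utils";

export default function TaskPerformancePage() {
  const [data, setData] = useState(null);

  useEffect(() => {
    // public/leaderboard_data.json is served at the site root by Next.js
    fetch("/leaderboard_data.json")
      .then((res) => res.json())
      .then((raw) => setData(prepareDataForVisualization(raw)))
      .catch((err) => console.error("Failed to load leaderboard data", err));
  }, []);

  if (!data) return <p>Loading…</p>;

  // rawData keeps the snake_case task_level_performance inside; models carries
  // color metadata; metricsData uses Title Case keys with internalMetricKey;
  // overviewCardData provides bestModelPerTask.
  return (
    <TaskPerformance
      rawData={data.rawData}
      modelsMeta={data.models}
      metricsData={data.metricsData}
      overviewCardData={data.overviewCardData}
    />
  );
}
```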
leaderboard-app/components/Tooltip.jsx ADDED
@@ -0,0 +1,145 @@
+ "use client";
+
+ import React, { useState, useRef, useEffect } from "react";
+
+ export const Tooltip = ({
+   content,
+   children,
+   position = "top",
+   showIcon = true,
+   iconClassName = "",
+ }) => {
+   const [isVisible, setIsVisible] = useState(false);
+   const [tooltipStyle, setTooltipStyle] = useState({});
+   const tooltipRef = useRef(null);
+   const iconRef = useRef(null);
+
+   const showTooltip = () => setIsVisible(true);
+   const hideTooltip = () => setIsVisible(false);
+
+   // Position the tooltip when it becomes visible
+   useEffect(() => {
+     if (isVisible && iconRef.current && tooltipRef.current) {
+       const triggerRect = iconRef.current.getBoundingClientRect();
+       const tooltipRect = tooltipRef.current.getBoundingClientRect();
+       const spacing = 8; // Space between trigger and tooltip
+
+       let style = {};
+
+       switch (position) {
+         case "top":
+           style = {
+             left:
+               triggerRect.left + triggerRect.width / 2 - tooltipRect.width / 2,
+             top: triggerRect.top - tooltipRect.height - spacing,
+           };
+           break;
+         case "bottom":
+           style = {
+             left:
+               triggerRect.left + triggerRect.width / 2 - tooltipRect.width / 2,
+             top: triggerRect.bottom + spacing,
+           };
+           break;
+         case "left":
+           style = {
+             left: triggerRect.left - tooltipRect.width - spacing,
+             top:
+               triggerRect.top + triggerRect.height / 2 - tooltipRect.height / 2,
+           };
+           break;
+         case "right":
+           style = {
+             left: triggerRect.right + spacing,
+             top:
+               triggerRect.top + triggerRect.height / 2 - tooltipRect.height / 2,
+           };
+           break;
+       }
+
+       // Adjust if tooltip would go off-screen
+       const viewportWidth = window.innerWidth;
+       const viewportHeight = window.innerHeight;
+
+       if (style.left < 10) style.left = 10;
+       if (style.left + tooltipRect.width > viewportWidth - 10) {
+         style.left = viewportWidth - tooltipRect.width - 10;
+       }
+
+       if (style.top < 10) style.top = 10;
+       if (style.top + tooltipRect.height > viewportHeight - 10) {
+         style.top = viewportHeight - tooltipRect.height - 10;
+       }
+
+       // Convert to fixed position
+       style.position = "fixed";
+       style.left = `${style.left}px`;
+       style.top = `${style.top}px`;
+
+       setTooltipStyle(style);
+     }
+   }, [isVisible, position]);
+
+   return (
+     <div className="inline-flex items-center relative">
+       {children}
+
+       {showIcon && (
+         <div
+           ref={iconRef}
+           className={`inline-flex items-center justify-center ml-1 cursor-help ${iconClassName}`}
+           onMouseEnter={showTooltip}
+           onMouseLeave={hideTooltip}
+         >
+           <svg
+             xmlns="http://www.w3.org/2000/svg"
+             className="h-4 w-4 text-gray-400 hover:text-gray-500"
+             fill="none"
+             viewBox="0 0 24 24"
+             stroke="currentColor"
+           >
+             <path
+               strokeLinecap="round"
+               strokeLinejoin="round"
+               strokeWidth={2}
+               d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"
+             />
+           </svg>
+         </div>
+       )}
+
+       {isVisible && (
+         <div
+           ref={tooltipRef}
+           className="z-50 bg-gray-800 text-white text-xs rounded py-1 px-2 max-w-xs shadow-lg pointer-events-none"
+           style={{
+             ...tooltipStyle,
+           }}
+         >
+           {content}
+           <div
+             className={`absolute w-2 h-2 bg-gray-800 transform rotate-45 ${
+               position === "top"
+                 ? "bottom-0 translate-y-1/2"
+                 : position === "bottom"
+                 ? "top-0 -translate-y-1/2"
+                 : position === "left"
+                 ? "right-0 translate-x-1/2"
+                 : "left-0 -translate-x-1/2"
+             }`}
+             style={{
+               left:
+                 position === "top" || position === "bottom"
+                   ? "calc(50% - 4px)"
+                   : "",
+               top:
+                 position === "left" || position === "right"
+                   ? "calc(50% - 4px)"
+                   : "",
+             }}
+           />
+         </div>
+       )}
+     </div>
+   );
+ };
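
A short usage sketch for the `Tooltip` component above, e.g. attaching a metric definition from `getMetricTooltip` to a column header. The `MetricHeader` wrapper is an illustrative consumer, not part of the diff.

```jsx
// Illustrative only: Tooltip's props (content, position, showIcon) are the ones
// defined above; MetricHeader is a made-up wrapper for demonstration.
import { Tooltip } from "@/components/Tooltip";
import { getMetricTooltip } from "@/lib/utils";

export function MetricHeader({ metricKey }) {
  return (
    <Tooltip content={getMetricTooltip(metricKey)} position="bottom">
      <span className="text-sm font-medium">{metricKey}</span>
    </Tooltip>
  );
}
```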
leaderboard-app/eslint.config.mjs ADDED
@@ -0,0 +1,14 @@
+ import { dirname } from "path";
+ import { fileURLToPath } from "url";
+ import { FlatCompat } from "@eslint/eslintrc";
+
+ const __filename = fileURLToPath(import.meta.url);
+ const __dirname = dirname(__filename);
+
+ const compat = new FlatCompat({
+   baseDirectory: __dirname,
+ });
+
+ const eslintConfig = [...compat.extends("next/core-web-vitals")];
+
+ export default eslintConfig;
leaderboard-app/jsconfig.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "compilerOptions": {
+     "paths": {
+       "@/*": ["./*"]
+     }
+   }
+ }
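
With the `@/*` alias above, imports resolve from the project root; for example (both target files are added in this commit):

```js
// The alias maps "@/x" to "./x" relative to the project root.
import { prepareDataForVisualization } from "@/lib/utils"; // -> ./lib/utils.js
import { Tooltip } from "@/components/Tooltip"; // -> ./components/Tooltip.jsx
```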
leaderboard-app/lib/utils.js ADDED
@@ -0,0 +1,708 @@
1
+ // lib/utils.js
2
+
3
+ /**
4
+ * Constants
5
+ */
6
+ const MODEL_COLORS = {
7
+ "gpt-4o": "#0072B2", // Strong blue
8
+ "claude-3.7-sonnet": "#D55E00", // Vermillion/orange-red
9
+ "deepseek-r1": "#F0E442", // Yellow
10
+ o1: "#CC79A7", // Pink
11
+ "gemini-2.0-flash-001": "#009E73", // Bluish green
12
+ "llama-3.1-405b-instruct": "#56B4E9", // Light blue
13
+ };
14
+
15
+ // --- Helper Functions ---
16
+
17
+ /**
18
+ * Converts camelCase to Title Case.
19
+ * @param {string} str Input string.
20
+ * @returns {string} Title Case string.
21
+ */
22
+ export const camelToTitle = (str) => {
23
+ if (!str) return str;
24
+ const spaced = str.replace(/([A-Z])/g, " $1");
25
+ return spaced.charAt(0).toUpperCase() + spaced.slice(1).trim();
26
+ };
27
+
28
+ /**
29
+ * Helper to format metric/factor names (snake/kebab to Title Case)
30
+ * Needed for display consistency when keys are snake_case.
31
+ */
32
+ export const formatDisplayKey = (key) => {
33
+ if (!key || typeof key !== "string") return "N/A";
34
+ if (key === "N/A") return "N/A";
35
+ // Handle snake_case or kebab-case input
36
+ return key
37
+ .replace(/_/g, " ")
38
+ .replace(/-/g, " ")
39
+ .trim()
40
+ .replace(/\b\w/g, (l) => l.toUpperCase());
41
+ };
42
+
43
+ /**
44
+ * Helper to get Significance indicator style and tooltip
45
+ */
46
+ export function getSignificanceIndicator(isSignificant, pValue, alpha = 0.05) {
47
+ const pValueFormatted =
48
+ typeof pValue === "number" && !isNaN(pValue) ? pValue.toFixed(3) : "N/A";
49
+ if (isSignificant === true) {
50
+ return {
51
+ symbol: "✓",
52
+ className: "text-green-600",
53
+ tooltip: `Statistically Significant (p=${pValueFormatted} < ${alpha})`,
54
+ };
55
+ } else if (isSignificant === false) {
56
+ return {
57
+ symbol: "✗",
58
+ className: "text-red-600",
59
+ tooltip: `Not Statistically Significant (p=${pValueFormatted} ≥ ${alpha})`,
60
+ };
61
+ } else {
62
+ return {
63
+ symbol: "?",
64
+ className: "text-gray-400",
65
+ tooltip: "Significance Undetermined",
66
+ };
67
+ }
68
+ }
69
+
70
+ /**
71
+ * Determines the style and tooltip for an equity gap status indicator.
72
+ */
73
+ export function getEquityIndicatorStyle(
74
+ isConcern,
75
+ isLargeEffect,
76
+ isSignificant,
77
+ pValue,
78
+ effectSizeClass
79
+ ) {
80
+ const pValueText =
81
+ typeof pValue === "number" && !isNaN(pValue)
82
+ ? `p=${pValue.toFixed(3)}`
83
+ : "p=N/A";
84
+ const effectText = `Effect: ${effectSizeClass || "N/A"}`;
85
+ if (isConcern === true) {
86
+ return {
87
+ icon: "▲",
88
+ colorClass: "text-red-600",
89
+ tooltip: `Equity Concern (${effectText}, Significant, ${pValueText})`,
90
+ };
91
+ } else if (isSignificant === null) {
92
+ return {
93
+ icon: "?",
94
+ colorClass: "text-gray-500",
95
+ tooltip: `Significance Undetermined (${effectText})`,
96
+ };
97
+ } else if (isLargeEffect === true && isSignificant === false) {
98
+ return {
99
+ icon: "●",
100
+ colorClass: "text-yellow-600",
101
+ tooltip: `Large Effect but Not Statistically Significant (${pValueText})`,
102
+ };
103
+ } else if (isSignificant === true) {
104
+ return {
105
+ icon: "✓",
106
+ colorClass: "text-green-600",
107
+ tooltip: `Statistically Significant but Not Large Effect (${effectText}, ${pValueText})`,
108
+ };
109
+ } else {
110
+ return {
111
+ icon: "✓",
112
+ colorClass: "text-gray-400",
113
+ tooltip: `Not Statistically Significant (${effectText}, ${pValueText})`,
114
+ };
115
+ }
116
+ }
117
+
118
+ /**
119
+ * Determine styling based on score for generic BADGES (background + text)
120
+ */
121
+ export function getScoreBadgeColor(score, min = 0, max = 100) {
122
+ const numericScore = Number(score);
123
+ if (
124
+ score === null ||
125
+ score === undefined ||
126
+ score === "N/A" ||
127
+ isNaN(numericScore)
128
+ ) {
129
+ return "bg-gray-100 text-gray-800";
130
+ }
131
+ const range = Math.abs(max - min);
132
+ if (range <= 0) return "bg-gray-100 text-gray-800";
133
+ let percent;
134
+ if (max > min) {
135
+ percent = ((numericScore - min) / range) * 100;
136
+ } else {
137
+ percent = ((min - numericScore) / range) * 100;
138
+ }
139
+ if (percent >= 80) return "bg-green-100 text-green-800";
140
+ if (percent >= 50) return "bg-blue-100 text-blue-800";
141
+ if (percent >= 20) return "bg-yellow-100 text-yellow-800";
142
+ return "bg-red-100 text-red-800";
143
+ }
144
+
145
+ /**
146
+ * Determine TEXT color based on score (0-100 scale, higher is better)
147
+ */
148
+ export function getScoreColor(score) {
149
+ const numericScore = Number(score);
150
+ if (
151
+ score === null ||
152
+ score === undefined ||
153
+ score === "N/A" ||
154
+ isNaN(numericScore)
155
+ ) {
156
+ return "text-gray-400";
157
+ }
158
+ if (numericScore >= 80) return "text-green-600 font-medium";
159
+ if (numericScore >= 60) return "text-blue-600";
160
+ if (numericScore >= 40) return "text-yellow-600";
161
+ return "text-red-600";
162
+ }
163
+
164
+ /**
165
+ * Tooltip text for metrics and table headers - Accepts original keys
166
+ */
167
+ export const getMetricTooltip = (key) => {
168
+ // Format the key for display/lookup in tooltips map if needed
169
+ const titleCaseKey = formatDisplayKey(key); // Convert snake_case/camelCase to Title Case
170
+
171
+ const tooltips = {
172
+ // Use Title Case keys matching dropdowns/headers
173
+ // High-level
174
+ Helpfulness:
175
+ "How well the model provides useful assistance that addresses user needs",
176
+ Communication:
177
+ "Quality of clarity, coherence, and appropriateness of writing style",
178
+ Understanding:
179
+ "How well the model comprehends requests and contextual information",
180
+ Adaptiveness:
181
+ "How well the model adjusts to user needs and feedback during conversation",
182
+ Trustworthiness:
183
+ "Transparency, accuracy, and consistency in model responses",
184
+ Personality:
185
+ "Consistency and definition of the model's persona and ethical alignment",
186
+ "Background And Culture":
187
+ "Cultural sensitivity, relevance, and freedom from bias",
188
+ "Repeat Usage":
189
+ "User satisfaction and willingness to use the model again (score 0-100).",
190
+
191
+ // Low-level (use formatted names matching display)
192
+ Effectiveness: "How effectively the model helps accomplish specific goals",
193
+ Comprehensiveness:
194
+ "How thoroughly the model addresses all aspects of requests",
195
+ Usefulness: "Practicality and relevance of suggestions or solutions",
196
+ "Tone And Language Style":
197
+ "Appropriateness of tone and language for the context",
198
+ "Conversation Flow": "Natural and conversational quality of responses",
199
+ "Detail And Technical Language":
200
+ "Appropriate level of detail and technical language",
201
+ Accuracy: "Accuracy in interpreting user requests",
202
+ "Context Memory": "Ability to maintain conversation context",
203
+ Intuitiveness: "Ability to pick up on implicit aspects of requests",
204
+ Flexibility: "Adapting responses based on user feedback",
205
+ Clarity: "Ability to clarify ambiguities or misunderstandings",
206
+ "Conversation Building": "Building upon previous exchanges in conversation",
207
+ Consistency: "Consistency of responses across similar questions",
208
+ Confidence: "User confidence in accuracy of information",
209
+ Transparency: "Openness about limitations or uncertainties",
210
+ "Personality Consistency":
211
+ "Consistency of personality throughout interactions",
212
+ "Distinct Personality": "How well-defined the model's personality is",
213
+ "Honesty Empathy Fairness": "Alignment with ethical expectations",
214
+ "Ethical Alignment": "Alignment with user culture, viewpoint, or values",
215
+ "Cultural Awareness":
216
+ "Recognition of when cultural perspective is relevant",
217
+ "Bias And Stereotypes": "Freedom from stereotypes and bias in responses",
218
+
219
+ // Table headers
220
+ "Overall Score":
221
+ "Average score across high-level categories (0-100). Higher is better.",
222
+ "Overall SD":
223
+ "Standard Deviation (± points) of scores across high-level categories. Lower indicates more consistent performance across capabilities.",
224
+ "Max Equity Gap":
225
+ "Score difference (points) for the demographic gap with the largest statistical effect size for this model. Status icon indicates Equity Concern (▲) and/or Significance (✓/✗/?). Hover for details.",
226
+ "Max Gap Area":
227
+ "The specific Demographic Factor and Category where the 'Max Equity Gap' (largest effect size gap) occurred for this model.",
228
+ "Equity Concerns (%)":
229
+ "Percentage of evaluated demographic gaps flagged as Equity Concerns (Large Effect & Statistically Significant, p<0.05). Lower is better.",
230
+ "User Retention":
231
+ "Model score for the 'Repeat Usage' category (0-100), indicating likelihood of users using the model again.",
232
+ };
233
+ // Try lookup with formatted key, then original key as fallback
234
+ return tooltips[titleCaseKey] || tooltips[key] || "No description available";
235
+ };
236
+
237
+ /**
238
+ * Badge color based on Effect Size Class
239
+ */
240
+ export function getEffectSizeBadgeColor(effectSizeClass) {
241
+ if (!effectSizeClass || effectSizeClass === "N/A") {
242
+ return "bg-gray-100 text-gray-800";
243
+ }
244
+ switch (effectSizeClass) {
245
+ case "Negligible":
246
+ return "bg-green-100 text-green-800";
247
+ case "Small":
248
+ return "bg-blue-100 text-blue-800";
249
+ case "Medium":
250
+ return "bg-yellow-100 text-yellow-800";
251
+ case "Large":
252
+ return "bg-red-100 text-red-800";
253
+ default:
254
+ return "bg-gray-100 text-gray-800";
255
+ }
256
+ }
257
+
258
+ /**
259
+ * Helper function to process task performance data
260
+ * Expects rawData input with snake_case keys
261
+ */
262
+ function processTaskPerformance(rawData, taskCategoryMap, modelOrder) {
263
+ const result = {
264
+ bestModelPerTask: {},
265
+ keyMetricsByTask: {},
266
+ bestModelPerTaskCategory: {
267
+ creative: null,
268
+ practical: null,
269
+ analytical: null,
270
+ },
271
+ keyMetricsByTaskCategory: { creative: [], practical: [], analytical: [] },
272
+ };
273
+ // Access original snake_case key from input
274
+ const taskPerformance = rawData?.task_level_performance;
275
+
276
+ if (!taskPerformance || typeof taskPerformance !== "object") {
277
+ console.warn(
278
+ "Task level performance data missing or invalid in processTaskPerformance input."
279
+ );
280
+ return result;
281
+ }
282
+
283
+ // Task names are keys in taskPerformance
284
+ Object.keys(taskPerformance).forEach((taskName) => {
285
+ const taskData = taskPerformance[taskName];
286
+ if (!taskData) return;
287
+ let taskBestModel = null;
288
+ let taskBestAvgScore = -Infinity;
289
+ let taskBestModelMetrics = null;
290
+ modelOrder.forEach((modelName) => {
291
+ // Iterate through known models
292
+ const modelMetrics = taskData[modelName];
293
+ if (modelMetrics && typeof modelMetrics === "object") {
294
+ // Access metric scores using original snake_case keys within modelMetrics
295
+ const scores = Object.values(modelMetrics)
296
+ .map((s) => Number(s))
297
+ .filter((s) => !isNaN(s));
298
+ if (scores.length > 0) {
299
+ const avgScore =
300
+ scores.reduce((sum, score) => sum + score, 0) / scores.length;
301
+ if (avgScore > taskBestAvgScore) {
302
+ taskBestAvgScore = avgScore;
303
+ taskBestModel = modelName;
304
+ taskBestModelMetrics = modelMetrics;
305
+ }
306
+ }
307
+ }
308
+ });
309
+
310
+ if (taskBestModel && taskBestModelMetrics) {
311
+ result.bestModelPerTask[taskName] = {
312
+ model: taskBestModel,
313
+ score: taskBestAvgScore,
314
+ color: MODEL_COLORS[taskBestModel] || "#999999",
315
+ };
316
+ // Extract top metrics (keys are snake_case)
317
+ const metricsArray = Object.entries(taskBestModelMetrics)
318
+ .map(([metricKey, score]) => ({ metricKey, score: Number(score) || 0 }))
319
+ .sort((a, b) => b.score - a.score);
320
+ // Store with snake_case key, add display name
321
+ result.keyMetricsByTask[taskName] = metricsArray
322
+ .slice(0, 3)
323
+ .map((m) => ({ ...m, metricName: formatDisplayKey(m.metricKey) }));
324
+ } else {
325
+ result.bestModelPerTask[taskName] = {
326
+ model: "N/A",
327
+ score: "N/A",
328
+ color: "#999999",
329
+ };
330
+ result.keyMetricsByTask[taskName] = [];
331
+ }
332
+ });
333
+
334
+ // Task Categories processing
335
+ const tasksByCategory = { creative: [], practical: [], analytical: [] };
336
+ Object.entries(taskCategoryMap).forEach(([task, category]) => {
337
+ if (tasksByCategory[category] && taskPerformance[task]) {
338
+ tasksByCategory[category].push(task);
339
+ }
340
+ });
341
+ Object.entries(tasksByCategory).forEach(([category, tasks]) => {
342
+ const categoryNameDisplay = `${
343
+ category.charAt(0).toUpperCase() + category.slice(1)
344
+ } Tasks`;
345
+ if (tasks.length === 0) {
346
+ result.bestModelPerTaskCategory[category] = {
347
+ model: "N/A",
348
+ score: "N/A",
349
+ color: "#999999",
350
+ categoryName: categoryNameDisplay,
351
+ };
352
+ result.keyMetricsByTaskCategory[category] = [];
353
+ return;
354
+ }
355
+ const categoryModelScores = {};
356
+ modelOrder.forEach((modelName) => {
357
+ categoryModelScores[modelName] = { totalScore: 0, count: 0, metrics: {} };
358
+ tasks.forEach((task) => {
359
+ if (taskPerformance[task]?.[modelName]) {
360
+ // metricKey is original snake_case here
361
+ Object.entries(taskPerformance[task][modelName]).forEach(
362
+ ([metricKey, score]) => {
363
+ const numScore = Number(score);
364
+ if (!isNaN(numScore)) {
365
+ categoryModelScores[modelName].totalScore += numScore;
366
+ categoryModelScores[modelName].count++;
367
+ if (!categoryModelScores[modelName].metrics[metricKey])
368
+ categoryModelScores[modelName].metrics[metricKey] = {
369
+ sum: 0,
370
+ count: 0,
371
+ };
372
+ categoryModelScores[modelName].metrics[metricKey].sum +=
373
+ numScore;
374
+ categoryModelScores[modelName].metrics[metricKey].count++;
375
+ }
376
+ }
377
+ );
378
+ }
379
+ });
380
+ });
381
+ let bestAvg = -Infinity;
382
+ let bestCatModel = null;
383
+ Object.entries(categoryModelScores).forEach(([model, data]) => {
384
+ if (data.count > 0) {
385
+ const avg = data.totalScore / data.count;
386
+ if (avg > bestAvg) {
387
+ bestAvg = avg;
388
+ bestCatModel = model;
389
+ }
390
+ }
391
+ });
392
+
393
+ if (bestCatModel) {
394
+ result.bestModelPerTaskCategory[category] = {
395
+ model: bestCatModel,
396
+ score: Number(bestAvg.toFixed(1)),
397
+ color: MODEL_COLORS[bestCatModel] || "#999999",
398
+ categoryName: categoryNameDisplay,
399
+ };
400
+ const bestModelMetricsData =
401
+ categoryModelScores[bestCatModel]?.metrics || {};
402
+ // metricKey is snake_case
403
+ const metricAverages = Object.entries(bestModelMetricsData)
404
+ .map(([metricKey, data]) => ({
405
+ metricKey,
406
+ score: data.count > 0 ? data.sum / data.count : 0,
407
+ }))
408
+ .sort((a, b) => b.score - a.score);
409
+ // Store with original key, add display name
410
+ result.keyMetricsByTaskCategory[category] = metricAverages
411
+ .slice(0, 5)
412
+ .map((m) => ({
413
+ metric: formatDisplayKey(m.metricKey),
414
+ score: m.score,
415
+ scoreDisplay: m.score.toFixed(1),
416
+ }));
417
+ } else {
418
+ result.bestModelPerTaskCategory[category] = {
419
+ model: "N/A",
420
+ score: "N/A",
421
+ color: "#999999",
422
+ categoryName: categoryNameDisplay,
423
+ };
424
+ result.keyMetricsByTaskCategory[category] = [];
425
+ }
426
+ });
427
+ return result; // Returns object with camelCase keys
428
+ }
429
+
430
+ /**
431
+ * Prepares the data from leaderboard_data.json for visualization
432
+ * FINAL v4: Reverted deep camelCase conversion. Processes top-level keys and adds equity concern %.
433
+ * Keeps nested raw data keys as original (snake_case).
434
+ * @param {Object} rawDataInput - The raw data from leaderboard_data.json (expected snake_case)
435
+ * @returns {Object} - Processed data ready for visualization
436
+ */
437
+ export function prepareDataForVisualization(rawDataInput) {
438
+ // Basic Validation
439
+ const defaultReturn = {
440
+ models: [],
441
+ metricsData: { highLevelCategories: {}, lowLevelMetrics: {} },
442
+ radarData: [],
443
+ bestPerCategory: {},
444
+ bestPerMetric: {},
445
+ overviewCardData: {},
446
+ rawData: {},
447
+ metadata: {},
448
+ equityAnalysis: {},
449
+ };
450
+ if (
451
+ !rawDataInput ||
452
+ !rawDataInput.model_order ||
453
+ !Array.isArray(rawDataInput.model_order)
454
+ ) {
455
+ console.error(
456
+ "prepareDataForVisualization received invalid rawData.",
457
+ rawDataInput
458
+ );
459
+ return defaultReturn;
460
+ }
461
+
462
+ // Keep original references where structure is maintained
463
+ const modelOrder = rawDataInput.model_order;
464
+ const equityAnalysis = rawDataInput.equity_analysis || {
465
+ all_equity_gaps: [],
466
+ model_max_effect_gaps: {},
467
+ universal_issues: [],
468
+ assessment_method: {},
469
+ demographic_variation_stats: {},
470
+ };
471
+ const allGaps = equityAnalysis.all_equity_gaps || [];
472
+ const metadata = rawDataInput.metadata || {};
473
+ const mrpDemographicsRaw = rawDataInput.mrp_demographics || {};
474
+ const taskLevelPerformanceRaw = rawDataInput.task_level_performance || {};
475
+
476
+ // Process MRP Demographics for filtering options
477
+ const demographicFactors = new Set();
478
+ const demographicLevels = {};
479
+ const availableMetrics = new Set();
480
+ if (mrpDemographicsRaw && typeof mrpDemographicsRaw === "object") {
481
+ Object.values(mrpDemographicsRaw).forEach((modelData) => {
482
+ Object.entries(modelData || {}).forEach(([factor, factorData]) => {
483
+ demographicFactors.add(factor);
484
+ if (!demographicLevels[factor]) demographicLevels[factor] = new Set();
485
+ Object.entries(factorData || {}).forEach(([level, levelData]) => {
486
+ demographicLevels[factor].add(level);
487
+ Object.keys(levelData || {}).forEach((metric) =>
488
+ availableMetrics.add(metric)
489
+ );
490
+ });
491
+ });
492
+ }); // metric is Title Case here from Python processing
493
+ }
494
+ const demographicOptions = {};
495
+ demographicFactors.forEach((factor) => {
496
+ demographicOptions[factor] = Array.from(
497
+ demographicLevels[factor] || new Set()
498
+ ).sort();
499
+ });
500
+ const availableMetricsList = Array.from(availableMetrics).sort(); // These are Title Case
501
+
502
+ // Process Overall Rankings -> camelCase & add equity concern %
503
+ const overallRankingProcessed = (rawDataInput.overall_ranking || []).map(
504
+ (modelData) => {
505
+ const modelName = modelData.model;
506
+ // details object keys are snake_case from python
507
+ const maxEffectGapDetails = modelData.max_effect_gap_details || {};
508
+ const safeParseFloat = (val) => {
509
+ const num = Number(val);
510
+ return isNaN(num) ? null : num;
511
+ };
512
+
513
+ const modelSpecificGaps = allGaps.filter(
514
+ (gap) => gap.model === modelName
515
+ ); // Access snake_case keys in allGaps
516
+ const totalGapsForModel = modelSpecificGaps.length;
517
+ const concernCountForModel = modelSpecificGaps.filter(
518
+ (gap) => gap.is_equity_concern === true
519
+ ).length;
520
+ let equityConcernPercentage = null;
521
+ if (totalGapsForModel > 0) {
522
+ equityConcernPercentage =
523
+ (concernCountForModel / totalGapsForModel) * 100;
524
+ }
525
+
526
+ // Return structure with camelCase keys
527
+ return {
528
+ rank: modelData.rank,
529
+ model: modelName,
530
+ overallScore: safeParseFloat(modelData.overall_score),
531
+ highLevelCatScore: safeParseFloat(modelData.high_level_cat_score),
532
+ lowLevelCatScore: safeParseFloat(modelData.low_level_cat_score),
533
+ color: MODEL_COLORS[modelName] || "#999999",
534
+ // Use snake_case keys from input JSON for these fields
535
+ stdDevAcrossCats: modelData.std_dev_across_cats,
536
+ stdDevAcrossCatsNumeric: safeParseFloat(modelData.std_dev_across_cats),
537
+ repeatUsageScore: safeParseFloat(modelData.repeat_usage_score),
538
+ maxEffectCategory: modelData.max_effect_category, // snake_case from input
539
+ maxEffectFactor: maxEffectGapDetails.demographic_factor, // snake_case from input
540
+ maxEffectSize: safeParseFloat(maxEffectGapDetails.effect_size),
541
+ maxEffectGap: safeParseFloat(maxEffectGapDetails.score_range),
542
+ maxEffectConcernFlag: maxEffectGapDetails.is_equity_concern ?? false,
543
+ maxEffectSignificant: maxEffectGapDetails.is_statistically_significant,
544
+ maxEffectPValue: maxEffectGapDetails.p_value,
545
+ maxEffectSizeClass: maxEffectGapDetails.effect_size_class || "N/A",
546
+ maxEffectRawNHeuristic:
547
+ maxEffectGapDetails.raw_n_confidence_heuristic || "N/A",
548
+ maxEffectGapDetails: maxEffectGapDetails, // Pass original snake_case details
549
+ equityConcernPercentage: equityConcernPercentage,
550
+ };
551
+ }
552
+ );
553
+
554
+ // Process Metrics Breakdown -> camelCase keys for structure, keep original metric keys inside
555
+ const metricsBreakdownProcessed = {
556
+ highLevelCategories: {},
557
+ lowLevelMetrics: {},
558
+ };
559
+ if (
560
+ rawDataInput.metrics_breakdown &&
561
+ typeof rawDataInput.metrics_breakdown === "object"
562
+ ) {
563
+ const processCategory = (displayKey, categoryData) => {
564
+ // Input displayKey is Title Case from python output
565
+ if (!categoryData || !categoryData.model_scores) {
566
+ console.warn(`Missing model_scores for category: ${displayKey}`);
567
+ return {
568
+ modelScores: {},
569
+ topPerformer: { model: "N/A", score: null, color: "#999999" },
570
+ };
571
+ }
572
+ const internalMetricKey = categoryData._internal_category_name; // Get original snake_case key
573
+ const processedModelScores = {};
574
+ modelOrder.forEach((modelName) => {
575
+ const scores = categoryData.model_scores[modelName]; // Access model scores
576
+ if (!scores) {
577
+ processedModelScores[modelName] = {
578
+ nationalScore: null,
579
+ color: MODEL_COLORS[modelName] || "#999999",
580
+ maxEffectGapInfo: {},
581
+ };
582
+ return;
583
+ }
584
+ const maxEffectGapInfoForCat = scores.max_effect_gap_info || {}; // snake_case keys inside? Check python output. Assume yes.
585
+ processedModelScores[modelName] = {
586
+ nationalScore: scores.national_score ?? null,
587
+ color: MODEL_COLORS[modelName] || "#999999",
588
+ // Keep original snake_case keys for gap info within this structure
589
+ maxEffectGapInfo: maxEffectGapInfoForCat,
590
+ };
591
+ });
592
+ const topPerf = categoryData.top_performer || {};
593
+ const topPerfScore =
594
+ topPerf.score === "N/A" || topPerf.score === null
595
+ ? null
596
+ : Number(topPerf.score);
597
+ return {
598
+ modelScores: processedModelScores, // Nested scores
599
+ topPerformer: {
600
+ model: topPerf.model || "N/A",
601
+ score: isNaN(topPerfScore) ? null : topPerfScore,
602
+ color: MODEL_COLORS[topPerf.model] || "#999999",
603
+ },
604
+ internalMetricKey: internalMetricKey, // Store original snake_case key
605
+ };
606
+ };
607
+ Object.entries(
608
+ rawDataInput.metrics_breakdown.high_level_categories || {}
609
+ ).forEach(([displayKey, catData]) => {
610
+ metricsBreakdownProcessed.highLevelCategories[displayKey] =
611
+ processCategory(displayKey, catData);
612
+ });
613
+ Object.entries(
614
+ rawDataInput.metrics_breakdown.low_level_metrics || {}
615
+ ).forEach(([displayKey, metricData]) => {
616
+ metricsBreakdownProcessed.lowLevelMetrics[displayKey] = processCategory(
617
+ displayKey,
618
+ metricData
619
+ );
620
+ });
621
+ } else {
622
+ console.warn("rawDataInput.metrics_breakdown is missing or not an object.");
623
+ }
624
+
625
+ // Prepare Radar Chart Data
626
+ const radarChartData = Object.entries(
627
+ metricsBreakdownProcessed.highLevelCategories
628
+ ).map(([displayKey, categoryData]) => {
629
+ // displayKey is Title Case here
630
+ const radarEntry = { category: displayKey }; // Use Title Case for radar axis label
631
+ modelOrder.forEach((modelName) => {
632
+ radarEntry[modelName] =
633
+ Number(categoryData.modelScores[modelName]?.nationalScore) || 0;
634
+ });
635
+ return radarEntry;
636
+ });
637
+
638
+ // Prepare Top Performers
639
+ const bestPerCategory = {};
640
+ Object.entries(metricsBreakdownProcessed.highLevelCategories).forEach(
641
+ ([displayKey, catData]) => {
642
+ bestPerCategory[displayKey] = catData.topPerformer;
643
+ }
644
+ );
645
+ const bestPerMetric = {};
646
+ Object.entries(metricsBreakdownProcessed.lowLevelMetrics).forEach(
647
+ ([displayKey, metricData]) => {
648
+ bestPerMetric[displayKey] = metricData.topPerformer;
649
+ }
650
+ );
651
+
652
+ // Prepare Task Performance Data
653
+ const taskCategoryMap = {
654
+ "Generating a Creative Idea": "creative",
655
+ "Creating a Travel Itinerary": "creative",
656
+ "Following Up on a Job Application": "practical",
657
+ "Planning Your Weekly Meals": "practical",
658
+ "Making a Decision Between Options": "analytical",
659
+ "Understanding a Complex Topic": "analytical",
660
+ };
661
+ // Pass the original rawDataInput to the helper, which expects snake_case keys internally
662
+ const taskPerformanceResults = processTaskPerformance(
663
+ rawDataInput,
664
+ taskCategoryMap,
665
+ modelOrder
666
+ );
667
+ const tasks = Object.keys(taskLevelPerformanceRaw || {}); // Use original snake_case keys
668
+ const taskCategories = {};
669
+ Object.entries(taskCategoryMap).forEach(([task, category]) => {
670
+ if (!taskCategories[category]) taskCategories[category] = [];
671
+ if (tasks.includes(task)) taskCategories[category].push(task);
672
+ });
673
+ const taskMetrics = new Set();
674
+ Object.values(taskLevelPerformanceRaw || {}).forEach((taskData) => {
675
+ Object.values(taskData || {}).forEach((modelData) => {
676
+ Object.keys(modelData || {}).forEach((metric) => taskMetrics.add(metric));
677
+ });
678
+ }); // metric is snake_case
679
+ const taskMetricsDisplayList = Array.from(taskMetrics)
680
+ .map(formatDisplayKey)
681
+ .sort(); // Create display list
682
+ const taskMetricsSnakeList = Array.from(taskMetrics).sort(); // List of original snake_case keys
683
+
684
+ // Final Return Structure
685
+ return {
686
+ models: overallRankingProcessed, // camelCase keys for top level
687
+ metricsData: metricsBreakdownProcessed, // Title Case keys for categories/metrics
688
+ radarData: radarChartData,
689
+ bestPerCategory: bestPerCategory, // Title Case keys
690
+ bestPerMetric: bestPerMetric, // Title Case keys
691
+ overviewCardData: taskPerformanceResults, // camelCase keys expected from helper
692
+ rawData: {
693
+ // Keep original structures under camelCase keys for clarity
694
+ taskLevelPerformance: taskLevelPerformanceRaw, // snake_case keys inside
695
+ mrpDemographics: mrpDemographicsRaw, // Title Case metric keys inside
696
+ // Processed lists/maps for filtering/display
697
+ demographicOptions: demographicOptions,
698
+ availableMetrics: availableMetricsList, // Title Case metric names
699
+ tasks: tasks,
700
+ taskCategories: taskCategories,
701
+ taskMetrics: taskMetricsDisplayList, // Title Case metric names for display
702
+ taskMetricsSnake: taskMetricsSnakeList, // snake_case keys for lookup
703
+ taskCategoryMap: taskCategoryMap,
704
+ },
705
+ metadata: metadata, // Original structure
706
+ equityAnalysis: equityAnalysis, // Original structure (snake_case keys)
707
+ };
708
+ }
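
To make the helper behaviour above concrete, a few illustrative calls; the return values follow directly from the code in this file.

```js
import { formatDisplayKey, getScoreBadgeColor, getScoreColor } from "@/lib/utils";

formatDisplayKey("context_memory"); // "Context Memory"
formatDisplayKey("repeat-usage");   // "Repeat Usage"

getScoreBadgeColor(85);    // "bg-green-100 text-green-800" (>= 80% of the 0-100 range)
getScoreBadgeColor("N/A"); // "bg-gray-100 text-gray-800"
getScoreColor(72);         // "text-blue-600" (60-79 band)
```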
leaderboard-app/next.config.mjs ADDED
@@ -0,0 +1,4 @@
+ /** @type {import('next').NextConfig} */
+ const nextConfig = {};
+
+ export default nextConfig;
leaderboard-app/package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
leaderboard-app/package.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "name": "leaderboard-app",
+   "version": "0.1.0",
+   "private": true,
+   "scripts": {
+     "dev": "next dev",
+     "build": "next build",
+     "start": "next start",
+     "lint": "next lint"
+   },
+   "dependencies": {
+     "lucide-react": "^0.487.0",
+     "next": "15.2.3",
+     "react": "^19.0.0",
+     "react-dom": "^19.0.0",
+     "recharts": "^2.15.1"
+   },
+   "devDependencies": {
+     "@eslint/eslintrc": "^3",
+     "@tailwindcss/postcss": "^4",
+     "eslint": "^9",
+     "eslint-config-next": "15.2.3",
+     "tailwindcss": "^4"
+   }
+ }
leaderboard-app/postcss.config.mjs ADDED
@@ -0,0 +1,5 @@
+ const config = {
+   plugins: ["@tailwindcss/postcss"],
+ };
+
+ export default config;
leaderboard-app/public/file.svg ADDED
leaderboard-app/public/globe.svg ADDED
leaderboard-app/public/leaderboard_data.json ADDED
The diff for this file is too large to render. See raw diff
 
leaderboard-app/public/next.svg ADDED
leaderboard-app/public/vercel.svg ADDED
leaderboard-app/public/window.svg ADDED