{/* Header */}

Prolific's AI User Experience Leaderboard

A benchmark of how well language models handle real-world tasks, scored from users' ratings of their experience.

{/* Tab Buttons */}

{[ "overview", "metrics-breakdown", "task-performance", "demographic-analysis", "about", ].map((tab) => ( ))}

{/* Overview Tab */} {activeTab === "overview" && (

{/* Overall Rankings Card */}

Overall Model Rankings

{/* Use camelCase model object from rankedModels */} {rankedModels.map((model) => ( ))}

Rank	Model	Overall Score	Overall SD	Max Equity Gap	Max Gap Area	Equity Concerns	User Retention
{model.rank}	{model.model}	{model.overallScore !== null ? model.overallScore.toFixed(1) : "N/A"}	{model.stdDevAcrossCats !== "N/A" && model.stdDevAcrossCats !== null ? `± ${Number(model.stdDevAcrossCats).toFixed(1)}` : "N/A"}	{renderMaxEquityGapCell(model)}	{model.maxEffectFactor && model.maxEffectFactor !== "N/A" ? ( {formatDisplayKey(model.maxEffectFactor)} {formatDisplayKey(model.maxEffectCategory)} ) : ( N/A )}	{model.equityConcernPercentage !== null ? ( {model.equityConcernPercentage.toFixed(1)}% ) : ( N/A )}	{model.repeatUsageScore !== null ? ( {model.repeatUsageScore.toFixed(1)}% ) : ( N/A )}

{/* Column descriptions in vertical list */}

Overall Score: Average score across high-level categories

Overall SD: Standard deviation across high-level categories (lower = more consistent)

Max Equity Gap:{" "} Largest score difference between demographic groups (hover for significance and effect size)

Max Gap Area:{" "} Demographic group and category where the Max Equity Gap occurs

Equity Concerns:{" "} Percentage of demographic gaps flagged as concerns (large effect & statistically significant)

User Retention:{" "} Percentage of participants who said they would use the model again
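The Overall Score and Overall SD columns described above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual pipeline: `summarizeCategoryScores` and `formatScore` are hypothetical helpers, the category names in the usage note are made up, and the use of the population standard deviation (dividing by n) is an assumption, since the column description does not say which variant is used.

```javascript
// Hypothetical helper: mean and standard deviation across a model's
// high-level category scores. Missing (null) categories are skipped.
// Assumption: population SD (divide by n); the source does not specify.
function summarizeCategoryScores(categoryScores) {
  const values = Object.values(categoryScores).filter(
    (v) => typeof v === "number" && Number.isFinite(v)
  );
  if (values.length === 0) {
    return { overallScore: null, stdDevAcrossCats: null };
  }
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return { overallScore: mean, stdDevAcrossCats: Math.sqrt(variance) };
}

// Null-safe formatting in the same spirit as the table cells:
// one decimal place, or "N/A" when the value is missing.
function formatScore(value) {
  return value !== null && value !== undefined ? value.toFixed(1) : "N/A";
}
```

For example, `summarizeCategoryScores({ Helpfulness: 80, Trust: 90, Style: null })` averages only the two numeric categories, and `formatScore` turns the result into the one-decimal string shown in the table.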

{/* Color key on a single line */}

Color Key:

Equity Concern (Large Effect & Statistically Significant)

Large Effect (Not Statistically Significant)
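The two color-key states, and the Equity Concerns percentage built on them, might be derived along these lines. This is only a sketch under stated assumptions: the thresholds (an absolute effect size of 0.8, following the common Cohen's d convention, and a significance level of 0.05) are not given in the source, and `classifyGap`, `equityConcernPercentage`, and the `effectSize` / `pValue` field names are hypothetical.

```javascript
// Assumed thresholds; the source does not state the actual cutoffs.
const LARGE_EFFECT_THRESHOLD = 0.8; // assumption: Cohen's d convention
const ALPHA = 0.05; // assumption: conventional significance level

// Hypothetical classifier for one demographic gap.
// Returns "equity-concern", "large-effect", or "none".
function classifyGap(gap) {
  const isLarge = Math.abs(gap.effectSize) >= LARGE_EFFECT_THRESHOLD;
  const isSignificant = gap.pValue < ALPHA;
  if (isLarge && isSignificant) return "equity-concern";
  if (isLarge) return "large-effect";
  return "none";
}

// Share of gaps flagged as equity concerns, as a percentage.
// Returns null when there are no gaps, so the UI can show "N/A".
function equityConcernPercentage(gaps) {
  if (gaps.length === 0) return null;
  const concerns = gaps.filter(
    (g) => classifyGap(g) === "equity-concern"
  ).length;
  return (concerns / gaps.length) * 100;
}
```

Returning `null` for an empty list matches the null checks the ranking table performs before calling `toFixed(1)`.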

{/* Top Performers Section */}

Top Performers by Category

{/* Top Performers Tables - Access using Title Case keys */} {topPerformersView === "high-level" && (

Top Performers by High-Level Category

{Object.entries(bestPerCategory || {}).length > 0 ? ( {Object.entries(bestPerCategory) .sort(([a], [b]) => a.localeCompare(b)) .map(([catDisplayKey, bestInfo], idx) => ( ))}

Category	Best Model	Score
{catDisplayKey}	{bestInfo.model !== "N/A" ? ( {bestInfo.model} ) : ( N/A )}	{bestInfo.score !== null ? ( {bestInfo.score.toFixed(1)} ) : ( N/A )}

) : (

Top performer data not available.

)}

Scores are based on user ratings, normalized to a 0-100 scale.

)} {topPerformersView === "low-level" && (

Top Performers by Low-Level Metric

{Object.entries(bestPerMetric || {}).length > 0 ? ( {Object.entries(bestPerMetric) .sort(([a], [b]) => a.localeCompare(b)) .map(([metricDisplayKey, bestInfo], idx) => ( ))}

Metric	Best Model	Score
{metricDisplayKey}	{bestInfo.model !== "N/A" ? ( {bestInfo.model} ) : ( N/A )}	{bestInfo.score !== null ? ( {bestInfo.score.toFixed(1)} ) : ( N/A )}

) : (

Low-level metric top performer data not available.

)}

Scores are based on user ratings, normalized to a 0-100 scale.

)}

)}{" "} {/* End Overview Tab */} {/* Other Tabs - Pass Correct Props */} {activeTab === "metrics-breakdown" && ( )} {activeTab === "task-performance" && ( )} {activeTab === "demographic-analysis" && ( )} {activeTab === "about" && }
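The Top Performers tables read from an object keyed by category display name, with a `{ model, score }` entry per key. How that object is built is not shown here; one plausible way, assuming each ranked model carries a per-category score map, is sketched below. `findBestPerCategory`, `toTableRows`, and the `categoryScores` field are hypothetical names used only for illustration.

```javascript
// Hypothetical builder: for each category, keep the model with the
// highest numeric score. Categories with no numeric score are omitted.
function findBestPerCategory(models) {
  const best = {};
  for (const { model, categoryScores } of models) {
    for (const [category, score] of Object.entries(categoryScores)) {
      if (typeof score !== "number" || !Number.isFinite(score)) continue;
      const seen = Object.prototype.hasOwnProperty.call(best, category);
      if (!seen || score > best[category].score) {
        best[category] = { model, score };
      }
    }
  }
  return best;
}

// Same ordering and formatting as the table: sort by category name,
// show the score to one decimal place or "N/A" when missing.
function toTableRows(best) {
  return Object.entries(best || {})
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([category, info]) => [
      category,
      info.model,
      info.score !== null ? info.score.toFixed(1) : "N/A",
    ]);
}
```

On a tie, this sketch keeps whichever model appears first in the input; the source does not say how ties are broken.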