Skip to main content

Evaluation

Leaderboard of Open LLMs Ranked by LLM Judges
Machine-Learning Large-Language-Models Leaderboard Evaluation
An evaluation of recent consumer-grade open LLMs based on ratings generated through an LLM-as-a-judge framework.
Evaluating LLM Performance via LLM Judges
Machine-Learning Large-Language-Models Evaluation Methodology Extra
Methodology details for how LLMs can rate the performance of other LLMs.