# Evaluator
The Evaluator scores a dataset description against a 10-criterion rubric. The rubric is currently tailored to porous media research datasets but can be adapted to other domains (see Prompt Reference). Each criterion is worth 1 point, for a maximum score of 10.
## What It Does

Given a plain-text description, the Evaluator:

1. Loads the rubric from `src/evaluator/rubric.json` and three few-shot examples from `src/evaluator/examples_v3.json`.
2. Builds a system prompt containing the rubric and the few-shot examples.
3. Sends the description to the configured LLM and asks for a structured breakdown.
4. Returns an `EvaluatorOutput` object containing the per-criterion scores and an overall total.
## The Evaluation Rubric

| Criterion | What it checks |
|---|---|
| Self-Contained Description | Can the description be understood independently, without reading a related paper? |
| Context of Creation | Does the description explain why the dataset was created (study goals, motivation)? |
| Porous Media Type | Is the porous material clearly identified (rock type, cement, soil, etc.)? |
| Research Problem | Does the description state the high-level research question the data addresses? |
| Reuse and Beneficiaries | Does it explain how others could reuse the data and who would benefit (e.g., ML training, flow simulation, water resource management)? |
| Methodology | Is the data collection method described (imaging technique, experimental setup, simulation approach)? |
| Contents and Organization | Does it describe file types, folder structure, and what each file contains? |
| Quality Control | Were QA/QC steps performed and documented (artifact correction, alignment, calibration)? |
| Clarity and Accessibility | Is the language clear for both domain experts and general audiences? Are acronyms spelled out? |
| Keywords | Are porous media type, imaging method, and research methodology mentioned in a way that aids search? |
## Score Interpretation

| Score | Interpretation |
|---|---|
| 8–10 | Excellent. Clear, complete, ready for archival or citation. |
| 6–7 | Good. Minor gaps; a single round of enhancement typically brings it to 8+. |
| 4–5 | Fair. Significant gaps to address during the enhancement process. |
| 0–3 | Poor. Description is too vague or incomplete. |
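If you want to label scores programmatically (for example, in a batch report), the bands in the Score Interpretation table reduce to simple thresholds. This is a minimal sketch: `interpret_score` is a hypothetical helper, not part of the package, and it assumes that a non-integer total (not expected, since each criterion scores 0 or 1) falls into the lower band.

```python
def interpret_score(total_score: float) -> str:
    """Return the interpretation band for a 0-10 rubric total.

    Hypothetical helper. Thresholds follow the Score Interpretation table;
    a non-integer total such as 7.5 falls into the lower band ("Good").
    """
    if not 0 <= total_score <= 10:
        raise ValueError(f"score out of range: {total_score}")
    if total_score >= 8:
        return "Excellent"
    if total_score >= 6:
        return "Good"
    if total_score >= 4:
        return "Fair"
    return "Poor"


print(interpret_score(9))  # Excellent
print(interpret_score(6))  # Good
print(interpret_score(3))  # Poor
```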
## How the Prompt Works

The evaluator prompt is stored in `src/prompts/evaluator.yaml` and rendered with Jinja2. At a high level, it:

1. Presents the rubric: each criterion's name and description are serialised to JSON and injected into the system message, so the LLM knows exactly what to look for.
2. Provides few-shot examples: three examples from `src/evaluator/examples_v3.json`, each a (description, score, explanation) triple, demonstrate the expected output format and calibrate the LLM's scoring style.
3. Asks for structured output: the LLM is instructed to return a JSON object with a `rubric_breakdown` list (one entry per criterion) and a `total_score`.
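The three prompt steps can be sketched in plain Python. The real template is a Jinja2 file (`src/prompts/evaluator.yaml`) whose wording is not reproduced here; to stay dependency-free, this sketch uses f-strings instead. `build_system_prompt` is a hypothetical name, and the `name`/`description` keys on rubric entries and the `description`/`score`/`explanation` keys on examples are assumed, not confirmed by this page.

```python
import json


def build_system_prompt(rubric: list[dict], examples: list[dict]) -> str:
    """Assemble a system message from the rubric and few-shot examples.

    Hypothetical sketch: the real prompt is a Jinja2 template in
    src/prompts/evaluator.yaml, and its exact wording differs.
    """
    # Step 1: serialise each criterion's name and description to JSON.
    rubric_json = json.dumps(
        [{"name": c["name"], "description": c["description"]} for c in rubric],
        indent=2,
    )
    # Step 2: render each (description, score, explanation) example as a block.
    shots = "\n\n".join(
        f"Example {i}:\nDescription: {ex['description']}\n"
        f"Score: {ex['score']}\nExplanation: {ex['explanation']}"
        for i, ex in enumerate(examples, start=1)
    )
    # Step 3: request a JSON object with rubric_breakdown and total_score.
    instructions = (
        "Return a JSON object with keys 'rubric_breakdown' (a list with one "
        "{criterion, score, explanation} entry per criterion) and 'total_score'."
    )
    return f"Rubric:\n{rubric_json}\n\n{shots}\n\n{instructions}"


rubric = [{"name": "Methodology",
           "description": "Is the data collection method described?"}]
examples = [{"description": "Micro-CT scans of ...", "score": 1,
             "explanation": "Imaging technique is stated."}]
prompt = build_system_prompt(rubric, examples)
print(prompt)
```

Keeping the rubric as JSON rather than prose means the model sees the criterion names verbatim, which makes it easier to match them in the `rubric_breakdown` it returns.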
The few-shot examples are crucial for consistency: without them the LLM’s interpretation of marginal cases (e.g., a criterion that is half-met) varies significantly across calls.
## Running Without the UI

```python
import json

from src.llm.client import RoccoClient
from src.evaluator.evaluator import DescriptionEvaluator

with open("src/evaluator/rubric.json") as f:
    rubric = json.load(f)
with open("src/evaluator/examples_v3.json") as f:
    examples = json.load(f)

client = RoccoClient()  # reads LLM_* from .env
evaluator = DescriptionEvaluator(client, rubric, examples)

description = "This dataset contains micro-CT images of Berea sandstone ... "
result = evaluator.evaluate(description)

print(f"Score: {result.total_score}/10")
for item in result.rubric_breakdown:
    status = "✓" if item.score >= 1 else "✗"
    print(f" {status} {item.criterion}: {item.explanation}")
```

You can also print a formatted summary:

```python
evaluator.print_evaluation_result(result)
```
## Output Schema

`evaluate()` returns an `EvaluatorOutput` dataclass:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RubricItem:
    criterion: str    # criterion name
    score: float      # 0 or 1
    explanation: str  # why the criterion passed or failed


@dataclass
class EvaluatorOutput:
    total_score: float                  # sum of all criterion scores
    rubric_breakdown: List[RubricItem]
    comments: Optional[str]             # overall LLM commentary, if any
```
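This page does not show how the LLM's JSON reply becomes an `EvaluatorOutput`. The sketch below is one way to do it, assuming the reply has exactly the `rubric_breakdown`/`total_score` shape the prompt requests plus an optional `comments` key. `parse_evaluator_output` is a hypothetical helper, not part of the package API; it rejects scores other than 0 or 1 and cross-checks the model's reported total against the per-criterion sum.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RubricItem:
    criterion: str
    score: float
    explanation: str


@dataclass
class EvaluatorOutput:
    total_score: float
    rubric_breakdown: List[RubricItem]
    comments: Optional[str]


def parse_evaluator_output(raw: str) -> EvaluatorOutput:
    """Parse an LLM JSON reply into EvaluatorOutput (hypothetical helper)."""
    data = json.loads(raw)
    items = [
        RubricItem(
            criterion=str(entry["criterion"]),
            score=float(entry["score"]),
            explanation=str(entry["explanation"]),
        )
        for entry in data["rubric_breakdown"]
    ]
    # Each criterion is binary: 1 point if met, 0 otherwise.
    for item in items:
        if item.score not in (0.0, 1.0):
            raise ValueError(f"{item.criterion}: score must be 0 or 1, got {item.score}")
    # Cross-check the model's reported total against the per-criterion sum.
    computed = sum(item.score for item in items)
    reported = float(data.get("total_score", computed))
    if reported != computed:
        raise ValueError(f"total_score {reported} != sum of criteria {computed}")
    return EvaluatorOutput(
        total_score=computed,
        rubric_breakdown=items,
        comments=data.get("comments"),
    )


raw = json.dumps({
    "rubric_breakdown": [
        {"criterion": "Methodology", "score": 1, "explanation": "Micro-CT imaging described."},
        {"criterion": "Keywords", "score": 0, "explanation": "No search terms given."},
    ],
    "total_score": 1,
})
out = parse_evaluator_output(raw)
print(out.total_score)  # 1.0
```

Raising on a mismatched total, rather than silently overwriting it, surfaces replies where the model's arithmetic and its per-criterion judgements disagree.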
## Modifying the Rubric

If you need to add, remove, or reword criteria:

1. Edit `src/evaluator/rubric.json`.
2. Update `src/evaluator/examples_v3.json` so the few-shot examples reflect the new criteria.
3. Bump the version in `src/prompts/evaluator.yaml` (a major version bump if the score scale changes).
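Adding or removing a criterion changes the maximum score, which can leave few-shot examples scored on the old scale. A small pre-flight check catches this before you bump the prompt version. This is a sketch under stated assumptions: each example is assumed to carry a numeric `score` key, the version string is assumed to be `MAJOR.MINOR.PATCH`, and a minor bump for non-scale changes is an assumption (the steps above only specify the major case). `max_score`, `check_examples`, and `next_version` are hypothetical helpers.

```python
def max_score(rubric: list[dict]) -> int:
    """Maximum total for a rubric where each criterion is worth 1 point."""
    return len(rubric)


def check_examples(rubric: list[dict], examples: list[dict]) -> list[str]:
    """List few-shot examples whose score no longer fits the rubric's scale.

    Assumes each example has a numeric "score" key (hypothetical schema).
    """
    top = max_score(rubric)
    return [
        f"example {i}: score {ex['score']} is outside 0-{top}"
        for i, ex in enumerate(examples, start=1)
        if not 0 <= ex["score"] <= top
    ]


def next_version(version: str, scale_changed: bool) -> str:
    """Bump a MAJOR.MINOR.PATCH string: major if the scale changed, else minor (assumed)."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if scale_changed:
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.0"


old_rubric = [{"name": f"criterion {n}"} for n in range(10)]
new_rubric = old_rubric[:9]  # one criterion removed
examples = [{"score": 9}, {"score": 10}, {"score": 4}]

print(check_examples(new_rubric, examples))
# ['example 2: score 10 is outside 0-9']
scale_changed = max_score(new_rubric) != max_score(old_rubric)
print(next_version("1.2.0", scale_changed))  # 2.0.0
```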
## See Also

- Writer: using evaluation results to drive description enhancement
- Streamlit App: running the Evaluator through the web UI
- API Reference: full class documentation