Writer¶
The Writer takes a dataset description, retrieved context chunks, and your feedback, then produces an improved description with every change traced to a verifiable source. It is composed of two components: a Content Screener that validates your feedback before it reaches the editor, and a Description Editor that performs the enhancement.
Content Screening¶
Class: src.llm.content_screener.ContentScreener
Before the editor sees your feedback, the screener checks it against four criteria:
| Check | What it asks |
|---|---|
| Relevant? | Does the feedback pertain to the dataset being described? |
| Accurate? | Is the feedback consistent with what is already known about the dataset? |
| Respectful? | Is the tone professional? |
| Coherent? | Does the feedback make sense as a dataset description instruction? |
The screener returns one of three recommendations:
accept: feedback is passed to the editor unchanged
reject: feedback is discarded; you will be asked to rewrite it
flag_for_review: feedback is shown in the UI with a warning for your review before proceeding
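The three recommendations above can be handled with a simple dispatch. The following is a minimal sketch, not the project's implementation: `route_feedback` and the action names it returns are hypothetical, and only the three recommendation strings come from this page.

```python
from typing import Optional, TypedDict


class NextStep(TypedDict):
    action: str
    feedback: Optional[str]


def route_feedback(recommendation: str, feedback: str) -> NextStep:
    """Hypothetical helper: map a screener recommendation to a next step."""
    if recommendation == "accept":
        # Feedback goes to the editor unchanged.
        return {"action": "send_to_editor", "feedback": feedback}
    if recommendation == "reject":
        # Feedback is discarded; the user must rewrite it.
        return {"action": "ask_user_to_rewrite", "feedback": None}
    if recommendation == "flag_for_review":
        # Feedback is kept, but the user must confirm it first.
        return {"action": "show_warning", "feedback": feedback}
    raise ValueError(f"Unknown recommendation: {recommendation!r}")
```

Raising on an unknown value, rather than defaulting to `accept`, keeps an unexpected screener response from silently reaching the editor.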
The screener prompt is stored in src/prompts/content_screener.yaml.
How the Enhancement Prompt Works¶
The enhancement prompt (src/prompts/editor.yaml) receives four inputs:
Original description: the text you want to improve
Evaluation feedback: the rubric breakdown from the Evaluator, so the LLM knows which criteria are failing
Retrieved context chunks: up to top_k_context (default 5) excerpts from your uploaded documents, formatted with source metadata headers
Conversation history: all previous rounds of feedback and enhanced descriptions, so each pass builds on the last
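To illustrate what "formatted with source metadata headers" could mean in practice, here is a rough sketch. The header layout, the `format_context` function, and the chunk dictionary keys are assumptions made for illustration; the actual format used by the editor prompt is not shown on this page.

```python
from typing import Dict, List


def format_context(chunks: List[Dict], top_k_context: int = 5) -> str:
    """Hypothetical formatter: prefix each chunk with a source header."""
    blocks = []
    # Keep at most top_k_context chunks, in the order they were retrieved.
    for chunk in chunks[:top_k_context]:
        header = (
            f"[source: {chunk['doc_title']}, "
            f"page {chunk['page']}, chunk {chunk['chunk_index']}]"
        )
        blocks.append(f"{header}\n{chunk['text']}")
    return "\n\n".join(blocks)
```

Whatever the real layout is, the header serves the same purpose: it gives the LLM the document title, page, and chunk index it needs to fill in a citation.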
The LLM is instructed to:
Improve the description while staying faithful to verifiable facts
Add information only when it can be traced to a context chunk from an uploaded document, the original description, or your explicit feedback
Return a structured JSON object with the improved text, a rationale, and a citation list
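If you handle the raw LLM response yourself, a defensive parse is worth having. The sketch below assumes the JSON keys match the `EditorOutput` field names listed under Output Schema (`suggested_text`, `rationale`, `citation`); the `parse_editor_response` helper itself is hypothetical.

```python
import json

REQUIRED_KEYS = {"suggested_text", "rationale", "citation"}


def parse_editor_response(raw: str) -> dict:
    """Hypothetical helper: parse and sanity-check the editor's JSON reply."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Response is missing keys: {sorted(missing)}")
    if not isinstance(data["citation"], list):
        raise ValueError("'citation' must be a list")
    return data
```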
Citations¶
Every statement added or substantially changed in the enhanced description gets a citation. An example citation might look like:
{
  "statement": "The raw images were segmented into solid and pore phases using Otsu's method.",
  "source": "uploaded_document",
  "quote": "...segmentation was performed using Otsu's method...",
  "doc_title": "Smith_et_al_2015",
  "page": 3,
  "chunk_index": 5
}
Source types:
original_description: the claim was already present in your original text
uploaded_document: the claim is supported by an uploaded document; doc_title, page, and chunk_index point to the exact passage
user_feedback: the claim came directly from your written feedback
Always verify citations before publishing: the LLM may occasionally link a statement to a passage that only loosely supports it.
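Part of that verification can be automated. The sketch below checks whether a citation's `quote` actually appears in the text of the cited chunk, treating `...` as a gap between fragments. The `quote_in_chunk` helper is hypothetical, and how you obtain the chunk text for a given `chunk_index` is left open because that API is not covered here.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not cause misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_chunk(quote: str, chunk_text: str) -> bool:
    """Hypothetical check: do the quote's fragments appear, in order, in the chunk?"""
    # Split on ellipses ("..." or the single character) and drop empty pieces.
    fragments = [_normalize(part) for part in re.split(r"\.{3,}|\u2026", quote)]
    fragments = [frag for frag in fragments if frag]
    haystack = _normalize(chunk_text)
    position = 0
    for fragment in fragments:
        found = haystack.find(fragment, position)
        if found == -1:
            return False
        position = found + len(fragment)
    # A quote with no real text cannot support anything.
    return bool(fragments)
```

A passing check only shows that the quote is verbatim. Whether the passage really supports the statement still needs a human read.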
Running Without the UI¶
import json

from src.llm.client import RoccoClient
from src.editor.editor import DescriptionEditor
from src.ingestor.embedder import DocumentEmbedder
from src.retriever.retriever import VectorStoreManager

with open("src/evaluator/rubric.json") as f:
    rubric = json.load(f)

client = RoccoClient()
embedder = DocumentEmbedder()
vsm = VectorStoreManager(embedder)
# vsm.load("my_faiss_index/")  # optional: load a pre-built index

editor = DescriptionEditor(client, rubric, vsm)
result = editor.enhance(
    draft_text="This dataset contains micro-CT images of Berea sandstone ...",
    draft_evaluation=None,  # pass an EvaluatorOutput if available
    user_feedback="The samples were imaged at 2 µm voxel resolution.",
)

print(result.suggested_text)
for c in result.citation:
    print(f" [{c.source}] {c.statement[:80]}...")
To run multiple rounds, call enhance() again on the improved text:
result2 = editor.enhance(
    draft_text=result.suggested_text,
    draft_evaluation=None,
    user_feedback="Clarify the file naming convention.",
)
The editor automatically carries conversation history across calls. To start fresh:
editor.reset_conversation_history()
Session Files¶
Session state — original description, current description, and full conversation history — can be saved to disk and reloaded later:
from pathlib import Path
# Save
editor.save_session(Path("sessions/session_2024-01-15.json"))
# Reload in a new Python session
editor2 = DescriptionEditor(client, rubric, vsm)
editor2.load_session(Path("sessions/session_2024-01-15.json"))
Note
Session file persistence is not yet wired into the Streamlit UI. The web app uses in-memory state only; refreshing the browser resets the session.
Multi-Turn Refinement¶
Each call to enhance() appends to editor.conversation_history. This list is injected into
the next prompt, so the LLM understands what feedback has already been incorporated and can focus
on what is still missing. Typical refinement arc:
Initial description (score 5/10) → first enhancement → score 7/10
Targeted feedback on missing QA/QC details → second enhancement → score 8/10
Fine-tune wording → third enhancement → score 9/10
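The arc above can be written as a loop that stops once a target score is reached. This is a sketch under stated assumptions: `refine`, `enhance`, and `score` are hypothetical names. In practice `enhance` might wrap `editor.enhance(...).suggested_text`, and `score` would come from the Evaluator, whose API is not shown on this page.

```python
from typing import Callable, List, Sequence, Tuple


def refine(
    text: str,
    feedback_rounds: Sequence[str],
    enhance: Callable[[str, str], str],
    score: Callable[[str], float],
    target: float = 9.0,
) -> List[Tuple[str, float]]:
    """Apply feedback one round at a time until the target score is reached."""
    history = [(text, score(text))]
    for feedback in feedback_rounds:
        if history[-1][1] >= target:
            break  # good enough; skip the remaining feedback
        text = enhance(text, feedback)
        history.append((text, score(text)))
    return history
```

Returning the full history rather than only the final text makes it easy to see which round of feedback moved the score.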
Output Schema¶
enhance() returns an EditorOutput Pydantic model:
class EditorOutput(BaseModel):
    original_text: str   # the description passed in
    suggested_text: str  # the improved description
    rationale: str       # summary of changes made
    citation: List[Citation]

class Citation(BaseModel):
    statement: str
    source: str  # "original_description" | "uploaded_document" | "user_feedback"
    quote: str
    doc_title: Optional[str]  # for uploaded_document sources
    page: Optional[int]
    chunk_index: Optional[int]
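Because `source` is a plain string and the location fields are optional, a quick consistency check on each citation can catch incomplete entries. The sketch below uses a dataclass stand-in, `CitationStub`, so that it runs without the project installed. The stand-in and the `citation_problems` helper are hypothetical, and the rule that `uploaded_document` citations should carry all three location fields is an assumption based on the field descriptions on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_SOURCES = {"original_description", "uploaded_document", "user_feedback"}


@dataclass
class CitationStub:
    """Stand-in mirroring the Citation fields; not the project's class."""
    statement: str
    source: str
    quote: str
    doc_title: Optional[str] = None
    page: Optional[int] = None
    chunk_index: Optional[int] = None


def citation_problems(citation) -> List[str]:
    """Return a list of human-readable problems; empty means the citation looks complete."""
    problems = []
    if citation.source not in VALID_SOURCES:
        problems.append(f"unknown source {citation.source!r}")
    if citation.source == "uploaded_document":
        # Assumption: document citations should point to an exact passage.
        for field_name in ("doc_title", "page", "chunk_index"):
            if getattr(citation, field_name) is None:
                problems.append(f"missing {field_name}")
    return problems
```

The function only reads attributes, so it should also work on any object that exposes the same field names.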
See Also¶
Evaluator: generating the EvaluatorOutput passed to the Writer
Document RAG: building the vector store that the Writer queries
Streamlit App: running the full workflow in the web UI
API Reference: full class documentation