API Reference¶
Auto-generated documentation for all public classes and functions. Click any section header to expand it.
LLM Client
- class src.llm.client.LLMClient(api_url=None, api_key=None, model=None, provider=None, timeout=60)[source]¶
Bases:
objectProvider-agnostic LLM client supporting OpenAI, Anthropic, Gemini, DeepSeek, HuggingFace, Ollama, SambaNova, and any OpenAI-compatible API.
- Parameters:
api_url (str)
api_key (str)
model (str)
provider (str)
timeout (int)
- PROVIDER_URLS = {'anthropic': 'https://api.anthropic.com/v1', 'deepseek': 'https://api.deepseek.com/v1', 'gemini': 'https://generativelanguage.googleapis.com/v1beta/openai/', 'huggingface': 'https://router.huggingface.co/v1', 'ollama': 'http://localhost:11434/v1', 'openai': 'https://api.openai.com/v1', 'sambanova': 'https://ai.tejas.tacc.utexas.edu/v1'}¶
- class src.llm.client.RoccoClient(api_url=None, api_key=None, model=None, provider=None, timeout=60)[source]¶
Bases:
LLMClientRoccoClient extends LLMClient for specific Rocco interactions.
- Parameters:
api_url (str)
api_key (str)
model (str)
provider (str)
timeout (int)
Evaluator
- class src.evaluator.evaluator.DescriptionEvaluator(model, rubric, examples)[source]¶
Bases:
objectEvaluates dataset descriptions against a rubric
- Parameters:
model (RoccoClient)
rubric (List[Dict[str, Any]])
examples (List[Dict[str, Any]])
- build_prompt(draft_text)[source]¶
Combine rubric, examples, and draft into prompt
- Parameters:
draft_text (str)
- Return type:
str
- evaluate(draft_text)[source]¶
Call the LLM and return structured evaluation
- Parameters:
draft_text (str)
- Return type:
- print_evaluation_result(evaluation_output)[source]¶
Utility to print evaluation results
- Parameters:
evaluation_output (EvaluatorOutput)
- Return type:
None
Editor
- class src.editor.editor.DescriptionEditor(model, rubric, vector_store_manager=None, use_rag=True, top_k_context=5)[source]¶
Bases:
objectImproves dataset descriptions
- Parameters:
model (RoccoClient)
rubric (Dict)
vector_store_manager (VectorStoreManager | None)
use_rag (bool)
top_k_context (int)
- save_session(filepath)[source]¶
Save the current session to a file
- Parameters:
filepath (Path)
- Return type:
None
- load_session(filepath)[source]¶
Load a session from a file
- Parameters:
filepath (Path)
- Return type:
None
- retrieve_context(query=None)[source]¶
Retrieve relevant context from related papers
- Parameters:
query (str)
- Return type:
List[Document]
- generate_search_query(draft_evaluation, query_all=True)[source]¶
Generate search queries based on evaluation feedback
- Parameters:
draft_evaluation (EvaluatorOutput)
query_all (bool)
- Return type:
List[str]
- build_prompt(draft_text, draft_evaluation, context=None, user_feedback=None, history_override=None)[source]¶
Prepare prompt for improving the draft
- Parameters:
draft_text (str)
draft_evaluation (EvaluatorOutput)
context (List[Document] | List[str] | None)
user_feedback (str | None)
history_override (List[Dict[str, str]] | None)
- Return type:
str
- enhance(draft_text, draft_evaluation, retrieve_context=True, context_override=None, query_all_criterion=True, user_feedback=None, history_override=None)[source]¶
Improve the description draft using evaluation feedback and optional context from papers
- Parameters:
draft_text (str)
draft_evaluation (EvaluatorOutput)
retrieve_context (bool)
context_override (List[str] | None)
query_all_criterion (bool)
user_feedback (str | None)
history_override (List[Dict[str, str]] | None)
- Return type:
- print_enhancement_result(editor_output)[source]¶
Utility to print enhancement results
- Parameters:
editor_output (EditorOutput)
- Return type:
None
Document Ingestor
Document Embedder
- class src.ingestor.embedder.BaseEmbedder[source]¶
Bases:
ABCBase class for all embedders
- abstractmethod embed_documents(texts)[source]¶
Embed a list of documents
- Parameters:
texts (List[str])
- Return type:
List[List[float]]
- class src.ingestor.embedder.DocumentEmbedder(model_name='BAAI/bge-large-en-v1.5', model_kwargs=None, encode_kwargs=None)[source]¶
Bases:
BaseEmbedderHuggingFace embeddings implementation
- Parameters:
model_name (str)
model_kwargs (Dict[str, Any] | None)
encode_kwargs (Dict[str, Any] | None)
- embed_documents(texts)[source]¶
Embed a list of document strings and return their dense vectors.
- Parameters:
texts (List[str])
- Return type:
List[List[float]]
Vector Store Manager
- class src.retriever.retriever.VectorStoreManager(embedder)[source]¶
Bases:
objectManages vector store operations (create, save, load, query)
- Parameters:
embedder (DocumentEmbedder)
- create_from_documents(documents)[source]¶
Create a new vector store from documents.
- Parameters:
documents (List[Document]) – List of Document objects to index
- Returns:
Created vector store
- Return type:
VectorStore
- add_documents(documents)[source]¶
Add documents to existing vector store.
- Parameters:
documents (List[Document]) – List of Document objects to add
- Return type:
None
- save(path)[source]¶
Save vector store to disk.
- Parameters:
path (str | Path) – Directory path to save the vector store
- Return type:
None
- load(path)[source]¶
Load vector store from disk.
- Parameters:
path (str | Path) – Directory path containing the saved vector store
- Returns:
Loaded vector store
- Return type:
VectorStore
- similarity_search(query, k=4)[source]¶
Search for similar documents.
- Parameters:
query (str) – Query text
k (int) – Number of results to return
- Returns:
List of most similar documents
- Return type:
List[Document]
Content Screener
- class src.llm.content_screener.ContentScreener(model)[source]¶
Bases:
objectScreen contents for usefulness
- Parameters:
model (RoccoClient)
- screen_user_content(content, context=None)[source]¶
Screen user provided content
- Returns:
- Screening result with keys:
is_valid (bool): Whether content is valid
issues (list): Issues found
confidence (float): Confidence score (0-1)
recommendation (str): Recommended action
- Return type:
dict
- Parameters:
content (str)
context (str)
Prompt Loader
Prompt loader utility for managing versioned YAML prompts.
- src.prompts.loader.load_prompt(name)[source]¶
Load a prompt from src/prompts/{name}.yaml.
- Parameters:
name (str) – Prompt name (e.g., ‘evaluator’, ‘editor’, ‘content_screener’)
- Returns:
version, description, system (optional), user
- Return type:
Dict with keys
- Raises:
FileNotFoundError – If prompt file does not exist
yaml.YAMLError – If YAML parsing fails
Output Schemas
- class src.llm.schemas.RubricItem(criterion, score, explanation=None)[source]¶
Bases:
objectOne criterion from the evaluation rubric, with its score and explanation.
- Parameters:
criterion (str)
score (float)
explanation (str)
- criterion: str¶
- score: float¶
- explanation: str = None¶
- class src.llm.schemas.EvaluatorOutput(total_score, rubric_breakdown, comments=None)[source]¶
Bases:
objectStructured output from DescriptionEvaluator
- Parameters:
total_score (float)
rubric_breakdown (List[RubricItem])
comments (str | None)
- total_score: float¶
- rubric_breakdown: List[RubricItem]¶
- comments: str | None = None¶
- class src.llm.schemas.Citation(*, statement, source, quote, doc_title=None, page=None, chunk_index=None)[source]¶
Bases:
BaseModelCitation schema for each statemnt in the improved description.
- Parameters:
statement (str)
source (str)
quote (str)
doc_title (str | None)
page (int | None)
chunk_index (int | None)
- statement: str¶
- source: str¶
- quote: str¶
- doc_title: str | None¶
- page: int | None¶
- chunk_index: int | None¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class src.llm.schemas.EditorOutput(*, original_text, suggested_text, rationale, citation=<factory>, context_used=<factory>)[source]¶
Bases:
BaseModelOutput from the description editor
- Parameters:
original_text (str)
suggested_text (str)
rationale (str)
citation (List[Citation])
context_used (List[Dict[str, Any]])
- original_text: str¶
- suggested_text: str¶
- rationale: str¶
- context_used: List[Dict[str, Any]]¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class src.llm.schemas.EditingSession(*, metadata, created_at, original_description=None, current_description=None, conversation_history=<factory>, rubric, config=<factory>)[source]¶
Bases:
BaseModelSchema for saving/loading editing sessions
- Parameters:
metadata (Dict[str, Any])
created_at (str)
original_description (str | None)
current_description (str | None)
conversation_history (List[Dict[str, str]])
rubric (Dict[str, Any])
config (Dict[str, Any])
- metadata: Dict[str, Any]¶
- created_at: str¶
- original_description: str | None¶
- current_description: str | None¶
- conversation_history: List[Dict[str, str]]¶
- rubric: Dict[str, Any]¶
- config: Dict[str, Any]¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class src.llm.schemas.PDFChunk(chunk_id, text, embedding=None, source_pdf=None)[source]¶
Bases:
objectA single text chunk extracted from a PDF, optionally with its embedding vector.
- Parameters:
chunk_id (str)
text (str)
embedding (List[float] | None)
source_pdf (str | None)
- chunk_id: str¶
- text: str¶
- embedding: List[float] | None = None¶
- source_pdf: str | None = None¶
Configuration
Environment variables (set in .env):
LLM_PROVIDER— Provider shortcut (openai,anthropic,ollama, etc.)LLM_API_KEY— API key (required)LLM_BASE_URL— Custom endpoint URL (optional)LLM_MODEL— Model name (defaults togpt-4o-mini)
See Configuration for all providers and setup.
See Also¶
Architecture — System design and data flow
Contributing — Development guidelines
Prompt Reference — Prompt YAML reference and editing guide
CLAUDE.md— Implementation details and patterns