AI Documentation Generator

ROLE

Personal Project

TECHNOLOGIES

Python, Ollama, LLMs

Brief

CodeDoc is a lightweight tool that leverages open-source LLMs to automatically generate documentation and unit tests for Python code. It runs entirely locally using Ollama, requiring no API keys or cloud services.

The tool addresses a common challenge in software development: maintaining comprehensive documentation and test coverage. By automating these tasks, it helps developers focus on writing core functionality while ensuring their code remains well-documented and tested.

Implementation

Built the complete pipeline for code analysis and generation:

  • Integrated Ollama for local LLM inference with support for multiple models (Llama 3.2, CodeLlama, Mistral); see the sketch after this list
  • Designed prompt templates for generating docstrings, module documentation, and pytest-compatible unit tests
  • Implemented evaluation metrics to track performance (response time, token usage) and quality (completeness, syntax validity)
  • Created a command-line interface for flexible workflows (documentation only, tests only, or both)
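
A minimal sketch of the inference step, assuming the `ollama` Python client; the prompt wording, model name, and `generate_docstring` helper are illustrative rather than CodeDoc's exact code:

```python
import ollama

DOCSTRING_PROMPT = """You are a Python documentation assistant.
Write a concise docstring for the following function. Return only the docstring text.

{source}
"""

def generate_docstring(source: str, model: str = "llama3.2") -> str:
    """Ask a locally served model to draft a docstring for the given function source."""
    response = ollama.generate(
        model=model,
        prompt=DOCSTRING_PROMPT.format(source=source),
        options={"temperature": 0.2},  # low temperature keeps code-related output focused
    )
    return response["response"].strip()

if __name__ == "__main__":
    print(generate_docstring("def add(a, b):\n    return a + b"))
```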

Technical Details

The system consists of modular components that work together to analyze code and generate outputs:

Key Components

  1. LLM Handler: Manages connections to Ollama and handles model inference with configurable parameters
  2. Documentation Generator: Analyzes Python files to create docstrings and inline comments for functions and classes (see the sketch after this list)
  3. Test Generator: Produces pytest-compatible unit tests with edge cases and proper assertions
  4. Evaluator: Tracks performance metrics and provides comparative analysis between different models
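
As an illustration of the analysis side, the Documentation Generator could locate undocumented functions and classes with the standard `ast` module; this sketch is an assumption about the approach, not the component's actual interface:

```python
import ast

def find_undocumented(source: str) -> list[str]:
    """Return the names of functions and classes in `source` that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing

print(find_undocumented("def add(a, b):\n    return a + b\n"))  # ['add']
```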

Key Features

Automatic Documentation Generation

Generates comprehensive docstrings following Python conventions, module-level documentation, and inline comments for complex logic.
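
One way generated docstrings could be spliced back into the source is by using `ast` node positions; this simplified sketch assumes a single-line `def` signature and 4-space indentation, and is not CodeDoc's exact logic:

```python
import ast

def insert_docstring(source: str, func_name: str, docstring: str) -> str:
    """Insert a docstring directly below the `def` line of the named function."""
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            indent = " " * (node.col_offset + 4)
            # node.lineno is 1-based, so inserting at that index places the
            # docstring immediately after the `def` line.
            lines.insert(node.lineno, f'{indent}"""{docstring}"""')
            break
    return "\n".join(lines)

print(insert_docstring("def add(a, b):\n    return a + b", "add", "Return the sum of a and b."))
```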

Unit Test Generation

Creates pytest-compatible test cases that cover normal cases, edge cases, and error conditions with proper test structure.
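
For illustration, this is the kind of test structure the generator targets; the example below is hand-written (with a hypothetical `divide` function and `mymodule` import), not actual model output:

```python
import pytest
from mymodule import divide  # hypothetical module and function under test

def test_divide_normal_case():
    assert divide(10, 2) == 5

def test_divide_zero_numerator_edge_case():
    assert divide(0, 3) == 0

def test_divide_by_zero_raises():
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)
```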

Model Comparison

Includes evaluation metrics to compare different LLMs based on speed, token efficiency, and output quality.
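
A rough sketch of such a comparison, assuming the `ollama` client and the timing and token fields its responses expose (`total_duration`, `eval_count`); the loop itself is illustrative rather than the Evaluator's actual code:

```python
import ollama

MODELS = ["llama3.2", "codellama", "mistral"]
PROMPT = "Write a docstring for: def add(a, b): return a + b"

for model in MODELS:
    resp = ollama.generate(model=model, prompt=PROMPT)
    seconds = resp["total_duration"] / 1e9  # Ollama reports durations in nanoseconds
    tokens = resp["eval_count"]             # number of generated tokens
    print(f"{model:<10}  {seconds:6.2f}s  {tokens:4d} tokens")
```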

Technical Challenges

Addressed several challenges in working with LLMs for code generation:

  • Prompt Engineering: Iteratively refined prompts to generate syntactically correct and contextually relevant outputs
  • Context Window Management: Handled large Python files by intelligently chunking code while maintaining context
  • Output Validation: Implemented syntax checking to ensure generated documentation and tests are valid Python code, as sketched below
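
The validation step can be as simple as attempting to parse the generated code; the sketch below shows the idea, while any retry or repair behaviour on failure is an assumption rather than CodeDoc's exact logic:

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python source, False otherwise."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

print(is_valid_python('def add(a, b):\n    """Return the sum."""\n    return a + b\n'))  # True
print(is_valid_python("def broken(:"))  # False
```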

Takeaways

This project demonstrated the practical application of LLMs for developer tooling. I learned about prompt engineering techniques, the importance of evaluation metrics in assessing LLM outputs, and how to design modular systems that can work with different models. The experience highlighted both the capabilities and limitations of current open-source LLMs for code understanding tasks.