Introduction
The AuthData Audit Framework provides reproducible, observable audits for validating datasets across the OpenGINXplore ecosystem, with full action traceability and a NextJS visualization dashboard.
Overview
When managing open data platforms, it's critical to ensure:
- Data Availability: Source files exist and are accessible on GitHub
- Data Quality: Data meets schema and validation requirements
- Data Visibility: End users can see the correct data in the UI
Manual verification is error-prone and time-consuming. This framework automates the entire verification process with detailed logging for complete transparency.
Three-Phase Auditing
The framework validates data through three distinct phases:
Phase A: Source Discovery
Verifies that source data exists on GitHub and has the expected structure:
- Construct raw GitHub URL
- HTTP GET to verify accessibility
- Parse JSON to extract schema
- Validate expected columns exist
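The four steps above can be sketched as follows. This is a minimal illustration, not the framework's actual implementation: the function names are hypothetical, and it assumes the source file is a JSON array of flat objects whose keys act as column names.

```python
import json
import urllib.request
from urllib.parse import quote

RAW_BASE = "https://raw.githubusercontent.com"


def build_raw_url(owner, repo, branch, path):
    """Construct the raw-content URL for a file in a GitHub repository."""
    # quote() escapes spaces and other unsafe characters but keeps "/" intact.
    return f"{RAW_BASE}/{owner}/{repo}/{branch}/{quote(path)}"


def extract_columns(records):
    """Collect the union of keys across all records, in first-seen order."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def missing_columns(records, expected):
    """Return the expected columns that never appear in the records."""
    found = set(extract_columns(records))
    return [column for column in expected if column not in found]


def discover_source(url, expected, timeout=10):
    """GET the URL, parse the JSON body, and report any missing columns.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so an
    inaccessible file surfaces as an exception rather than a return value.
    Assumption: the body is a JSON array of objects.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        records = json.loads(response.read().decode("utf-8"))
    missing = missing_columns(records, expected)
    return (not missing, missing)
```

Keeping URL construction and column validation separate from the network call means both can be checked offline; only `discover_source` touches GitHub.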
Phase B: Data Integrity
Validates that data meets quality requirements:
- Minimum row count checks
- Column value validation (not empty, numeric)
- Schema column verification
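A minimal sketch of these checks, assuming rows are dictionaries keyed by column name. The function names, parameters, and failure messages are illustrative and are not taken from the framework's code.

```python
def is_numeric(value):
    """True for ints and floats (not bools) and for strings that parse as a float."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, str):
        try:
            float(value)
            return True
        except ValueError:
            return False
    return False


def check_integrity(rows, min_rows, required_columns, numeric_columns=()):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []

    # Minimum row count check.
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, found {len(rows)}")

    for index, row in enumerate(rows):
        # Schema and not-empty checks.
        for column in required_columns:
            if column not in row:
                failures.append(f"row {index}: missing column '{column}'")
            elif row[column] is None or str(row[column]).strip() == "":
                failures.append(f"row {index}: empty value in '{column}'")

        # Numeric checks, applied only where the column is present.
        for column in numeric_columns:
            if column in row and not is_numeric(row[column]):
                failures.append(f"row {index}: non-numeric value in '{column}'")

    return failures
```

Collecting every failure instead of stopping at the first one suits an audit report, where the reader wants the full list of problems in a single run.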
Phase C: App Visibility
Uses Selenium to verify data is displayed correctly in the web UI:
- Navigate complex UI hierarchies
- Configurable click sequences with delays
- Table data extraction and comparison
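The visibility phase could be sketched roughly as below. This is an assumption-laden illustration rather than the framework's code: the step format (`selector` and `delay` keys), the default row selector, and all function names are hypothetical. The Selenium calls used (`WebDriverWait`, `element_to_be_clickable`, `find_elements`) are standard Selenium 4 APIs.

```python
import time


def run_click_sequence(driver, steps, timeout=10):
    """Click each CSS selector in order, waiting until it is clickable first."""
    # Selenium is imported lazily so compare_tables() stays usable without it.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    for step in steps:
        element = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, step["selector"]))
        )
        element.click()
        # Optional per-step delay to let the UI settle before the next click.
        time.sleep(step.get("delay", 0))


def extract_table(driver, row_selector="table tbody tr"):
    """Read the visible table into a list of rows, each a list of cell strings."""
    from selenium.webdriver.common.by import By

    rows = driver.find_elements(By.CSS_SELECTOR, row_selector)
    return [
        [cell.text.strip() for cell in row.find_elements(By.TAG_NAME, "td")]
        for row in rows
    ]


def compare_tables(expected, actual):
    """Return (row_index, expected_row, actual_row) for every mismatched row.

    A row that exists on only one side is reported with None on the other.
    """
    mismatches = []
    for index in range(max(len(expected), len(actual))):
        exp = expected[index] if index < len(expected) else None
        act = actual[index] if index < len(actual) else None
        if exp != act:
            mismatches.append((index, exp, act))
    return mismatches
```

Under these assumptions, a path such as Tourism → Arrivals → By Country would be expressed as three steps, one selector per level, with the extracted table then compared against the rows loaded from the source file.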
Key Features
- Full Observability: Every HTTP request/response is logged with headers and body; every Selenium action is logged with its selector and result
- Tree Navigation Support: Navigate complex UI hierarchies (e.g., Tourism → Arrivals → By Country)
- CLI Tools: Run full audits or individual phases, explore UI structure interactively
- NextJS Dashboard: Visual timeline of all audit actions with pass/fail summary reports
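To make the observability feature concrete, here is one way an action log could be structured. The framework's real log format is not shown in this document, so the class name, field names, and the choice of one JSON object per line are all assumptions made for illustration.

```python
import io
import json
import time


class ActionLog:
    """Append-only log that writes one JSON object per line to a text stream."""

    def __init__(self, stream):
        self.stream = stream
        self.sequence = 0

    def record(self, kind, **details):
        """Write a single entry with a sequence number and timestamp."""
        self.sequence += 1
        entry = {"seq": self.sequence, "ts": time.time(), "kind": kind, **details}
        self.stream.write(json.dumps(entry) + "\n")
        return entry

    def http(self, method, url, status, headers, body):
        """Log an HTTP exchange with its headers and body."""
        return self.record(
            "http", method=method, url=url, status=status,
            headers=dict(headers), body=body,
        )

    def selenium(self, action, selector, result):
        """Log a browser action with the selector used and its outcome."""
        return self.record("selenium", action=action, selector=selector, result=result)


# Usage: an in-memory stream stands in for a log file.
buffer = io.StringIO()
log = ActionLog(buffer)
log.http("GET", "https://example.com/data.json", 200,
         {"Content-Type": "application/json"}, "[]")
log.selenium("click", "#menu", "ok")
```

A sequence number on every entry gives a dashboard a stable ordering for its timeline even when two actions share the same timestamp.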
Quick Start
Prerequisites
- Python 3.10+
- Node.js 18+ (for dashboard)
- Chrome browser (for Selenium)
- ChromeDriver (matching Chrome version)
Installation
# Navigate to the audit engine
cd authdata/audit-engine
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Verify installation
python main.py --version
Run Your First Audit
# Run a complete audit for all datasets
python main.py run
# Run audit for a specific dataset and year
python main.py run -d "Top 10 Source Markets" -y 2023
# View the results
python main.py show <run_id>
Start the Dashboard
cd ../audit-dashboard
npm install
npm run dev -- -p 3002
# Open http://localhost:3002
Project Structure
authdata/
├── audit-engine/            # Python automation engine
│   ├── main.py              # CLI entry point
│   ├── config/
│   │   └── datasets.json    # Dataset configurations
│   ├── core/                # Logging, HTTP client, browser
│   ├── phases/              # Source, integrity, visibility phases
│   ├── workflows/           # Main orchestrator
│   ├── tools/               # UI explorer, data comparator
│   └── audit-results/       # Generated audit outputs
│
└── audit-dashboard/         # NextJS visualization
    ├── app/                 # Dashboard pages
    ├── components/          # UI components
    └── lib/                 # Data loading utilities
Next Steps
- Architecture - Understand the system design and data flow
- Configuration - Learn about dataset configuration schema
- CLI Reference - Explore all available commands
- Dashboard - Use the NextJS visualization UI