CLI Usage
OpenGIN Tracer provides a Command Line Interface (CLI) to interact with the pipeline.
Installation
Ensure you have the package installed (preferably in a virtual environment):
pip install -e .
Running a Trace
To extract data from a PDF file, use the opengin tracer run command. This command supports both local files and URLs.
opengin tracer run <INPUT_SOURCE> [OPTIONS]
Arguments
INPUT_SOURCE: An absolute or relative path to a local PDF file, or a URL (starting with http:// or https://).
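The rule for INPUT_SOURCE can be illustrated with a small shell helper. This is a hypothetical sketch, not OpenGIN Tracer's code. It assumes the prefix check is the only thing that separates the two cases, which is all the description above states.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OpenGIN Tracer: it only mirrors the
# documented rule for telling a URL apart from a local path.
classify_source() {
  case "$1" in
    http://*|https://*) echo "url" ;;    # remote file, downloaded before processing
    *)                  echo "local" ;;  # absolute or relative file path
  esac
}

classify_source "https://example.com/report.pdf"   # prints: url
classify_source "./data/sample_invoice.pdf"        # prints: local
```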
Options
--name: A human-readable name for this pipeline run (e.g., "financial_report_2024"). Defaults to run_<timestamp>.
--prompt: The extraction prompt text, or a path to a text file containing the prompt. Defaults to "Extract all tables."
--metadata-schema: Path to a YAML file defining the metadata schema to extract for each table.
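The defaults for --name and --prompt can be sketched in shell. Both functions are hypothetical illustrations, not the CLI's implementation. Two assumptions are involved. The timestamp format is a guess, since the options list only says run_<timestamp>. The sketch also treats an argument that names an existing file as a prompt file and anything else as literal prompt text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how the --name and --prompt defaults could be resolved.
# The timestamp format is an assumption; the docs only say run_<timestamp>.
default_run_name() {
  echo "run_$(date +%Y%m%d_%H%M%S)"
}

resolve_prompt() {
  local arg="${1:-Extract all tables.}"   # documented default prompt
  if [ -f "$arg" ]; then
    cat -- "$arg"            # argument is a path to a prompt file
  else
    printf '%s\n' "$arg"     # argument is the prompt text itself
  fi
}
```

With this approach, the same option accepts both a short inline prompt and a long prompt kept under version control. The cost is that a prompt string that happens to match an existing file name is read as a path.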
Examples
1. Basic Usage (Local File):
opengin tracer run ./data/sample_invoice.pdf --name output_pipeline
2. Using a Remote URL: The CLI automatically downloads the file to a temporary location, processes it, and then deletes the temporary copy.
opengin tracer run https://raw.githubusercontent.com/LDFLK/opengin-ingestion/main/data/quickstart_sample.pdf --name quickstart-sample
3. Using a Prompt File: For complex instructions, save your prompt in a text file and pass its path to --prompt.
opengin tracer run ./data/invoice.pdf --prompt ./prompts/invoice_extraction.txt
4. With Metadata Extraction: Extract the metadata keys defined in a YAML schema for each table.
opengin tracer run ./data/doc.pdf --metadata-schema ./metadata.yml
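The remote-URL behaviour in example 2 follows a common download, process, and clean-up pattern. The sketch below shows that pattern in shell. It is a hypothetical illustration that assumes curl is installed, and the CLI's actual implementation may differ. The function name process_remote_pdf is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the download -> process -> clean up pattern that
# the CLI applies to remote inputs. Assumes curl is installed; the CLI's
# actual implementation may differ.
process_remote_pdf() {
  local url="$1"
  shift
  local tmp
  tmp="$(mktemp "${TMPDIR:-/tmp}/opengin.XXXXXX")" || return 1
  (
    # The EXIT trap deletes the temporary copy when this subshell ends,
    # whether the download or the processing command succeeded or failed.
    trap 'rm -f "$tmp"' EXIT
    curl -fsSL -o "$tmp" "$url" || exit 1
    "$@" "$tmp"   # run the processing command on the local copy
  )
}
```

For example, process_remote_pdf https://raw.githubusercontent.com/LDFLK/opengin-ingestion/main/data/quickstart_sample.pdf wc -c prints the size of the downloaded file and leaves no temporary copy behind. With opengin tracer run you do not need this wrapper, because the CLI handles the download itself.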
Output
After execution, the results can be found in the pipelines/<pipeline_name>/<run_id>/output/ directory.
The console output will display the run_id for your reference.
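If you did not note the run_id, a small helper can locate the most recent run's output directory from the layout above. This is a hypothetical convenience function and is not shipped with the CLI. It assumes that the most recently modified run directory belongs to the latest run. When you need a specific run, use the run_id printed on the console instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper built on the documented layout
# pipelines/<pipeline_name>/<run_id>/output/. It assumes the most recently
# modified run directory is the latest run.
latest_output_dir() {
  local name="$1"
  local root="${2:-pipelines}"   # default root follows the documented layout
  local run
  # ls -t lists entries newest first by modification time.
  run="$(ls -1t -- "$root/$name" 2>/dev/null | head -n 1)"
  [ -n "$run" ] || return 1      # no runs found for this pipeline name
  printf '%s\n' "$root/$name/$run/output"
}
```

For example, ls "$(latest_output_dir quickstart-sample)" lists the results of the newest quickstart-sample run, provided it is run from the directory that contains pipelines/.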