Accurate classification of Harmonized Tariff Schedule (HTS) codes is critical for customs clearance, duty assessment, and regulatory compliance in maritime logistics. However, product descriptions are often short, incomplete, or ambiguous, and correct classification depends on hierarchical tariff structures, legal notes, and jurisdiction-specific rules. To address these challenges, researchers have proposed, in a preprint posted on arXiv, a consensus-based agentic large language model (LLM) framework designed specifically for Canadian 10-digit HTS code classification in smart-port and maritime logistics environments.
The HTS Classification Challenge
HTS codes underpin international trade, yet assigning the exact code remains difficult even for advanced LLMs. The researchers note that performance drops as the task moves from coarse chapter-level prediction to fine-grained tariff-item and statistical-suffix assignment. Navigating this hierarchy, from 2-digit chapters down to 10-digit statistical suffixes, requires reasoning over official tariff documents and legal notes, which single-step LLM predictions often fail to do reliably.
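Each level of the hierarchy is identified by a fixed-width digit prefix: 2 digits for the chapter, 4 for the heading, 6 for the subheading, 8 for the Canadian tariff item, and all 10 once the statistical suffix is added. The short Python sketch below shows how a code can be decomposed into the prefix for each level. The names `LEVELS` and `hierarchy_prefixes` are hypothetical, not taken from the paper, and the digits in the example are placeholders rather than a real tariff line.

```python
# Hierarchical levels of a 10-digit Canadian tariff code, expressed as
# (level name, number of leading digits that identify that level).
LEVELS = (
    ("chapter", 2),
    ("heading", 4),
    ("subheading", 6),
    ("tariff_item", 8),
    ("statistical_suffix", 10),  # suffix = digits 9-10; full code identifies it
)


def hierarchy_prefixes(code: str) -> dict[str, str]:
    """Return the digit prefix that identifies each level of a 10-digit code.

    Punctuation such as dots is ignored, so "1234.56.78.90" and "1234567890"
    give the same result.
    """
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}: {code!r}")
    return {name: digits[:width] for name, width in LEVELS}


if __name__ == "__main__":
    # Placeholder digits for illustration only; not a real tariff line.
    for level, prefix in hierarchy_prefixes("1234.56.78.90").items():
        print(f"{level:>18}: {prefix}")
```

Working with prefixes rather than isolated digit pairs makes it straightforward to ask whether a prediction is correct "down to" a given level.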
The Agentic LLM Framework
The proposed framework integrates multiple components to improve accuracy and interpretability:
- Multi-agent information retrieval: Multiple LLM agents independently gather relevant information from official tariff documents and legal notes.
- Semantic retrieval: Embeddings are used to locate the tariff-document sections most relevant to a given product description.
- Evidence-grounded reasoning: The LLM explains why a particular code applies, citing specific document sections.
- Consensus-based validation: Agents vote on the classification element by element across the hierarchical code components (chapter, heading, subheading, tariff item, statistical suffix).
- Confidence estimation: The framework assigns a confidence score to each prediction, flagging uncertain cases.
- Human-in-the-loop escalation: Low-confidence predictions are routed to human experts for review, supporting compliance and accountability.
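Semantic retrieval of this kind generally comes down to ranking candidate passages by vector similarity to the query. The sketch below is a minimal, self-contained illustration under stated assumptions: it swaps the embeddings mentioned in the list for simple word-count vectors so it runs without a model, and the passage texts, section IDs, and function names (`embed`, `cosine`, `top_k_passages`) are invented placeholders, not quotations from the Canadian Customs Tariff or the paper's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in 'embedding': lowercase word counts.

    A real system would call a neural embedding model here instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_passages(query: str, passages: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Rank passages (keyed by a section id) by similarity to the query."""
    q = embed(query)
    scored = [(sid, cosine(q, embed(text))) for sid, text in passages.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Invented placeholder passages, not quotations from the Customs Tariff.
    passages = {
        "section-A": "frozen fish fillets and other fish meat",
        "section-B": "portable computers and data processing machines",
        "section-C": "cotton t-shirts and singlets, knitted",
    }
    for sid, score in top_k_passages("portable laptop computers", passages):
        print(sid, round(score, 3))
```

Word counts miss synonyms ("laptop" does not match "computers"), which is exactly the gap learned embeddings are meant to close; the ranking logic stays the same either way.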
This workflow is designed to be more interpretable and accountable than fully autonomous single-step classification, according to the researchers.
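The exact voting rule is not spelled out here, so the following is only one plausible reading of element-wise consensus with escalation: agents vote top-down on each level's digit prefix, only agents consistent with the winning prefix keep voting, the share of all agents backing each winner serves as a confidence score, and any level that falls below a threshold sends the case to a human reviewer. The `consensus` function, the `Decision` record, and the 0.6 threshold are hypothetical choices for illustration, and codes are assumed to be 10-digit strings without punctuation.

```python
from collections import Counter
from dataclasses import dataclass

# (level name, number of leading digits), following the 2/4/6/8/10 hierarchy.
LEVELS = (("chapter", 2), ("heading", 4), ("subheading", 6),
          ("tariff_item", 8), ("statistical_suffix", 10))


@dataclass
class Decision:
    code: str                    # consensus prefix (may be shorter than 10 digits)
    agreement: dict[str, float]  # share of ALL agents backing the winner per level
    escalate: bool               # True -> route to a human reviewer


def consensus(agent_codes: list[str], threshold: float = 0.6) -> Decision:
    """Top-down, level-by-level majority vote over agents' 10-digit codes.

    At each level only agents whose prediction still matches the winning
    prefix keep voting, so the result is always a consistent prefix.
    """
    n = len(agent_codes)
    if n == 0:
        raise ValueError("need at least one agent prediction")
    surviving = list(agent_codes)
    winner = ""
    agreement: dict[str, float] = {}
    for name, width in LEVELS:
        votes = Counter(code[:width] for code in surviving)
        prefix, count = votes.most_common(1)[0]
        agreement[name] = count / n
        if agreement[name] < threshold:
            # Not enough support at this level: keep the coarser consensus
            # and flag the record for human review.
            return Decision(winner, agreement, escalate=True)
        winner = prefix
        surviving = [c for c in surviving if c.startswith(prefix)]
    return Decision(winner, agreement, escalate=False)


if __name__ == "__main__":
    # Five hypothetical agent outputs (placeholder digits, not real tariff lines).
    decision = consensus(["1234567890", "1234567890", "1234567810",
                          "1234567820", "9900000000"])
    print(decision)
```

In the example, four of five agents agree through the 8-digit tariff item, but only two back the same statistical suffix, so the function returns the 8-digit consensus and flags the record for review. Restricting later votes to agents that agreed on the earlier prefix keeps the result hierarchically consistent; an independent vote at each level could pick a heading that does not belong to the winning chapter.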
Experimental Results
The researchers evaluated the framework on a private dataset of 3,300 domain-expert-labeled product records collected from logistics and delivery contexts. Results showed that exact 10-digit classification remains challenging, with performance declining from coarse (chapter-level) to fine-grained (statistical suffix) assignments. These findings underscore the need for evidence-grounded, uncertainty-aware, and human-centered classification workflows rather than fully autonomous solutions, the paper states.
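Level-wise accuracy of the kind described above can be measured by comparing predicted and expert-labeled codes on progressively longer digit prefixes. The sketch below assumes 10-digit codes and uses a handful of made-up records; the function name `level_accuracy` is hypothetical, and the researchers' own metric definitions may differ. It does not reproduce the paper's dataset or results.

```python
# (level name, number of leading digits), following the 2/4/6/8/10 hierarchy.
LEVELS = (("chapter", 2), ("heading", 4), ("subheading", 6),
          ("tariff_item", 8), ("statistical_suffix", 10))


def level_accuracy(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Share of records whose predicted code matches the expert label on the
    first `width` digits, for each level of the hierarchy."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return {
        name: sum(g[:width] == p[:width] for g, p in zip(gold, pred)) / len(gold)
        for name, width in LEVELS
    }


if __name__ == "__main__":
    # Made-up labels and predictions (placeholder digits, not real records).
    gold = ["1234567890", "1234567890", "5678901234", "5678901234"]
    pred = ["1234567890", "1234567899", "5678900000", "9900000000"]
    for level, acc in level_accuracy(gold, pred).items():
        print(f"{level:>18}: {acc:.2f}")
```

Because a match on a longer prefix implies a match on every shorter one, these numbers can only stay the same or fall as the level gets finer; what an evaluation like this reveals is how steep the drop is.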
Implications for Trade Technology
For enterprise technology decision-makers in logistics and customs, the framework points to a practical path toward more accurate HTS classification without full automation. The consensus-based approach is intended to reduce the risk of misclassification, which can lead to fines, clearance delays, and duty overpayments. Its human-in-the-loop escalation also fits the compliance requirements of smart-port operations. The researchers have released their code publicly, enabling further development and validation by the trade technology community.