PANORAMA: A Dataset and Benchmarks Capturing Decision Trails and Rationales in Patent Examination
By: Hyunseung Lim, Sooyohn Nam, Sungmin Na, and others
Potential Business Impact:
Helps computers understand if new ideas are truly new.
Patent examination remains an ongoing challenge in the NLP literature even after the advent of large language models (LLMs), as it requires extensive yet nuanced human judgment on whether a submitted claim meets the statutory standards of novelty and non-obviousness against previously granted claims -- prior art -- in expert domains. Previous NLP studies have approached this challenge as a prediction task (e.g., forecasting grant outcomes) using high-level proxies such as similarity metrics or classifiers trained on historical labels. However, this approach overlooks the step-by-step evaluations that examiners must make with rich supporting information, including the rationales for decisions recorded in office action documents, which also makes it harder to measure the current state of NLP techniques in patent review. To fill this gap, we construct PANORAMA, a dataset of 8,143 U.S. patent examination records that preserves full decision trails, including original applications, all cited references, Non-Final Rejections, and Notices of Allowance. PANORAMA also decomposes these trails into sequential benchmarks that emulate patent professionals' review processes and allow researchers to examine large language models' capabilities at each step. Our findings indicate that, although LLMs are relatively effective at retrieving relevant prior art and pinpointing pertinent paragraphs, they struggle to assess the novelty and non-obviousness of patent claims. We discuss these results and argue that advancing NLP, including LLMs, in the patent domain requires a deeper understanding of real-world patent examination. Our dataset is openly available at https://huggingface.co/datasets/LG-AI-Research/PANORAMA.
Similar Papers
Towards Automated Quality Assurance of Patent Specifications: A Multi-Dimensional LLM Framework
Information Retrieval
Checks patents for mistakes, suggests fixes.
Enriching Patent Claim Generation with European Patent Dataset
Computation and Language
Helps lawyers write better patent claims faster.