Assisting Research Proposal Writing with Large Language Models: Evaluation and Refinement
By: Jing Ren, Weiqi Wang
Potential Business Impact:
Makes AI writing more honest and accurate.
Large language models (LLMs) such as ChatGPT are increasingly used in academic writing, yet issues such as incorrect or fabricated references raise ethical concerns. Moreover, current evaluations of content quality often rely on subjective human judgment, which is labor-intensive and lacks objectivity, potentially compromising the consistency and reliability of the assessment. In this study, to provide a quantitative evaluation and to enhance the research proposal writing capabilities of LLMs, we propose two key evaluation metrics, content quality and reference validity, together with an iterative prompting method driven by the scores derived from these two metrics. Our extensive experiments show that the proposed metrics provide an objective, quantitative framework for assessing ChatGPT's writing performance. Additionally, iterative prompting significantly enhances content quality while reducing reference inaccuracies and fabrications, addressing critical ethical challenges in academic contexts.
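To make the score-guided loop concrete, here is a minimal sketch of what iterative prompting based on the two metrics could look like. The paper does not publish its implementation, so the function names (generate_draft, score_content_quality, score_reference_validity), the 0-1 score range, the target threshold, and the prompt wording are all illustrative assumptions rather than the authors' method.

```python
# Hypothetical sketch of score-guided iterative prompting for proposal writing.
# All names, thresholds, and prompt text are assumptions for illustration only.

def generate_draft(prompt: str) -> str:
    """Stand-in for an LLM call (e.g., a ChatGPT API client); replace with a real client."""
    return f"[draft generated for prompt: {prompt[:60]}...]"


def score_content_quality(draft: str) -> float:
    """Placeholder for a content-quality metric, assumed to return a score in [0, 1]."""
    return 0.5


def score_reference_validity(draft: str) -> float:
    """Placeholder for a reference-validity check, e.g., the fraction of cited
    references that can be resolved in a bibliographic database."""
    return 0.5


def iterative_prompting(topic: str, max_rounds: int = 3, target: float = 0.8) -> str:
    """Regenerate the proposal, feeding both metric scores back into the prompt
    until both exceed the target or the round budget is exhausted."""
    prompt = f"Write a research proposal on: {topic}"
    draft = generate_draft(prompt)
    for _ in range(max_rounds):
        quality = score_content_quality(draft)
        validity = score_reference_validity(draft)
        if quality >= target and validity >= target:
            break
        # Fold the scores into the next prompt so the model knows what to improve.
        prompt = (
            f"Revise the proposal below. Content quality scored {quality:.2f} "
            f"and reference validity scored {validity:.2f} (target {target:.2f}). "
            f"Improve weak sections and replace unverifiable references.\n\n{draft}"
        )
        draft = generate_draft(prompt)
    return draft


if __name__ == "__main__":
    print(iterative_prompting("LLM-assisted research proposal writing"))
```

The key design point suggested by the abstract is that the metric scores themselves become part of the next prompt, turning a one-shot generation into a feedback loop; how the real scores are computed is left to the paper's metrics.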
Similar Papers
Generative Large Language Models (gLLMs) in Content Analysis: A Practical Guide for Communication Research
Artificial Intelligence
Helps computers understand what people write faster.
Assessing the Reliability and Validity of Large Language Models for Automated Assessment of Student Essays in Higher Education
Computers and Society
AI can't reliably grade essays yet.
Evaluating Large Language Models for Evidence-Based Clinical Question Answering
Computation and Language
Helps doctors answer patient questions better.