Crude oil news tone and futures market returns: An integrated approach based on large language models and domain-specific lexicons

    Abstract:

    The verticalization of large language models (LLMs) has become an important trend. In highly specialized, cross-market settings such as crude oil futures, general-purpose LLMs struggle to interpret domain-specific terminology and complex context accurately, and they are prone to look-ahead bias. This paper proposes a method for measuring the tone of crude oil news that integrates LLMs with a domain-specific lexicon. By combining the prior knowledge encoded in the lexicon with the semantic understanding of LLMs, the method improves both the interpretability and the predictive power of the resulting tone measure. Specifically, using 93 004 Chinese and English crude oil news articles from InfoBank and Factiva published between 2018 and 2022, the study constructs a domain-specific lexicon, filters a high signal-to-noise corpus, performs domain-specific pre-training on a general BERT model, and then applies weakly supervised fine-tuning guided by the sign of futures returns to produce an integrated tone measure. Empirical results show that this tone measure significantly outperforms dictionary-based and general-model measures in explaining and predicting Shanghai crude oil futures returns. The robustness tests give consistent results, and the method also performs better in out-of-sample prediction for 2023 to 2024. Further analysis shows that the effect of news tone on crude oil futures returns is transmitted mainly through investor attention as a mediating channel, and that under the integrated method the direction and significance of this mechanism are consistent with economic logic. In addition, the risk spillover analysis finds that the integrated tone measure effectively captures tail-risk transmission from international news sentiment to China's domestic futures market. The study contributes a reproducible path for the vertical application of financial large language models, reveals the role of news tone in market information transmission from both the return and the risk perspective, and provides a new quantitative tool for risk identification and policy formulation in the crude oil futures market.
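
    As a rough illustration of the ingredients named in the abstract (a lexicon-based tone score, weak labels taken from the sign of futures returns, and an integrated measure), the following minimal sketch may help. Everything in it is an assumption made for illustration: the word lists, the function names, and in particular the weighted-average integration rule are hypothetical and are not taken from the paper, which does not describe its integration step at this level of detail.

```python
from typing import Iterable

# Hypothetical mini-lexicon for illustration only; the paper's actual
# domain-specific lexicon is not reproduced here.
POSITIVE = {"rally", "surge", "tighten"}
NEGATIVE = {"glut", "slump", "oversupply"}


def lexicon_tone(tokens: Iterable[str]) -> float:
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative)."""
    tokens = list(tokens)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    # Articles with no lexicon hits are treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


def weak_label(futures_return: float) -> int:
    """Weak supervision label from the return sign: 1 if positive, else 0."""
    return 1 if futures_return > 0 else 0


def integrated_tone(lex_score: float, model_score: float, weight: float = 0.5) -> float:
    """Assumed integration rule: a convex combination of the two scores."""
    return weight * lex_score + (1 - weight) * model_score


if __name__ == "__main__":
    article = ["opec", "cuts", "spark", "rally"]
    print(lexicon_tone(article), weak_label(0.012))
```

    In a fuller pipeline, `model_score` would come from a BERT classifier fine-tuned on the weak labels; here it is simply a number passed in by the caller.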

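Cite this article

陈荣达, 肖文昊, 金骋路, 厉涵. Crude oil news tone and futures market returns: An integrated approach based on large language models and domain-specific lexicons [J]. 管理科学学报 (Journal of Management Sciences in China), 2026, (2): 81~103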

History
  • Online publication date: 2026-03-26
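管理科学学报 (Journal of Management Sciences in China) ® 2026. All rights reserved.
Address: Room 908, Block A, Teaching Building 25, Tianjin University, 92 Weijin Road, Nankai District, Tianjin. Postal code: 300072
Tel/Fax: 022-27403197 E-mail: jmsc@tju.edu.cn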