<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Citation Recommendation | Lei Zhang</title><link>https://zhanglei.page/tags/citation-recommendation/</link><atom:link href="https://zhanglei.page/tags/citation-recommendation/index.xml" rel="self" type="application/rss+xml"/><description>Citation Recommendation</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 20 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://zhanglei.page/media/icon_hu_102d14ed545eed19.png</url><title>Citation Recommendation</title><link>https://zhanglei.page/tags/citation-recommendation/</link></image><item><title>MasterSet: A Benchmark for Must-Cite Citation Recommendation</title><link>https://zhanglei.page/research/masterset/</link><pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate><guid>https://zhanglei.page/research/masterset/</guid><description>&lt;h3 id="overview"&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;MasterSet&lt;/strong&gt; is a large-scale benchmark designed to evaluate &lt;em&gt;must-cite&lt;/em&gt; citation recommendation in AI and machine learning research. Given only the title and abstract of a paper, the task is to retrieve the small set of papers so central to the work—direct experimental baselines, foundational methods, core datasets—that omitting them would misrepresent the contribution&amp;rsquo;s novelty or undermine reproducibility.&lt;/p&gt;
&lt;p&gt;A live demo is available at &lt;a href="https://mustcite.com" target="_blank" rel="noopener"&gt;&lt;strong&gt;mustcite.com&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="motivation"&gt;Motivation&lt;/h3&gt;
&lt;p&gt;The volume of AI/ML publications has grown by an order of magnitude over the past decade. Existing citation recommendation systems focus on broad topical relevance, but researchers need something more targeted: which specific papers &lt;em&gt;must&lt;/em&gt; they cite? Missing a key baseline or foundational method is not merely an oversight—it can constitute an incomplete or misleading submission.&lt;/p&gt;
&lt;p&gt;MasterSet is the first benchmark specifically designed to evaluate this harder, higher-stakes task.&lt;/p&gt;
&lt;h3 id="dataset"&gt;Dataset&lt;/h3&gt;
&lt;p&gt;MasterSet is built on &lt;strong&gt;153,373 papers&lt;/strong&gt; collected from official proceedings of &lt;strong&gt;15 peer-reviewed venues&lt;/strong&gt;, including NeurIPS, ICML, ICLR, CVPR, ACL, and others spanning core ML, computer vision, NLP, and probabilistic methods. Papers are collected using &lt;a href="https://github.com/zhangleiniu/OpenPapers" target="_blank" rel="noopener"&gt;&lt;strong&gt;Open Papers&lt;/strong&gt;&lt;/a&gt;, a venue-specific scraper that retrieves directly from official proceedings websites rather than aggregator APIs, yielding exact, verified paper counts free of preprint conflation.&lt;/p&gt;
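&lt;p&gt;For flavor, a venue-specific scrape in the spirit of Open Papers can be as small as the sketch below. The URL and CSS selector are illustrative stand-ins, not Open Papers&amp;rsquo; actual code; each real venue needs a selector tuned to its proceedings page layout.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import requests
from bs4 import BeautifulSoup

# Illustrative sketch of a venue-specific scraper (not Open Papers itself).
# The URL and selector are hypothetical and must be tuned per proceedings site.
def scrape_titles(proceedings_url, selector):
    html = requests.get(proceedings_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(selector)]

# Example invocation against a hypothetical proceedings index page.
titles = scrape_titles("https://example.org/proceedings/2024", "li.paper a")
print(len(titles), "papers found")
&lt;/code&gt;&lt;/pre&gt;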
&lt;h3 id="annotation"&gt;Annotation&lt;/h3&gt;
&lt;p&gt;Every citation instance is annotated with a three-tier labeling scheme (a sketch of one labeled record follows the list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Type I&lt;/strong&gt;: Experimental baseline status (binary)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Type II&lt;/strong&gt;: Core relevance on a 1–5 scale&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Type III&lt;/strong&gt;: Intra-paper mention frequency&lt;/li&gt;
&lt;/ul&gt;
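&lt;p&gt;As a rough illustration, the three tiers map onto a per-citation record like the following. Field names are hypothetical, not MasterSet&amp;rsquo;s released schema.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from dataclasses import dataclass

# Hypothetical record for one (citing paper, cited paper) instance;
# field names are illustrative, not MasterSet's released schema.
@dataclass
class CitationLabel:
    citing_id: str       # paper making the citation
    cited_id: str        # paper being cited
    is_baseline: bool    # Type I: experimental baseline status (binary)
    core_relevance: int  # Type II: core relevance, 1 (peripheral) to 5 (central)
    mention_count: int   # Type III: intra-paper mention frequency

example = CitationLabel(
    citing_id="neurips2024.1234",
    cited_id="icml2017.0042",
    is_baseline=True,
    core_relevance=5,
    mention_count=7,
)
&lt;/code&gt;&lt;/pre&gt;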
&lt;p&gt;Over 2 million citation instances are labeled using Gemini 2.5 Flash as an LLM judge; the judge&amp;rsquo;s labels are validated against human expert annotations on a stratified sample of 510 instances.&lt;/p&gt;
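&lt;p&gt;For intuition, LLM-to-human agreement on an ordinal tier like Type II can be checked in a few lines. This is a generic sketch with toy values, not the benchmark&amp;rsquo;s exact validation protocol.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from sklearn.metrics import cohen_kappa_score

# Toy parallel labels over a validation sample: llm[i] and human[i] are the
# Type II scores (1-5) assigned to the same citation instance.
llm = [5, 3, 1, 4, 2, 5, 2]
human = [5, 3, 2, 4, 2, 4, 2]

# Quadratic weighting penalizes larger disagreements more, which suits
# an ordinal 1-5 scale better than exact-match agreement alone.
kappa = cohen_kappa_score(llm, human, weights="quadratic")
exact = sum(a == b for a, b in zip(llm, human)) / len(llm)
print(f"weighted kappa {kappa:.2f}, exact agreement {exact:.2f}")
&lt;/code&gt;&lt;/pre&gt;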
&lt;h3 id="benchmark-results"&gt;Benchmark Results&lt;/h3&gt;
&lt;p&gt;We evaluate sparse retrieval (BM25), dense scientific embeddings (SPECTER, SciNCL, SciBERT), and graph-based methods. The best baseline, SciBERT fine-tuned with contrastive loss, recovers fewer than 50% of must-cite papers in the top 100 from a 67,761-paper pool—confirming that must-cite retrieval remains a substantially open problem.&lt;/p&gt;
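&lt;p&gt;The headline metric behind that figure, recall@k, is simple to state in code. Below is a minimal sketch assuming per-query ranked candidate IDs and a gold must-cite set; it is not the benchmark&amp;rsquo;s official scorer.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def recall_at_k(ranked_ids, gold_ids, k=100):
    """Fraction of the gold must-cite set recovered in the top-k ranking."""
    top_k = set(ranked_ids[:k])
    gold = set(gold_ids)
    return len(top_k.intersection(gold)) / max(len(gold), 1)

# Toy query: 2 of the 3 must-cite papers appear in the top 5.
ranking = ["p7", "p2", "p9", "p5", "p1"]
must_cite = ["p2", "p5", "p8"]
print(recall_at_k(ranking, must_cite, k=5))  # 0.666...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Averaging this per-query value over all test papers gives a corpus-level recall@100 of the kind reported above.&lt;/p&gt;</description></item></channel></rss>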