Programming Notes: Lucene Highlighting Code
寒 刚入门 · 2010-09-13 · via 博客园 (cnblogs) - 寒 刚入门

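Whenever we use a search engine such as Google or Baidu, the query terms are highlighted on the results page. The highlighting stands out and lets us quickly focus on the information we need.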
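The basic idea can be sketched in plain Java before bringing Lucene into it. This is only an illustration under simple assumptions: the class and method names are made up for this post, matching is whole-word and case-insensitive, and there is no HTML escaping or text analysis, which a real highlighter has to deal with.

```java
import java.util.regex.Pattern;

class TermHighlightSketch {
    // Wraps every whole-word, case-insensitive occurrence of `term` in <B>...</B>.
    // Hypothetical helper for illustration; not part of Lucene.
    static String highlight(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // $0 refers to the matched text, so the original casing is preserved.
        return p.matcher(text).replaceAll("<B>$0</B>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("Lucene in Action", "action"));
        // prints: Lucene in <B>Action</B>
    }
}
```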


Lucene's contrib area already includes modules that provide this feature.

Highlighter

Code that highlights matches in search results:

public void testHits() throws Exception {
    // TestUtil.getBookIndexDirectory() is assumed to be a helper that returns
    // the Directory holding the book index.
    IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    // Score fragments by how well they match the query on the "title" field.
    QueryScorer scorer = new QueryScorer(query, "title");
    Highlighter highlighter = new Highlighter(scorer);
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
    Analyzer analyzer = new SimpleAnalyzer();

    for (int i = 0; i < hits.scoreDocs.length; i++) {
        Document doc = searcher.doc(hits.scoreDocs[i].doc);
        String title = doc.get("title");
        // Uses term vectors if they were indexed, otherwise re-analyzes the stored text.
        TokenStream stream = TokenSources.getAnyTokenStream(
                searcher.getIndexReader(),
                hits.scoreDocs[i].doc,
                "title",
                doc,
                analyzer);
        String fragment = highlighter.getBestFragment(stream, title);
        System.out.println(fragment);
    }
    searcher.close();
}
// Output:
// JUnit in <B>Action</B>
// Lucene in <B>Action</B>
// Tapestry in <B>Action</B>
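The titles above are short, so the whole text comes back as the fragment. For longer fields, getBestFragment returns only the best-matching piece of the text. Here is a rough plain-Java sketch of that idea. It is not Lucene's scoring algorithm: the class, the fixed-size word window, and the hit counting are all simplifying assumptions made for illustration.

```java
class BestFragmentSketch {
    // Returns the window of `size` consecutive words that contains the most
    // occurrences of `term` (case-insensitive), with the hits wrapped in <B>...</B>.
    // Ties go to the earliest window. Hypothetical helper; not part of Lucene.
    static String bestFragment(String text, String term, int size) {
        String[] words = text.trim().split("\\s+");
        int bestStart = 0;
        int bestHits = -1;
        // Slide the window over the text; always evaluate the window at start 0
        // so that texts shorter than `size` still produce a fragment.
        for (int start = 0; start == 0 || start + size <= words.length; start++) {
            int end = Math.min(start + size, words.length);
            int hits = 0;
            for (int i = start; i < end; i++) {
                if (words[i].equalsIgnoreCase(term)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                bestStart = start;
            }
        }
        int end = Math.min(bestStart + size, words.length);
        StringBuilder sb = new StringBuilder();
        for (int i = bestStart; i < end; i++) {
            if (i > bestStart) {
                sb.append(' ');
            }
            boolean hit = words[i].equalsIgnoreCase(term);
            sb.append(hit ? "<B>" + words[i] + "</B>" : words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bestFragment(
                "the quick brown fox jumps over the lazy dog", "dog", 3));
        // prints: the lazy <B>dog</B>
    }
}
```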

FastVectorHighlighter

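As the name suggests, FastVectorHighlighter is a fast highlighting tool. Compared with Highlighter it has three advantages:

1. FastVectorHighlighter supports fields that are tokenized by n-gram tokenizers. Highlighter cannot support such fields very well.

2. FastVectorHighlighter can output highlights in different colors.

3. FastVectorHighlighter can highlight a phrase as a unit. (For example, when searching for lazy dog, FastVectorHighlighter produces <b>lazy dog</b>, while Highlighter tags the terms individually: <b>lazy</b> <b>dog</b>.)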
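To make the third point concrete, here is a small plain-Java sketch that contrasts term-by-term tagging with phrase-unit tagging. It only illustrates the two output styles. The class and method names are hypothetical, and neither method reflects how Lucene's highlighters work internally.

```java
import java.util.regex.Pattern;

class PhraseHighlightSketch {
    // Term-by-term style: every word of the phrase gets its own tag.
    // Assumes the words do not collide with the tag name itself (e.g. "b").
    static String perTerm(String text, String phrase) {
        String out = text;
        for (String word : phrase.trim().split("\\s+")) {
            out = Pattern.compile("\\b" + Pattern.quote(word) + "\\b")
                    .matcher(out)
                    .replaceAll("<b>$0</b>");
        }
        return out;
    }

    // Phrase-unit style: the phrase is tagged once, as a whole.
    static String phraseUnit(String text, String phrase) {
        return Pattern.compile("\\b" + Pattern.quote(phrase) + "\\b")
                .matcher(text)
                .replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(perTerm(text, "lazy dog"));
        // prints: the quick brown fox jumps over the <b>lazy</b> <b>dog</b>
        System.out.println(phraseUnit(text, "lazy dog"));
        // prints: the quick brown fox jumps over the <b>lazy dog</b>
    }
}
```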


FastVectorHighlighter code:

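// Imports assume the Lucene 3.0.x package layout.
import java.io.FileWriter;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
import org.apache.lucene.search.vectorhighlight.FieldQuery;
import org.apache.lucene.search.vectorhighlight.FragListBuilder;
import org.apache.lucene.search.vectorhighlight.FragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.ScoreOrderFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FastVectorHighlighterSample {
    static final String[] DOCS = {                                  // #A
        "the quick brown fox jumps over the lazy dog",              // #A
        "the quick gold fox jumped over the lazy black dog",        // #A
        "the quick fox jumps over the black dog",                   // #A
        "the red fox jumped over the lazy dark gray dog"            // #A
    };
    static final String QUERY = "quick OR fox OR \"lazy dog\"~1";   // #B
    static final String F = "f";
    static Directory dir = new RAMDirectory();
    static Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: FastVectorHighlighterSample <filename>");
            System.exit(-1);
        }
        makeIndex();                                                // #C
        searchIndex(args[0]);                                       // #D
    }

    static void makeIndex() throws IOException {
        IndexWriter writer = new IndexWriter(dir, analyzer, true, MaxFieldLength.LIMITED);
        for (String d : DOCS) {
            Document doc = new Document();
            doc.add(new Field(F, d, Store.YES, Index.ANALYZED,      // #E
                    TermVector.WITH_POSITIONS_OFFSETS));            // #E
            writer.addDocument(doc);
        }
        writer.close();
    }

    static void searchIndex(String filename) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, F, analyzer);
        Query query = parser.parse(QUERY);
        FastVectorHighlighter highlighter = getHighlighter();       // #F
        FieldQuery fieldQuery = highlighter.getFieldQuery(query);   // #G
        IndexSearcher searcher = new IndexSearcher(dir);
        TopDocs docs = searcher.search(query, 10);

        FileWriter writer = new FileWriter(filename);
        writer.write("<html>");
        writer.write("<body>");
        writer.write("<p>QUERY : " + QUERY + "</p>");
        for (ScoreDoc scoreDoc : docs.scoreDocs) {
            String snippet = highlighter.getBestFragment(           // #H
                    fieldQuery, searcher.getIndexReader(),          // #H
                    scoreDoc.doc, F, 100);                          // #H
            if (snippet != null) {                                  // #I
                writer.write(scoreDoc.doc + " : " + snippet + "<br/>"); // #I
            }
        }
        writer.write("</body></html>");
        writer.close();
        searcher.close();
    }

    static FastVectorHighlighter getHighlighter() {
        FragListBuilder fragListBuilder = new SimpleFragListBuilder();  // #J
        FragmentsBuilder fragmentBuilder =                          // #K
                new ScoreOrderFragmentsBuilder(                     // #K
                        BaseFragmentsBuilder.COLORED_PRE_TAGS,      // #K
                        BaseFragmentsBuilder.COLORED_POST_TAGS);    // #K
        return new FastVectorHighlighter(true, true,                // #L
                fragListBuilder, fragmentBuilder);                  // #L
    }
}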
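#A Sample documents
#B Sample query
#C Build the index
#D Search and output the results
#E Store.YES together with TermVector.WITH_POSITIONS_OFFSETS
#F Get a FastVectorHighlighter instance
#G Create the FieldQuery
#H Highlight the fragment
#I Write out the highlighted fragment
#J Create a SimpleFragListBuilder
#K Create a ScoreOrderFragmentsBuilder with multi-colored tags
#L Create the FastVectorHighlighter instance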
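The multi-colored tags from #K can be pictured with a small plain-Java sketch in which each query term gets its own background color. This is an illustration only. The palette is made up and is not the content of BaseFragmentsBuilder.COLORED_PRE_TAGS, and the class and method are hypothetical rather than anything Lucene provides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColoredTagsSketch {
    // Hypothetical palette; not the values Lucene ships with.
    static final String[] COLORS = { "yellow", "lawngreen", "aquamarine" };

    // Tags each whole-word occurrence of a term with the color assigned to that
    // term; colors are reused cyclically if there are more terms than colors.
    static String highlight(String text, String... terms) {
        if (terms.length == 0) {
            return text;
        }
        // Build one alternation with a capturing group per term, so a single
        // pass over the text tells us which term matched.
        StringBuilder alternatives = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                alternatives.append('|');
            }
            alternatives.append('(').append(Pattern.quote(terms[i])).append(')');
        }
        Matcher m = Pattern.compile("\\b(?:" + alternatives + ")\\b").matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int termIndex = 0;
            for (int g = 1; g <= terms.length; g++) {
                if (m.group(g) != null) {
                    termIndex = g - 1;
                    break;
                }
            }
            String tagged = "<b style=\"background:" + COLORS[termIndex % COLORS.length]
                    + "\">" + m.group() + "</b>";
            m.appendReplacement(out, Matcher.quoteReplacement(tagged));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("the quick fox", "quick", "fox"));
    }
}
```

Doing the tagging in a single pass keeps later terms from matching inside tags that were inserted for earlier ones.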

LUCENE.NET QQ discussion group (81361051)