Trying out a custom collector in Solr

This is the code of IndexSearcher.search (the overload that iterates over the leaves):

  protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)
      throws IOException {

    // TODO: should we make this
    // threaded…?  the Collector could be sync'd?
    // always use single thread:
    for (LeafReaderContext ctx : leaves) { // search each subreader
      final LeafCollector leafCollector;
      try {
        leafCollector = collector.getLeafCollector(ctx);
      } catch (CollectionTerminatedException e) {
        // there is no doc of interest in this reader context
        // continue with the following leaf
        continue;
      }
      BulkScorer scorer = weight.bulkScorer(ctx);
      if (scorer != null) {
        try {
          scorer.score(leafCollector, ctx.reader().getLiveDocs());
        } catch (CollectionTerminatedException e) {
          // collection was terminated prematurely
          // continue with the following leaf
        }
      }
    }
  }
  
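After some experimenting, I found that the line that loads the DocValues is BulkScorer scorer = weight.bulkScorer(ctx);

This means each LeafReaderContext ctx only needs to load its DocValues once.

Previously I had modified TFIDFSimilarity directly, so the DocValues were loaded again for every document being scored, which severely hurt performance.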

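So the fix is to write a custom TopScoreDocCollector: when leafCollector = collector.getLeafCollector(ctx); is called, load all the DocValues that are needed for that leaf in one go, and reuse them for every document in the leaf.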
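
To make the "load once per leaf" idea concrete, here is a minimal sketch that runs without Lucene on the classpath. It is only a model of the pattern, not Lucene's API: Leaf, LeafDocCollector and BoostingCollector are hypothetical names standing in for LeafReaderContext, LeafCollector and a custom Collector, and multiplying the score by the stored value is just an assumed example of using the loaded data.

```java
import java.util.ArrayList;
import java.util.List;

class PerLeafLoadingSketch {

    /** Hypothetical stand-in for one segment (LeafReaderContext in Lucene). */
    static final class Leaf {
        private final long[] column; // one stored value per document in this segment
        int loadCount = 0;           // counts how often the column was loaded

        Leaf(long[] column) { this.column = column; }

        int maxDoc() { return column.length; }

        /** Simulates the expensive step of loading the per-document values. */
        long[] loadValues() {
            loadCount++;
            return column.clone();
        }
    }

    /** Hypothetical stand-in for LeafCollector. */
    interface LeafDocCollector {
        void collect(int doc, float score);
    }

    /** Hypothetical stand-in for a custom Collector. */
    static final class BoostingCollector {
        final List<Float> scores = new ArrayList<>();

        LeafDocCollector getLeafCollector(Leaf leaf) {
            // Load once per segment, here, and not once per document in collect().
            final long[] values = leaf.loadValues();
            return (doc, score) -> scores.add(score * values[doc]);
        }
    }

    /** Same shape as the loop in IndexSearcher.search: one leaf collector per leaf. */
    static void search(List<Leaf> leaves, BoostingCollector collector) {
        for (Leaf leaf : leaves) {
            LeafDocCollector leafCollector = collector.getLeafCollector(leaf);
            for (int doc = 0; doc < leaf.maxDoc(); doc++) {
                leafCollector.collect(doc, 1.0f); // every doc "matches" with score 1.0
            }
        }
    }

    public static void main(String[] args) {
        List<Leaf> leaves = List.of(new Leaf(new long[] {2, 3}), new Leaf(new long[] {5}));
        BoostingCollector collector = new BoostingCollector();
        search(leaves, collector);
        System.out.println(collector.scores);        // [2.0, 3.0, 5.0]
        System.out.println(leaves.get(0).loadCount); // 1, although the leaf has 2 docs
    }
}
```

In a real collector the loading step inside getLeafCollector would read from ctx.reader() instead of a plain array, but the structure stays the same: the per-document collect call only indexes into data that was fetched once for the whole leaf.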
