Geert Coelmont
2013-12-10 18:22:29 UTC
Hi all,
After a massive cleanup of old documents, we want to do a complete
reindex of all remaining documents in our Daisy repository.
For this, we used the JMX management console (FullTextIndexUpdater, operation
reIndexAllDocuments).
This starts OK, and for several minutes the "Reindex Status" reads "Querying
the repository to retrieve the list of documents to re-index (started at
....)".
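(For completeness: instead of clicking through the console, the same MBean
operation can also be triggered from a small stand-alone JMX client. The
sketch below is only illustrative; the JMX service URL/port and the
ObjectName "Daisy:name=FullTextIndexUpdater" are assumptions on our side and
should be verified against the MBean tree the console shows.)

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TriggerReindex {
        public static void main(String[] args) throws Exception {
            // Hypothetical JMX endpoint of the Daisy repository server;
            // replace host/port with whatever your JVM actually exposes.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9001/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                // Assumed ObjectName -- check the real domain/name in the JMX console.
                ObjectName updater = new ObjectName("Daisy:name=FullTextIndexUpdater");
                // Invoke the same no-argument operation the console button calls.
                mbsc.invoke(updater, "reIndexAllDocuments",
                        new Object[0], new String[0]);
            } finally {
                connector.close();
            }
        }
    }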
After a while, we get the following in the logs:
[ERROR ] <2013-12-10 11:59:06,886> (org.outerj.daisy.ftindex.FullTextIndexImpl): Error updating fulltext index writer.
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.store.IndexInput.readString(IndexInput.java:92)
    at org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:216)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:124)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:333)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:207)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:97)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1883)
    at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:1811)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1742)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1733)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:874)
    at org.outerj.daisy.ftindex.FullTextIndexImpl.closeIndexWriter(FullTextIndexImpl.java:330)
    at org.outerj.daisy.ftindex.FullTextIndexImpl.updateWriter(FullTextIndexImpl.java:317)
    at org.outerj.daisy.ftindex.FullTextIndexImpl.access$300(FullTextIndexImpl.java:54)
    at org.outerj.daisy.ftindex.FullTextIndexImpl$IndexFlusher.run(FullTextIndexImpl.java:370)
    at java.lang.Thread.run(Thread.java:662)
We do have a lot of documents in the repository (about 400K).
It would seem that the actual reindexing hadn't started yet, and that Daisy
was still "collecting" the list of documents to process.
Is there anything, apart from increasing memory, that we can do to make this
work more efficiently?
It would already be a help if we could reindex only the most recent
documents (e.g. the last few thousand).
Thanks in advance,
--
Best regards / Met vriendelijke groeten
*Geert Coelmont*
Headbird
Headbird NV -- ICT services | Sneeuwbeslaan 14 -- 2610 Antwerpen (BE)
+32 3 829 9047 | ***@headbird.com