[daisy] batch delete of many documents
Geert Coelmont
2013-11-15 21:18:43 UTC
Hi all,

We are using Daisy as a document archival system, and have stored over a
million documents in several Daisy instances.
We are now about to delete the oldest batch of stored documents.

I am familiar with the HTTP repository API, which we would use to delete
documents below a certain document-ID.
Question: each of our documents also has a PDF attachment part. Will an
HTTP DELETE on a document also delete its attachment parts, or do we
need to explicitly DELETE the individual parts?
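
For reference, here is a rough Python sketch of the HTTP-based loop we
have in mind. The base URL, the /repository/document/<id> path, the
"<number>-DSY" ID format and Basic authentication are all assumptions on
my side and should be checked against the Daisy documentation; the
helper names are made up for illustration. It defaults to a dry run
that only collects the URLs.

```python
import base64
import urllib.error
import urllib.request


def document_url(base_url, doc_id):
    """Build the URL of one document resource (path layout is an assumption)."""
    return f"{base_url.rstrip('/')}/repository/document/{doc_id}"


def ids_below(limit, namespace="DSY", start=1):
    """Yield IDs start..limit-1 in the assumed '<number>-<namespace>' form."""
    for seq in range(start, limit):
        yield f"{seq}-{namespace}"


def delete_request(url, user, password):
    """Prepare (but do not send) an HTTP DELETE with Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="DELETE")
    req.add_header("Authorization", f"Basic {token}")
    return req


def delete_documents(base_url, limit, user, password, dry_run=True):
    """Issue one DELETE per document with a sequence number below `limit`.

    Returns (url, status) pairs. With dry_run=True nothing is sent and the
    status is None, so the list of targets can be reviewed first.
    """
    results = []
    for doc_id in ids_below(limit):
        url = document_url(base_url, doc_id)
        if dry_run:
            results.append((url, None))
            continue
        try:
            with urllib.request.urlopen(delete_request(url, user, password)) as resp:
                results.append((url, resp.status))
        except urllib.error.HTTPError as err:
            # Record the failure (e.g. document already gone) and carry on.
            results.append((url, err.code))
    return results
```

Calling delete_documents("http://localhost:9263", 50000, user, password)
first as a dry run, and only then with dry_run=False, would let us
inspect the target list before anything is removed.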

As an alternative, we are looking at working directly on the MySQL
database. I understand this is risky, but it would likely be hundreds,
if not thousands, of times faster.
This would involve finding the relevant parts in the tables, deleting
the corresponding files in the filesystem structure, and deleting the
relevant rows (documents, variants, parts, ...).
Question: has anybody done this before, and do you perhaps have example
scripts and/or other insight you can share about this?

Thanks in advance,
