notmuch support #264

Closed
opened 2019-12-02 00:30:46 +00:00 by anarcat · 8 comments

What I'm trying to do and why

One of the things that's creating the most "churn" (as in, needless changes and transfers) in my borg backups is notmuch mail (https://notmuchmail.org/). It's a great piece of software, but one of its ... peculiarities is that it can create pretty big databases. Before "compaction" here, the (Xapian) database was 12GB. After compaction, it drops down to 2GB, which is still pretty big.

That database can also change during backup, obviously, which is bad news for a consistent restore.

I would like to back up this thing properly. From what I understand, the "proper" way of doing a backup of a notmuch database is to create a text-only copy of the database with notmuch dump, and copy the related Maildir/ spool (which is indexed in the huge database). In my home-made scripts, I do basically this:

notmuch dump | pv > Maildir/.notmuch/xapian/notmuch.dump
borg create -e 'Maildir/.notmuch/xapian/*.glass'
rm Maildir/.notmuch/xapian/notmuch.dump

That's a simplified view, of course. Is this something I should implement directly in borgmatic, or should I use some existing pre/post hook system instead?

Thanks!
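
For reference, the dump, back up, clean up sequence in the script above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the archive argument, the relative Maildir path, and the injectable run callable are illustrative choices, not part of borgmatic or of the original script. The point it shows is that the temporary dump file gets removed even when the backup step fails.

```python
import subprocess
from pathlib import Path

# Hypothetical locations, mirroring the paths used in the script above.
MAILDIR = Path("Maildir")
DUMP_FILE = MAILDIR / ".notmuch" / "xapian" / "notmuch.dump"


def build_commands(archive, dump_file=DUMP_FILE, maildir=MAILDIR):
    """Return the (dump, backup) command lines as argument lists."""
    dump = ["notmuch", "dump", f"--output={dump_file}"]
    backup = [
        "borg", "create",
        # Skip the large Xapian index files; the text dump replaces them.
        "--exclude", str(maildir / ".notmuch" / "xapian" / "*.glass"),
        archive,        # assumed to look like "REPOSITORY::ARCHIVE_NAME"
        str(maildir),
    ]
    return dump, backup


def backup_mail(archive, run=subprocess.check_call, dump_file=DUMP_FILE):
    """Dump tags, back up the Maildir, and always remove the dump afterwards."""
    dump, backup = build_commands(archive, dump_file)
    try:
        run(dump)
        run(backup)
    finally:
        # missing_ok covers the case where the dump itself failed (Python 3.8+).
        Path(dump_file).unlink(missing_ok=True)
```

Passing a different run callable (for example, one that only records its arguments) makes it possible to exercise the sequence without notmuch or borg installed.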

Owner

I'd certainly consider support for this directly in borgmatic, especially since I think it'd be pretty straight-forward to add. Let me know if you'd like to take a crack at implementing this yourself!

Author

i would love some guidance on how to do this. an example commit adding similar functionality would greatly help, along with some pointers on where the code should be added, how to write unit tests (if that's necessary) and so on... thanks!

Owner

Sure thing! Here's an overview of the entry points and call flow:

  * The borgmatic command (borgmatic/commands/borgmatic.py) triggers database dumps for all database hooks (PostgreSQL, MySQL, etc.). This effectively calls dump_databases() in each of the per-database source files (borgmatic/hooks/postgresql.py and borgmatic/hooks/mysql.py).
  * Similarly, the borgmatic command triggers database temporary dump file cleanup for all database hooks. This effectively calls remove_database_dumps() on the per-database source code.
  * The command also makes database dump patterns and restores database dumps when the borgmatic restore command is run explicitly. This effectively calls make_database_dump_patterns() and restore_database_dumps() on the per-database source code.

Hopefully, you shouldn't need to change any of that. But the context should be helpful.

In terms of what you'd need to add:

  1. Create a configuration schema that allows a user to define a notmuch database. The current PostgreSQL configuration schema (in borgmatic/config/schema.yaml) is a good example. You can run generate-borgmatic-config to try out your schema changes and make sure they get rendered as you expect. Also add the hook name (e.g. notmuch_databases) to the list of database hook names (in borgmatic/hooks/dump.py).
  2. Create a borgmatic/hooks/notmuch.py source file and implement each of dump_databases(), remove_database_dumps(), make_database_dump_patterns(), and restore_database_dumps(). By way of example, here's the existing module for MySQL (borgmatic/hooks/mysql.py). You'll notice that for some functionality, it makes use of a dump.py file (borgmatic/hooks/dump.py) of utility functions common to multiple database types. Also, don't forget to add your newly created module to the mapping of configuration hook name to Python module name (in borgmatic/hooks/dispatch.py).
  3. Add unit tests for each function and each code path therein. The existing MySQL tests are in tests/unit/hooks/test_mysql.py; I'd create a test_notmuch.py in that directory. borgmatic uses flexmock (https://flexmock.readthedocs.io/en/latest/) as its test framework.
  4. Somewhere along the way, do some manual testing against your own notmuch databases! The borgmatic development documentation (https://torsion.org/borgmatic/docs/how-to/develop-on-borgmatic/) should be helpful here. borgmatic create and borgmatic restore should both work as expected, including in error cases where dumps aren't present, etc.
  5. Add documentation. This doesn't have to be extensive. Look at the existing database docs (https://torsion.org/borgmatic/docs/how-to/backup-your-databases/) for an example.

I know this sounds like a lot, but I think (hope!) that it's really pretty straight-forward. And of course let me know if anything here isn't straight-forward, or you need some help. WIP PRs are welcome.
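
To make step 2 above more concrete, here is a rough sketch of what a borgmatic/hooks/notmuch.py module might look like. Everything beyond the four function names is an assumption made for illustration: the argument lists, the "name" and "config" configuration keys, the DUMP_PATH location, and the injectable run callable are hypothetical and may not match borgmatic's actual hook interface, so the existing MySQL module remains the thing to copy from. The notmuch invocations use only the dump --output, restore --input, and --config options.

```python
import os
import subprocess

# Assumption: dump files live in a per-hook directory under the user's home.
DUMP_PATH = "~/.borgmatic/notmuch_databases"


def make_dump_filename(database, dump_path=DUMP_PATH):
    """Return the dump file path for one configured database entry."""
    return os.path.join(os.path.expanduser(dump_path), database["name"])


def make_command(database, subcommand, file_option):
    """Build a notmuch command line, honoring an optional config file."""
    command = ["notmuch"]
    if "config" in database:  # hypothetical configuration key
        command.append(f"--config={database['config']}")
    return command + [subcommand, file_option]


def dump_databases(databases, log_prefix, dry_run, run=subprocess.check_call):
    """Write one `notmuch dump` file per configured database."""
    for database in databases:
        dump_filename = make_dump_filename(database)
        command = make_command(database, "dump", f"--output={dump_filename}")
        if dry_run:
            continue
        os.makedirs(os.path.dirname(dump_filename), mode=0o700, exist_ok=True)
        run(command)


def remove_database_dumps(databases, log_prefix, dry_run):
    """Delete the dump files created by dump_databases(), if present."""
    for database in databases:
        dump_filename = make_dump_filename(database)
        if not dry_run and os.path.exists(dump_filename):
            os.remove(dump_filename)


def make_database_dump_patterns(databases, log_prefix, names):
    """Return archive path patterns for the named dumps (all if none named)."""
    return [make_dump_filename({"name": name}) for name in (names or ["*"])]


def restore_database_dumps(databases, log_prefix, dry_run, run=subprocess.check_call):
    """Feed each extracted dump file back into `notmuch restore`."""
    for database in databases:
        dump_filename = make_dump_filename(database)
        command = make_command(database, "restore", f"--input={dump_filename}")
        if not dry_run:
            run(command)
```

Keeping command construction in a small helper and accepting the runner as a parameter is a design choice made here so that unit tests (step 3) can assert on the exact command lines without running notmuch.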
Author

your reply sounds like a great addition to the "reference" section. ;) and yes, it does seem like a lot, but I guess that's the price to pay for inclusion, and I respect that! i'll see what i can do.

thanks for the detailed reply!

Owner

I'll see if I can distill something down for reference or the development guide.

If it turns out that you only get part-way through this, that's totally fine. I'm happy to pick up from there. I'm just happy to get contributions!

Author

minimal progress here: I've hooked notmuch into the pre/post hooks system, like this:

hooks:
    before_backup:
        - echo "creating notmuch dump file"
        - notmuch --config=/home/anarcat/.notmuch-config dump --output=/home/anarcat/Maildir/.notmuch/xapian/notmuch.dump

    after_backup:
        - echo "removing notmuch dump file"
        - rm /home/anarcat/Maildir/.notmuch/xapian/notmuch.dump

I was hoping that pv would work here, but alas, it seems the hook output gets swallowed somehow and doesn't show up, which is strange because the echo output does show up. maybe standard error is hidden in hooks?

I'm not sure I'll be able to implement a native plugin for notmuch. It is kind of a corner case and maybe it's better to use this simple hook instead...

it would sure be nice to have better integration with the borg flags... like right now the above always shows the "echo" output, regardless of verbosity... same with the pv progress bar which does not get shown (yet should show up with --progress). :) but i guess that's a minor tradeoff compared with the work involved in coding all of this...

sorry, i wish i had more time, but i was able to scratch that itch without going through anything complicated, so maybe that's for the best. :)

Owner

I'm glad to hear that the command hooks are (mostly) working for you here. Custom preparation/cleanup commands are certainly what they're there for!

As for the echo showing up regardless of verbosity, that's by design. With the exception of error hooks, all hook output is logged at a level such that it shows up at all verbosities except -1. The idea is that if you're running custom commands, you probably don't want their output swallowed. So you could switch to --verbosity -1 if you really don't want to see it. Or, you know, remove the echo. 😄
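
As a rough illustration of that rule, the sketch below models verbosity as a logging threshold. The mapping of verbosity values to Python logging levels and the choice of WARNING for hook output are assumptions made for the example, not a copy of borgmatic's code. What it shows is simply that a message logged at a fixed level is hidden only when the threshold is raised above it.

```python
import logging

# Assumed mapping from --verbosity values to logging thresholds.
VERBOSITY_TO_LOG_LEVEL = {
    -1: logging.ERROR,
    0: logging.WARNING,
    1: logging.INFO,
    2: logging.DEBUG,
}

# Assumption: ordinary (non-error) hook output is logged at WARNING.
HOOK_OUTPUT_LEVEL = logging.WARNING


def hook_output_is_shown(verbosity, output_level=HOOK_OUTPUT_LEVEL):
    """Return True if a record at output_level passes the verbosity threshold."""
    return output_level >= VERBOSITY_TO_LOG_LEVEL[verbosity]


for verbosity in sorted(VERBOSITY_TO_LOG_LEVEL):
    print(verbosity, hook_output_is_shown(verbosity))
```

Under these assumptions, only verbosity -1 hides the echo output.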

And I'm pretty sure that the pv progress bar doesn't work because borgmatic captures the hook command output, and therefore doesn't give commands like pv access to an interactive terminal where they can redraw the display. The purpose behind capturing output is so that it can flow properly to logs (whether console, syslog, or file).
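
The effect of capturing output can be reproduced outside borgmatic with a small experiment. This is a sketch only: it does not use borgmatic's own command runner, and the child_sees_terminal helper is a hypothetical name. It starts a child Python process and asks whether its standard error is a terminal. When the parent captures standard error through a pipe, the child sees no terminal, which is the situation a progress display like pv's would be in.

```python
import subprocess
import sys

# The child reports whether its own standard error is attached to a terminal.
CHILD = "import sys; print(sys.stderr.isatty())"


def child_sees_terminal(capture):
    """Run a child process and report whether its stderr is a TTY."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        # With capture=True the child's stderr is a pipe, not a terminal.
        stderr=subprocess.PIPE if capture else None,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


print("captured stderr is a terminal:", child_sees_terminal(capture=True))
```

With capture=False the child inherits the parent's standard error, so the answer depends on how the script itself was started (for example, interactively versus from cron).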

Of course, there'd be more flexibility to change how this works with a "native" hook. But I totally recognize the trade-off there in up-front effort.

Author

that all makes sense! :) thank you in any case for all your help and i hope that your work of documenting how to possibly do this will be useful for future people looking into this problem, for notmuch or otherwise...

Reference: borgmatic-collective/borgmatic#264