Opened 10 years ago

Last modified 10 years ago

#10 new enhancement

Files backend needs locking

Reported by: sbocahu
Owned by: vcaron
Priority: major
Component: Bearmail
Keywords:
Cc:

Description

The 'Files' backend needs locking on read/write operations, since several users could access it at the same time and with different tools (the upcoming bearmail CLI, the webui, a file editor...).

So flock() the filehandles in the bearmail CLI and webui! http://www.perlmonks.org/?node_id=7058
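For reference, a minimal sketch of the shared-lock read path both tools could share (the path and the `read_mailmap` name are placeholders, not the actual BearMail API):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Read the mailmap under a shared lock; several readers may hold it at once,
# while a writer holding LOCK_EX excludes them all.
sub read_mailmap {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "open $path: $!";
    flock($fh, LOCK_SH)      or die "lock $path: $!";
    my @lines = <$fh>;
    close($fh);   # closing the handle releases the lock
    return @lines;
}
```

Note that flock() locks are advisory: they only protect against other programs that also call flock().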

Change History (13)

comment:1 follow-up: Changed 10 years ago by vcaron

Needed for Bearmail demo (see #15 and #16), the typical case where race conditions will occur (Murphy+demo effect).

http://www.perlmonks.org/?node_id=7058

The article dates from 10 years ago. Be careful...

Looking up "file lock" on CPAN yields:

I should add that we would love an "editor-aware" and "VCS-aware" locking. That would make the webgui extremely adminsys-friendly (but not user-friendly, since it would report errors such as "Operation failed, the BOFH left his l33t Emacs open on the central configuration file and has left for the weekend. His phone is +33 666").

comment:2 in reply to: ↑ 1 Changed 10 years ago by sbocahu

http://www.perlmonks.org/?node_id=7058

The article dates from 10 years ago. Be careful...

I know, but there are no new syscalls or system-wide locking mechanisms on our free Unix operating systems. Therefore, Perl's flock() is (AFAIK) still the only system-wide locking mechanism usable (it tries the BSD-inherited flock(), then fcntl() and lockf()).

Looking up "file lock" on CPAN yields:

It's only useful, then, for using the File module instead of dealing with open/lock/close ourselves. Should we use it?

As I understand it, it is another locking mechanism intended to be used by specific software only (Perl scripts only, or the same mechanism implemented in other languages).

The only advantage I see in this module compared to flock() is that it behaves the same everywhere, whereas flock() may call one of three syscalls and may not support NFS. Should we support NFS?

I should add that we would love an "editor-aware" and "VCS-aware" locking. That would make the webgui extremely adminsys-friendly (but not user-friendly, since it would report errors such as "Operation failed, the BOFH left his l33t Emacs open on the central configuration file and has left for the weekend. His phone is +33 666").

Many programs don't respect advisory locks and have their own mechanisms to prevent problems with file modification (vim: warning that the file has been modified, etc.). I think it won't be feasible...

Also, I think we should take an exclusive lock only when writing, and re-read the file just before writing it, to prevent data loss between the first read and the write.

An exclusive lock held from read to write is not a good solution, as we are never sure the user will finally decide to commit their changes, so the lock may remain held "forever".
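The pattern above (exclusive lock around the write only, re-reading under that lock) could be sketched like this; `write_mailmap` and the `$merge_changes` coderef are hypothetical names standing in for the backend's real merge logic:

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Take LOCK_EX only for the duration of the write; the file is re-read
# under that lock so we merge against the latest on-disk contents.
sub write_mailmap {
    my ($path, $merge_changes) = @_;   # $merge_changes: coderef applying our edits

    open(my $fh, '+<', $path) or die "open $path: $!";
    flock($fh, LOCK_EX)       or die "lock $path: $!";

    my @current = <$fh>;                   # re-read just before writing
    my @merged  = $merge_changes->(@current);

    seek($fh, 0, SEEK_SET);                # rewind and rewrite in place
    truncate($fh, 0) or die "truncate $path: $!";
    print {$fh} @merged;
    close($fh);                            # releases the exclusive lock
}
```

No lock is held between the user's initial read and their commit, so an abandoned edit never blocks anyone.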

comment:3 Changed 10 years ago by vcaron

I know, but there are no new syscalls or system-wide locking mechanisms

Then either we use a module which implements several options and picks the "best" (a very common pattern with Perl modules), or we start with an obvious choice like fcntl, which is in CORE (it only works on serious Unixes, but who cares about other OSes?).

Many programs don't respect advisory locks and have their own mechanisms to prevent problems with file modification

Of course, and mandatory locking is a pain in the *ss as witnessed by Windows users.

I'm not concerned about other programs accessing Bearmail configuration files; nothing except Bearmail itself is supposed to fiddle with the /etc/bearmail configuration files. I'm just looking for some adminsys-friendly way, but a simple wrapper like "vimailmap" which spawns $EDITOR and does the proper locking would be fine too.

Also, I think we should take an exclusive lock only when writing, and re-read the file just before writing it, to prevent data loss between the first read and the write.

Good idea. And we should compute a hash and include it in any POST form to detect under-the-hood modifications; if an error occurs, it will be extremely helpful to explain the collision problem to the web user ("Can't proceed, someone changed that setting N min ago"). So maybe the hash should only be a timestamp.
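A sketch of the timestamp-token idea (the field name `mailmap_mtime` and both sub names are made up for illustration): embed the file's mtime at render time, compare it on submit.

```perl
use strict;
use warnings;

# Render time: embed the current mtime as a hidden form field.
sub mtime_token_field {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    return qq{<input type="hidden" name="mailmap_mtime" value="$mtime">};
}

# On POST: refuse to proceed if the file changed since the form was rendered.
sub check_mtime_token {
    my ($path, $posted_mtime) = @_;
    my $now = (stat $path)[9];
    return 1 if $posted_mtime == $now;
    my $ago = int((time - $now) / 60);
    die "Can't proceed, someone changed that setting $ago min ago\n";
}
```

A plain mtime is enough to detect a collision; a content hash would only help if a tool rewrites the file without changing it.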

comment:4 follow-up: Changed 10 years ago by vcaron

Some flock() support came in with [500].

Not related to locking, but to [500]: what's the point of adding %new_records and @records_to_delete in source:/bearmail/lib/BearMail/Backend/Files.pm? I specifically avoided the action/diff model (computing what's needed to go from one conf to the other) for KISS's sake.

comment:5 in reply to: ↑ 4 Changed 10 years ago by sbocahu

Not related to locking, but to [500]: what's the point of adding %new_records and @records_to_delete in source:/bearmail/lib/BearMail/Backend/Files.pm? I specifically avoided the action/diff model (computing what's needed to go from one conf to the other) for KISS's sake.

>> Also, I think we should take an exclusive lock only when writing, and re-read the file just before writing it, to prevent data loss between the first read and the write.

> Good idea

Should I re-read and re-write the mailmap on each action (add_something or set_something: reread-add-write)?

comment:6 follow-up: Changed 10 years ago by vcaron

I think checking only the file's mtime attribute is sufficient:

  • store the mtime every time you read the mailmap
  • check it every time you write; fail if it changed from what you stored (and bark with a proper error message like: "File was modified by a non-locking-friendly program since last parsing, won't merge")

It's a bit racy, since there's still a small window between checking the mtime and actually writing the file, but it's meant as a safeguard against simple I-step-on-your-foot errors. The file backend is not meant to be perfectly scalable and is still very useful with that engineering glitch...
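A sketch of that safeguard inside the backend (the method names follow the thread, but the parsing and writing bodies are elided; `mailmap_mtime` is an illustrative attribute name):

```perl
package BearMail::Backend::Files;
use strict;
use warnings;

sub _read_mailmap {
    my ($self) = @_;
    # Remember the mtime observed at parse time.
    $self->{mailmap_mtime} = (stat $self->{mailmap})[9];
    # ... parse the file into $self->{records} ...
}

sub _write_mailmap {
    my ($self) = @_;
    my $now = (stat $self->{mailmap})[9];
    die "File was modified by a non-locking-friendly program since last parsing, won't merge\n"
        if $now != $self->{mailmap_mtime};
    # ... write $self->{records} back ...
    $self->{mailmap_mtime} = (stat $self->{mailmap})[9];   # refresh after writing
}

1;
```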

comment:7 in reply to: ↑ 6 Changed 10 years ago by sbocahu

Replying to vcaron:

I think checking only the file's mtime attribute is sufficient:

  • store the mtime every time you read the mailmap
  • check it every time you write; fail if it changed from what you stored (and bark with a proper error message like: "File was modified by a non-locking-friendly program since last parsing, won't merge")

Ok... but:

I'm facing a problem with different behaviour between CGI and mod_perl: it looks like the backend object is kept alive under mod_perl, whereas a new backend object is recreated for each page view in CGI mode. (This is expected, of course; it's why mod_perl is quicker, I guess.)

This way, under mod_perl, _read_mailmap() is called only once, at backend object creation (since it is called from new()). So the mtime variable never changes again, and any second operation ends in "mailmap has been modified blahblah".

_read_mailmap() should be called more often, then. But when?

  • I feel fine about adding this in the setup() method of CGI::App
  • It could be called before each operation (del/add/set), either in the Backend (I don't like this solution) or in the UI controller.
  • Or am I missing something?

It's a bit racy, since there's still a small window between checking the mtime and actually writing the file, but it's meant as a safeguard against simple I-step-on-your-foot errors. The file backend is not meant to be perfectly scalable and is still very useful with that engineering glitch...

comment:8 Changed 10 years ago by zecrazytux

(In [511]) Check mtime before writing the mailmap, see #10

comment:9 follow-up: Changed 10 years ago by vcaron

mod_perl persistence should be put to good use. If we can avoid hitting the disk for the mailmap on every incoming request, that's a bonus point.

I think it's easier to think of the backend object's life cycle as a transaction: creating a backend object takes a snapshot of the backend state (in the Files case, read+parse the mailmap), then lets you update it if nobody changed the state in the meantime.

Therefore we should create a backend object for every incoming request. Let's first fix the bug with this change. You need to change most "my $..." package variables in BearMail::Backend::Files into object attributes. Stash them in a hashref and bless it (and merge %args arg by arg instead of blessing it directly), in the usual Perl tradition.

But... we should be able to hook into mod_perl optionally to cache the parsed mailmap. AFAIK simply keeping a ref in a package-scoped "my $mailmap_cache" should do the trick (I hope there's no reason to deep-copy it). In a non-persistent environment, this ref is simply never reused and costs nothing.
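A sketch of that constructor change (the attribute names and default path are illustrative, and _read_mailmap() is stubbed here; the real method does the parsing):

```perl
package BearMail::Backend::Files;
use strict;
use warnings;

# Package-scoped: persists across requests under mod_perl,
# simply never reused (and harmless) under plain CGI.
my $mailmap_cache;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;

    # Merge %args arg by arg instead of blessing %args directly,
    # so defaults apply and unknown keys are ignored.
    my %defaults = (mailmap => '/etc/bearmail/mailmap');
    for my $key (keys %defaults) {
        $self->{$key} = exists $args{$key} ? $args{$key} : $defaults{$key};
    }

    $self->_read_mailmap;   # snapshot the backend state at construction
    return $self;
}

# Stub: the real method parses $self->{mailmap} into object attributes.
sub _read_mailmap { }

1;
```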

comment:10 Changed 10 years ago by zecrazytux

(In [531]) Turned Backend::Files into an object-oriented module, see #10 and #15. This should fix the consistency problem when used with mod_perl2.

comment:11 in reply to: ↑ 9 Changed 10 years ago by sbocahu

  • Owner changed from sbocahu to vcaron

Therefore we should create a backend object for every incoming request. Let's first fix the bug with this change. You need to change most "my $..." package variables in BearMail::Backend::Files into object attributes. Stash them in a hashref and bless it (and merge %args arg by arg instead of blessing it directly), in the usual Perl tradition.

Thanks for the pointer, it was driving me crazy! (I thought about it and tried deleting the backend object and recreating it at each request, but I was missing the scopes :))

But... we should be able to hook into mod_perl optionally to cache the parsed mailmap. AFAIK simply keeping a ref in a package-scoped "my $mailmap_cache" should do the trick (I hope there's no reason to deep-copy it). In a non-persistent environment, this ref is simply never reused and costs nothing.

I don't get it... Where am I supposed to keep that cache? In which package? Do I need to check it manually at every request, or is it "automatic"?

comment:12 Changed 10 years ago by vcaron

I don't get it... Where am I supposed to keep that cache?

In a file-scoped "my $mailmap_cache" inside BearMail::Backend::Files (it's specific to the Files backend), which will persist across requests within mod_perl. (Note: "my" cannot take a fully-qualified name like $BearMail::Backend::Files::mailmap_cache; a package global would need "our".)

Actually we should persist a copy of %records, as long as the mailmap file it was parsed from has not changed. Pseudo-code would look like:

get_mailmap {
  lock(mailmap, shared)
  mtime = stat(mailmap).mtime
  if (not defined records_cache or mtime != records_cache_mtime) {
    this.records = parse_mailmap()
    records_cache = deep_copy(this.records)
    records_cache_mtime = mtime
  } else {
    this.records = deep_copy(records_cache)
  }
  unlock(mailmap)
}

I realize while writing it that there's at least a deep_copy penalty when storing the cached version, so ideally we should only use this cache algorithm in a mod_perl context. Or just forget about it, because we're not supposed to scale like hell with the Files backend. The mailmap file is obviously in the kernel cache, and parsing it is maybe not costly at all.

We'll see later; as the pseudo-code suggests, it can be implemented by wrapping _read_mailmap().

comment:13 Changed 10 years ago by vcaron

TODO:

  1. my %allowed should stay a package global; it's actually constant data. No need to copy it into every object.
  1. flock(MAILMAP, LOCK_NB); looks wrong in _read_mailmap():
    • it should explicitly ask for a shared lock here (LOCK_SH); LOCK_NB alone is not a lock type, and I'm not sure a shared lock is the default
    • it should either a/ block (remove LOCK_NB), b/ wait a short time (a 5 s timeout, I'd say), or c/ handle failure to acquire the lock (just Carp)
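Option b/ could be sketched like this (a blocking shared lock with a timeout via alarm(); the sub name is made up):

```perl
use strict;
use warnings;
use Carp;
use Fcntl qw(:flock);

# Blocking shared lock, but give up after a timeout (default 5 s),
# and just carp on failure as the TODO suggests.
sub lock_shared_with_timeout {
    my ($fh, $timeout) = @_;
    $timeout ||= 5;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        flock($fh, LOCK_SH) or die "flock: $!\n";   # explicit LOCK_SH, blocking
        alarm 0;
        1;
    };
    alarm 0;                                        # belt and braces
    carp "could not acquire shared lock: $@" unless $ok;
    return $ok ? 1 : 0;
}
```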
Note: See TracTickets for help on using tickets.