Version 42 (modified by Vincent Caron, 11 years ago)

--

Notes on a LAMP5 setup on Debian Etch

  1. MySQL
    1. Installation
    2. Configuration
  2. Apache
    1. fcgid
    2. Global configuration
    3. Virtual host configuration
    4. Limits
  3. PHP
    1. Installation
    2. Configuration
    3. Module or FastCGI ?
  4. Testing

These notes and hints are drawn from Bearstech's experience hosting different applications on various infrastructures, from a simple shared hosting server to a 12-server cluster. We tried to keep the most common and interesting bits, those that should help in every situation.

By LAMP5 we mean the use of PHP 5.x and MySQL 5.x technologies. Most importantly, we only focus on Debian GNU/Linux 4.0 (Etch) packages. Namely:

  • MySQL 5.0.32
  • PHP 5.2.0
  • Apache 2.2.3

See also: http://www.seaoffire.net/fcgi-faq.html

Todo:

  • Mention that php_* Apache directives can't be used with the FastCGI model
  • Add suExec sample
  • Better documentation of fcgid tuning, especially max process control
  • Update wrt Debian Lenny

MySQL

In most cases you'll want MySQL >= 4.1. This version brought significant goodies like full-text indexing, sub-queries and "... ON DUPLICATE KEY UPDATE ...". MySQL 5.0 brought major advances in clustering, and of course lots of improvements and bugfixes.

Installation

# aptitude install mysql-client-5.0 mysql-server-5.0
# mysqladmin password <gory password here>
# cat >~/.my.cnf
[client]
user = root
password = <gory password here>
<Ctrl+D>
# chmod 600 ~/.my.cnf

The Debian MySQL server package sets an empty MySQL root password by default. We set one and store it in root's ~/.my.cnf so that the command-line clients pick it up. The password is then readable from:

  • the ps command, for a very short time (there is no interactive version of mysqladmin password)
  • the shell history (~/.bash_history, unless you run unset HISTFILE before logging out, but don't bother)
  • ~/.my.cnf, obviously

Storing the MySQL root password as clear text on the server itself is not a security weakness, or at least it is far from the most worrying one. If your server is compromised at the root level, an attacker does not even need your ~/.my.cnf to own all your MySQL databases. And it is quite easy to retrieve all database credentials from the web application configuration files (often even from a non-privileged account).
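The installation steps above create ~/.my.cnf first and restrict its permissions afterwards. As a minimal sketch, the file can instead be created private from the start by setting the umask beforehand. write_my_cnf is a hypothetical helper name, not something shipped by Debian or MySQL, and it assumes the target file does not exist yet (the umask only applies to newly created files):

```shell
#!/bin/sh
# Sketch: create a MySQL client option file that is private from the start.
# write_my_cnf is a hypothetical helper, not part of any package.
#   $1: path of the option file to create
#   stdin: the password, on a single line
write_my_cnf() {
    (
        umask 077                 # files created in this subshell get mode 600
        IFS= read -r password     # read from stdin, so it never shows up in ps
        printf '[client]\nuser = root\npassword = %s\n' "$password" > "$1"
    )
}
```

Called as write_my_cnf ~/.my.cnf, it waits for the password on standard input; the password is echoed while you type it unless you disable terminal echo first.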

Configuration

Most default values in Debian's /etc/mysql/my.cnf are very sane. As for sizing the various caches, keep the defaults for a start unless you really know what you are doing. You will have more hindsight about proper tuning once your application has run for a while (tuning-primer.sh is a well-known tool for this).

A few useful hints:

  • max_connections: PHP holds MySQL connections per database and per process. If you have 5 applications, each with its own database, and at most 20 PHP processes, the MySQL server should be able to handle up to 100 concurrent connections (which is the default). If you hit this limit, the application's mysql_connect() calls should raise a proper error: watch your logs.
  • table_cache: have it roughly equal to the total number of tables across all your databases.
  • query_cache_limit, query_cache_size: although this cache is the simplest and most efficient optimisation, refrain from setting big numbers right away. In most cases this cache is filled with lots of small queries, and 16MB may be more than enough. Wait for some production time and check the cache hit statistics with show status like 'qcache%'.
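The max_connections hint boils down to a multiplication. The sketch below simply restates it as a shell function; mysql_max_connections is a hypothetical name, and the worst case it models (every PHP process holding one connection to every database) is the assumption made in the hint above:

```shell
#!/bin/sh
# mysql_max_connections DATABASES PHP_PROCESSES
# Worst case: every PHP process holds one connection to every database.
mysql_max_connections() {
    echo $(( $1 * $2 ))
}

mysql_max_connections 5 20    # the example from the text, prints 100
```

If the result exceeds the max_connections value in /etc/mysql/my.cnf, either raise that value or lower the number of PHP processes.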

You will want to activate the "slow query log" though, especially on production servers. It helps to catch queries that were not properly indexed or that may be the most expensive:

log_slow_queries   = /var/log/mysql/mysql-slow.log
long_query_time    = 2
log-queries-not-using-indexes

And if you don't need MySQL replication, turn off the binary logs (they are on by default). That will spare you quite some disk space and I/O. Simply comment out the binlog-related directives:

#log_bin           = /var/log/mysql/mysql-bin.log
#expire_logs_days  = 5

Don't forget to restart your MySQL server to take those settings into account.

Apache

Here we install Apache in its "worker" (aka threaded) implementation, along with a FastCGI backend module. We don't document the traditional way of using the "prefork" Apache model with the PHP module (see DebianLamp#ModuleorFastCGI).

# aptitude install apache2-mpm-worker libapache2-mod-fcgid
# /etc/init.d/apache2 force-reload

fcgid

The fcgid module is a bit rough around the edges. It has minimal documentation and poorly chosen option names (they sometimes look like core Apache directives), and Debian Etch packages a rather old version (1.10 vs. current 2.2). Fortunately, it really works like a charm and needs almost no configuration.

You should comment out the AddHandler fcgid-script .fcgi directive in /etc/apache2/mods-available/fcgid.conf; the other ones (namely SocketPath and IPCConnectTimeout) are fine. The snippet to make PHP work through FastCGI for a given application looks like:

<Directory /var/www/myapp>
  Options +ExecCGI
  AddHandler fcgid-script .php
  FCGIWrapper /usr/local/bin/php5-fastcgi-wrapper .php
</Directory>

Since fcgid needs the ExecCGI option set but you certainly don't want to handle any actual CGI, you have to make sure no other handler is active in the same context (RemoveHandler may help you). Debian does not activate the common AddHandler cgi-script .cgi by default, so with a pristine configuration you are safe.

The wrapper is only needed to set two environment variables for PHP which cannot be passed by other means (like SetEnv and DefaultInitEnv). These variables must be set:

#!/bin/sh

export PHP_FCGI_CHILDREN=0
export PHP_FCGI_MAX_REQUESTS=10000
exec /usr/bin/php5-cgi
  • PHP_FCGI_CHILDREN=0 makes sure PHP does not handle the process spawning itself, Apache+fcgid being much better at this
  • PHP_FCGI_MAX_REQUESTS=10000 makes sure a PHP process can serve more than the default 500 requests in a row; otherwise we run into a painful situation which can only be properly fixed with fcgid 1.11 (see MaxRequestsPerProcess).

fcgid is very good at maintaining a proper pool of PHP CGI processes. Chances are that the only parameter you will want to tune is MaxProcessCount, to set an upper limit. A good rule of thumb is to make sure that your php.ini's memory_limit multiplied by the maximum number of PHP processes fits in available physical (non-virtual) memory.
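The rule of thumb can be turned around to get an upper bound for MaxProcessCount. This is only a sketch: max_php_processes is a hypothetical helper, and the 1024 MB of RAM set aside for PHP and the 8M memory_limit are made-up figures for illustration:

```shell
#!/bin/sh
# max_php_processes AVAILABLE_MB MEMORY_LIMIT_MB
# Largest number of PHP processes whose combined memory_limit still fits
# in the physical memory you are willing to give to PHP.
max_php_processes() {
    echo $(( $1 / $2 ))
}

max_php_processes 1024 8    # 1024 MB for PHP, memory_limit = 8M: prints 128
```

The result is a ceiling, not a target: a process rarely uses its whole memory_limit, but sizing for the worst case keeps the server out of swap.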

Keep your habit of running /etc/init.d/apache2 reload when you change a PHP setting: fcgid will ask all PHP CGI processes to finish their current request and terminate. Newly spawned processes will then use the new settings. This is why a reload can take some time (30 seconds is not uncommon).

Notes:

  • Another backend exists in the non-free branch (libapache2-mod-fastcgi), but it is quite tedious to configure properly and the author didn't manage to make it work as reliably as fcgid under heavy load.
  • The upcoming Apache 2.3/2.4 is implementing a FastCGI backend as part of its proxy framework, just as Lighttpd is actively doing in its 1.5 branch. This should be great news for more mainstream and better supported FastCGI.

Global configuration

You may want to tune the worker MPM first. As usual, MaxClients is the maximum number of simultaneous HTTP connections you can handle. The worker MPM achieves this by using one thread per connection, a bunch of threads per process and, by default, at most 16 processes. The trick is that MaxClients must be a multiple of ThreadsPerChild, because Apache computes MaxClients / ThreadsPerChild to get the number of processes to start.

Apache has a safeguard limiting this to 16 processes and 64 threads per process, but you can raise those limits with the proper directives (ServerLimit and ThreadLimit). We can handle 1024 simultaneous connections with this sample configuration:

# /etc/apache2/apache2.conf
<IfModule mpm_worker_module>
    MaxClients           1024
    ThreadsPerChild        64
    MaxRequestsPerChild  1000
</IfModule>
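The arithmetic behind this configuration can be checked before touching apache2.conf. The helper below is a sketch with a hypothetical name (worker_processes); it only reproduces the MaxClients / ThreadsPerChild division described above and refuses values that are not exact multiples:

```shell
#!/bin/sh
# worker_processes MAXCLIENTS THREADSPERCHILD
# Prints the number of worker processes Apache would start, or fails if
# MaxClients is not a multiple of ThreadsPerChild.
worker_processes() {
    if [ $(( $1 % $2 )) -ne 0 ]; then
        echo "MaxClients ($1) is not a multiple of ThreadsPerChild ($2)" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

worker_processes 1024 64    # the sample configuration, prints 16
```

With the sample values the result is 16, which stays within the default safeguard of 16 processes and 64 threads per process.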

A few other interesting items:

  • The connection Timeout could be lowered from the default 300 seconds if you have a busy server. Keep in mind that a very simple attack (opening requests and leaving them hanging) can quickly use up your whole connection pool, and most people just won't wait more than 30 seconds for a page to load. A value between 30 and 60 seconds is a reasonable range, but your mileage may vary.
  • Use KeepAlive, but with a short timeout: this is beneficial for many clients (less RTT due to fewer TCP setups/handshakes), and is no worse for the server connection pool availability than turning it off. The server-side TCP establishment overhead is not an important figure nowadays. Ideally, you know how many hits (images, CSS, JS) are needed at most to build your application's heaviest page, and you set MaxKeepAliveRequests to that number. E.g.:
    KeepAlive On
    MaxKeepAliveRequests 50
    KeepAliveTimeout 2
    
  • Maybe have a less verbose Server: header in your HTTP replies. If you have an up-to-date Debian, you could be proud to display all of your nice version numbers with the latest security fix, but most people prefer hiding the details. ServerTokens Minor should suit most people.
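  • If your application is deployed and maintained via a VCS, don't publish the bookkeeping directories (.svn for Subversion, CVS for CVS). Note that <Files> only matches file names, so a directory match is needed here:
    <DirectoryMatch "/(\.svn|CVS)(/|$)">
        Order allow,deny
        Deny from all
    </DirectoryMatch>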
    

Virtual host configuration

Here is a very simple but real-world example, used and tested on heavy-weight sites. It's a name-based virtual host setup in which the application answers to exactly one hostname:

# Setup PHP as FastCGI for this application
#
<Directory /var/www/myapp>
  Options +ExecCGI
  AllowOverride None

  AddHandler fcgid-script .php
  FCGIWrapper /usr/local/bin/php5-fastcgi-wrapper .php
</Directory>

# All of these names reach our application via a redirection
#
<VirtualHost *>
  ServerName  myapp.com
  ServerAlias myapp.org www.myapp.org myapp.net www.myapp.net

  RedirectMatch (.*) http://www.myapp.com$1
</VirtualHost>

# Our application canonical vhost
#
<VirtualHost *>
  ServerName www.myapp.com

  DocumentRoot /var/www/myapp/public

  AddOutputFilterByType DEFLATE text/html text/css application/x-javascript
  Header append Vary User-Agent env=!dont-vary
  # MSIE 6 mishandles compressed CSS and JS: only compress text/html for it
  BrowserMatch "\bMSIE 6" gzip-only-text/html

  ErrorLog  /var/log/apache2/myapp/error.log
  CustomLog /var/log/apache2/myapp/access.log combined
</VirtualHost>

The <Directory> scope was already explained. The <VirtualHost> one uses a couple of very simple ideas:

  • Use output compression. This is very efficient and transparent, and in most cases it greatly reduces the page loading latency as perceived by the client, while also saving some bandwidth (not necessarily the most important effect). We use the standard tricks from the Apache documentation, MSIE 6.x having serious (unfixed) bugs with CSS and JS compression. You will need to run a2enmod deflate; a2enmod headers; /etc/init.d/apache2 force-reload. If your PHP configuration or your application tries to use PHP's own gzip output handler (zlib.output_compression = On or output_handler = ob_gzhandler), turn it off. Apache is much more consistent at this job, and it will handle any request, static or dynamic.
  • Use one log folder per application. Sysadmins and many analyzer/stats tools will prefer it if you have distinct logs. The important trick is to fix the logrotate invocation in /etc/logrotate.d/apache2, e.g.: /var/log/apache2/*.log /var/log/apache2/*/*.log {... You still end up with a "global error file", /var/log/apache2/error.log (defined in /etc/apache2/apache2.conf), which holds all errors and notices which do not belong to a known virtual host context. This is the actual error.log you'll want to watch night and day as a sysadmin, the other (per-vhost) ones being useful for the application maintainers.
  • Use one ServerName in your application virtual host (that would be called your application's "canonical name"). All other names should be redirections. The rationale is mainly about sessions and making sure they all end up in a unique domain, about SEO and good hosting practices. Try to catch all alternate names for your application. If Apache does not find a matching ServerName or ServerAlias entry for a given request, it will fall back to the first defined virtual host. This is why the redirections are set up first in our examples.

Limits

All processes have a default limit of 1024 open file handles (see /etc/security/limits.conf). Although the main Apache process runs as root and could raise this limit by itself, it relies on you to do it. Modify your /etc/default/apache2 file as follows:

NO_START=0
ulimit -n 64000

See your favourite kernel documentation about the per-connection memory cost; most of the time it's negligible, even with a firewall. As a very rough estimate to help you choose your hardware, Etch AMD64's Apache 2.2 with the default module set has been reported to "cost as a whole" 100MB for 1000 concurrent connections (see this bench).

PHP

Installation

You have the choice between the module (libapache2-mod-php5) and the CGI/FastCGI flavour (php5-cgi). Here we focus on the FastCGI setup, see DebianLamp#ModuleorFastCGI for an explanation.

It is a good idea to always install the PHP backend (here -cgi) and the -cli mode together, as most applications require some cronjobs or other background tasks to run independently from the HTTP context. Keep the -cgi and -cli configurations identical by symlinking them. There should be no compelling reason to have different configurations for online and offline operations.

At a minimum we install the php5-mysql extension (php5-mysqli being an even better idea, see below).

# aptitude install php5-cgi php5-cli php5-mysql
# ln -sf ../cgi/php.ini /etc/php5/cli/php.ini

Configuration

Reference configuration: source:/lamp/php-cgi.ini

Have a good configuration file in /etc/php5/cgi/php.ini. Outstanding params:

  • display_errors: this must always be off, even in a test/dev context. It's totally impractical to have your output mangled with random debug statements, and the error.log is there for a purpose. Either you have an application running ini_set('display_errors', 'on') and you can bash the developer, or you have a sane framework which handles frontend debug output properly by itself. With the PHP module you can also use the Apache directive 'php_value error_log /var/www/app/php-error.log' if you like to keep Apache's error.log dedicated to 404 and 500's (php_* directives are not available in the FastCGI model).
  • log_errors: this must always be on. See above.
  • memory_limit: should be low (4M or 8M is nice), and only raised with ini_set('memory_limit', $bytes) in very specific scripts (or not at all). With the PHP module, sysadmins may also raise the limit within any Apache scope (<Directory>, <Location>, .htaccess) using the directive php_value memory_limit 16M. Or better: use the alternative php_admin_value so that the application cannot override your setting (obviously this won't work in .htaccess).
  • register_globals: this must be off. Applications which rely on this setting being on are very old or totally insecure, often both.
  • post_max_size, upload_max_filesize: these are better kept in sync (the effective upload limit is the minimum of these two values); any value will do. It's okay to have 100M here.
  • upload_tmp_dir: should be on the partition where the data will end up; a sane default is often /var/tmp (instead of /tmp)
  • allow_url_fopen: should be off (if turned on, a simple directory traversal attack turns into a remote exploit). If it has to be on, the risk can be mitigated by setting allow_url_include = off; nonetheless cURL is a far better match if the application needs to be an FTP/HTTP client (safer, proper error handling, good documentation).
  • session.gc_maxlifetime: the default session expiration, often left as is by many applications. You'll want to have something like 30 days (2592000 seconds) if you know your application is quite sane with session handling (e.g. file-based session storage with more than 1000 new sessions/day is about the limit). With Debian, you especially need to know what /etc/cron.d/php5 is for (it purges expired session files).
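A quick way to compare a php.ini against the recommendations above is a small audit script. The following is a naive sketch: ini_value and check_ini are hypothetical helpers, they ignore quoting and trailing comments, and they compare the literal text of the value (so "0" would not be recognised as "off"). The PHP_INI variable is an assumption that defaults to the path used in this document:

```shell
#!/bin/sh
# ini_value FILE KEY: print the last value assigned to KEY in FILE.
# Commented-out lines (starting with ';') are ignored.
ini_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
        sed 's/[[:space:]]*$//' | tail -n 1
}

# check_ini FILE KEY EXPECTED: case-insensitive comparison, report on stdout.
check_ini() {
    actual=$(ini_value "$1" "$2" | tr '[:upper:]' '[:lower:]')
    if [ "$actual" = "$3" ]; then
        echo "ok   $2 = $actual"
    else
        echo "WARN $2 = ${actual:-unset} (expected $3)"
    fi
}

# Audit the settings discussed above, if the file is readable.
PHP_INI=${PHP_INI:-/etc/php5/cgi/php.ini}
if [ -r "$PHP_INI" ]; then
    check_ini "$PHP_INI" display_errors off
    check_ini "$PHP_INI" log_errors on
    check_ini "$PHP_INI" register_globals off
    check_ini "$PHP_INI" allow_url_fopen off
fi
```

Since the -cli configuration is symlinked to the -cgi one, a single run covers both.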

Notes on some PHP modules:

  • If your application prefers mysqli over mysql, go for it. Your best hope is that it will use parameter bindings (safer) and prepared statements (faster).
  • If your application only transforms images or modifies them in a trivial way (say adding a watermark), don't use gd; use ImageMagick as an external program (NOT the module). gd is bad at image transformation (poor algorithms, memory hungry, no streaming process), and it is neither safe nor efficient to have the image data in PHP's memory space.

Module or FastCGI ?

PHP is traditionally run as an Apache module. It has some advantages, such as benefiting from the process spawning/handling of Apache (which is efficient, predictable, easy to tune and very resilient). In other words, it has no maintenance overhead.

On the other hand, it may bear some criticisms:

  • Lack of isolation: PHP applications run in Apache's runtime context and they share the same credentials (namely, they all run with the www-data user's privileges). Although there are some solutions with suExec (alone) and suPHP, those are not mainstream and have problems of their own.
  • Shallow Apache embedding: PHP as a module barely exposes any Apache API. Other languages and frameworks provide much richer and more sophisticated usage of Apache resources. PHP and its developers don't use that: not a big deal, but it severely undermines the reasons to live as an Apache module.

The lack of isolation can be detailed with several consequences:

  • Security: we would obviously like to run every application with its own Unix account, maybe in its own chroot. And with the PHP module, several applications may (and will) end in the same process context.
  • Stability: for the same reason as above, applications which rely on some global process behaviour might step on each other (a painful bug about mbstring.func_overload comes to mind)
  • Resilience: if PHP crashes, an Apache process crashes. Fortunately not a real issue since an Apache prefork process handles one request at a time and will be respawned as necessary.
  • Scalability: since PHP is not thread-safe (to be correct: many PHP modules are not), it forces the prefork model on Apache. This does not scale well since it limits the number of simultaneous HTTP connections to MaxClients, which is itself bounded by memory constraints (and a default Apache internal limit of 256).

Now that the C10K problem has been identified and tackled in all modern kernels and HTTP servers, it's time to move forward. For Apache, it means switching to the worker MPM. And throwing PHP out of Apache's nest.

FastCGI comes as a natural answer to this problem: it has been specifically designed to interface HTTP servers and web applications with a very simple protocol which preserves most if not all semantics of the famous CGI 1.1 specs. You get all the benefits:

  • Apache does the HTTP job and the static serving, lives and scales independently of your web applications
  • Every application may live in its own context (own process, own Unix account, own chroot, own server, etc.) and have its own resource set (number of processes, memory, etc.)

Testing

This is a very rough and unfinished chapter, but it might be useful as a quick reminder of basic things to check before going "live".

  • Check the virtual host setup:
    URL                       Result
    http://www.myapp.com/     Should "work", obviously
    http://myapp.com/aaa      Redirects to http://www.myapp.com/aaa
    http://11.22.33.44/bbb    Redirects to http://www.myapp.com/bbb (use your server IP address)
  • Check the output compression:
    $ wget -S --header='Accept-Encoding: gzip' -O - http://www.myapp.com/ >/dev/null
    ...
      Content-Encoding: gzip
    ...
    
  • Check the peak static throughput. With a remote test you should get a number which is bound by your physical network (more exactly, by the slowest link between your server and your test machine); when run locally it is bound by your CPU (I/O is not relevant since the static file is in the memory cache after the first request).
    $ /usr/sbin/ab -c1 -n1000 http://www.myapp.com/myapp.css
    
  • Check the "parallel" static throughput. You should have the same total throughput as with the peak measures above (in other words, "it scales"), maybe slightly lower with a lot of concurrent connections (note: it's hard to run ab below 5000 simulated clients). If you get errors, check that you didn't forget to raise your kernel's default file handles limit.
    $ /usr/sbin/ab -c2000 -n20000 http://www.myapp.com/myapp.css
    
  • Run a real test, with simulated users, sessions, form submissions, constrained arrival rates, etc. See Tsung.
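The output compression check above can be scripted instead of read by eye. This is a sketch only: has_gzip_encoding is a hypothetical helper that inspects HTTP response headers fed on its standard input, in the indented form wget -S prints them:

```shell
#!/bin/sh
# has_gzip_encoding: succeed if the HTTP response headers read from stdin
# announce a gzip-compressed body.
has_gzip_encoding() {
    grep -qi '^[[:space:]]*Content-Encoding:[[:space:]]*gzip'
}

# Usage against a live server (wget -S prints the headers on stderr):
#   wget -S --header='Accept-Encoding: gzip' -O /dev/null \
#       http://www.myapp.com/ 2>&1 | has_gzip_encoding && echo compressed
```

Running the same pipeline against a CSS or JS URL checks that the DEFLATE filter covers the other content types listed in the virtual host.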