wiki:DebianLamp

Notes on a LAMP5 setup on Debian 6.0 (Squeeze)

  1. MySQL
    1. Installation
    2. Configuration
  2. Apache
    1. FastCGI
    2. Global configuration
    3. Virtual host configuration
    4. Limits
  3. PHP
    1. Installation
    2. Configuration
    3. Module or FastCGI?
  4. Testing

This documentation gathers notes and hints from Bearstech's experience hosting different applications on various infrastructures, from a simple shared hosting server to a 10-server cluster. We tried to keep the most common and interesting bits that should help in every situation.

By LAMP5 we mean the use of PHP 5.x and MySQL 5.x technologies. Most importantly, we only focus on Debian GNU/Linux 6.0 (Squeeze) packages. Namely:

  • MySQL 5.1.49
  • PHP 5.3.3
  • Apache 2.2.16

Todo:

  • Document that php_* Apache directives can't be used with the FastCGI model
  • Add suExec sample

MySQL

Installation

# aptitude install mysql-client-5.1 mysql-server-5.1
# mysqladmin password <gory password here>
# cat >~/.my.cnf
[client]
user = root
password = <gory password here>
<Ctrl+D>
# chmod 600 ~/.my.cnf

Depending on your debconf priority, you might be prompted for a MySQL root password during installation; in this case, skip the mysqladmin password command. If you run the latter command, keep in mind that the password could be seen:

  • via the ps command for a very short time (unfortunately there is no interactive version of mysqladmin password)
  • in the shell history (~/.bash_history unless you run unset HISTFILE before logging out, but don't bother)

Storing the MySQL root password as clear text, readable only by root, on the server itself is not a security weakness, or at least far from the most worrying one. If your server is compromised at the root level, then your ~/.my.cnf is not even necessary for an attacker to own all your MySQL databases. And it is quite easy to retrieve all database credentials from the web application configuration files (i.e. from non-privileged accounts).

Based on Bearstech's experience, it is more secure to generate and keep the MySQL root password on the server, rather than storing it in unsafe places (a wiki over plain HTTP, emails, shared spreadsheets, etc.). We simply tell the user: "Please log in with SSH, run 'mysql' and you're in." Most of the time the user is not aware of the actual value of the password, and the security level boils down to that of SSH (which would use keys, of course). Not to mention that most automation jobs (backups, etc.) become trivial, and authorized sysadmins don't lose time hunting for scattered passwords.
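
As a quick sanity check that the credential file is picked up (a minimal sketch, assuming the local socket and the ~/.my.cnf created above):

# mysql -e 'SELECT VERSION()'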

Configuration

Most default values are very sane in Debian's /etc/mysql/my.cnf. As for the sizing of the various caches, keep the defaults for a start unless you really know what you are doing. You will have more insight about the proper tuning once your application has run for a while (a famous tool is tuning-primer.sh).

A few useful hints:

  • max_connections: PHP pools MySQL connections per database and per process. If you have 5 applications with their own database and at most 20 PHP processes, the MySQL server should be able to handle up to 100 concurrent connections (which is the default). If you hit this limit, the application's mysql_connect() calls should raise a proper error; watch your logs.
  • table_cache: have it roughly equal to the total number of tables across all your databases.
  • query_cache_limit, query_cache_size: although this cache is the simplest and most efficient optimisation, refrain from setting big numbers right away. This cache is filled with lots of small requests in most cases, and 16MB might well be enough. Wait for some production time and check the cache hit statistics with show status like 'qcache%' (see the example after this list).
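
Once the server has seen some traffic, the query cache statistics mentioned above can be checked like this (a small sketch, run from the account holding ~/.my.cnf):

# mysql -e "SHOW VARIABLES LIKE 'query_cache%'"
# mysql -e "SHOW STATUS LIKE 'Qcache%'"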

You will want to activate the "slow query log" though, especially on production servers. It helps catch queries that are not properly indexed or that are the most expensive:

log_slow_queries   = /var/log/mysql/mysql-slow.log
long_query_time    = 2
log-queries-not-using-indexes
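
Once the slow log has collected some data, mysqldumpslow (shipped with the mysql-server package) gives a quick summary; a sketch, assuming the log path configured above:

# mysqldumpslow -s t -t 10 /var/log/mysql/mysql-slow.log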

And if you don't need MySQL replication, turn off the binlogs (they are on by default). That will spare you some disk space and I/O operations. Simply comment out the binlog-related directives:

#log_bin           = /var/log/mysql/mysql-bin.log
#expire_logs_days  = 5

Don't forget to restart your MySQL server for those settings to take effect.
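
On Debian Squeeze that is simply:

# /etc/init.d/mysql restart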

Apache

Here we install Apache in its "worker" (aka threaded) flavour, and a FastCGI backend module. We don't document the traditional way of using the "prefork" Apache model with the PHP module (see DebianLamp#ModuleorFastCGI).

# aptitude install apache2-mpm-worker libapache2-mod-fcgid
# a2enmod deflate
# a2enmod rewrite
# /etc/init.d/apache2 force-reload
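
To double-check which MPM actually ended up running (apache2ctl -V prints the compiled-in MPM):

# apache2ctl -V | grep -i 'MPM'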

About the enabled modules:

  • mod_rewrite: most of the time you'll need it, so enable it now.
  • mod_deflate: it globally enables compression and is always a *good thing* (it requires few CPU cycles for great gains: faster downloads, less bandwidth). It will break things with IE6, and that's also a good thing. If your PHP configuration or your application tries to use PHP's own gzip output handler (zlib.output_compression = On or output_handler = ob_gzhandler), turn it off (see the snippet after this list). Apache is much more consistent at this job, and it will handle any request, static or dynamic.
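
If PHP's own compression is currently enabled, the php.ini lines to neutralize would look like this (a sketch of the relevant directives only):

zlib.output_compression = Off
;output_handler =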

FastCGI

mod_fcgid has been working wonderfully as far back as Debian 4.0 and is very well maintained. The alternative mod_fastcgi used to be the preferred choice, although it never worked reliably under heavy load at Bearstech.

Once installed, you should edit /etc/apache2/mods-available/fcgid.conf like this:

<IfModule mod_fcgid.c>
  SocketPath /var/lib/apache2/fcgid/sock
  IPCConnectTimeout 20

  # PHP as FastCGI :
  FcgidInitialEnv PHP_FCGI_CHILDREN 0
  FcgidInitialEnv PHP_FCGI_MAX_REQUESTS 0
  FcgidMaxRequestsPerProcess 1000
  FcgidMaxProcesses 16
  FcgidMaxRequestLen 100000000
  FcgidWrapper /usr/bin/php5-cgi
  AddHandler fcgid-script .php
</IfModule>

Let's explain those settings (the first two lines are Debian defaults and are fine):

  • FcgidInitialEnv: we turn off PHP's own process spawning. Fcgid is better at this: sitting on the Apache request queue, it has more insight and will make better decisions. Combined with FcgidMaxProcesses it lets you set a global load limit, which is harder to obtain if you have several PHP pools. One downside is that you cannot share APC caches between the individual PHP processes; from Bearstech's experience, this has not been a performance problem and is a big stability and simplicity win.
  • FcgidMaxRequestsPerProcess: it is a good idea to recycle PHP processes, since a single leaking PHP module can slowly eat memory. The limit can be high as long as there is one; from 1000 upwards, the overhead of starting one process per 1000 requests is negligible.
  • FcgidMaxProcesses: most PHP apps are CPU-bound, so this setting helps you define the maximum load. A very conservative setting would be FcgidMaxProcesses = number of CPU cores: your server would never slow down (if it is only running PHP). In practice most PHP apps also wait on MySQL to answer, thus N processes won't automatically generate a load of N. A good starting figure is 2 x the number of CPU cores, then tune it according to observed load records (use Munin, Cacti, Zabbix, Collectd, etc.)
  • FcgidMaxRequestLen: with this version of mod_fcgid, the default value of this parameter is way too low and will prevent large uploads (the default was 1GB on Debian Lenny but is 128kB on Squeeze). Setting it to 100MB looks reasonable, YMMV. Remember that uploads will be limited by the smallest value among FcgidMaxRequestLen, PHP's post_max_size and upload_max_filesize.

Keep your habit of running /etc/init.d/apache2 reload when you change a PHP setting in php.ini: fcgid will ask all PHP CGI processes to finish their current request and terminate (aka graceful reload). The newly spawned ones will then use the new settings. This is why a reload can take some time (30 seconds is not uncommon).
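
If you want to watch the recycling happen, the running backends can be listed with ps (a sketch, assuming the php5-cgi wrapper configured above; process ages reset after a graceful reload):

# ps -C php5-cgi -o pid,etime,rss,args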

Global configuration

You may want to tune the worker MPM first. As usual, MaxClients is the maximum number of simultaneous HTTP connections you may handle. The worker MPM achieves this by using one thread per connection, a bunch of threads per process, and at most 16 processes by default. The trick is that MaxClients must be a multiple of ThreadsPerChild, because Apache computes MaxClients / ThreadsPerChild to get the number of processes to start.

Apache has a safeguard limiting you to 16 processes and 64 threads per process, but you can raise those limits with the proper directives. We can handle 1024 simultaneous connections with this sample configuration:

# /etc/apache2/apache2.conf
<IfModule mpm_worker_module>
    MaxClients           1024
    ThreadsPerChild        64
</IfModule>

Or if you want something more beefy:

# /etc/apache2/apache2.conf
ServerLimit 128
ThreadLimit 64
<IfModule mpm_worker_module>
    StartServers         8
    MaxClients        8192
    MinSpareThreads     64
    MaxSpareThreads    512
    ThreadsPerChild     64
</IfModule>
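
After a restart you can roughly sanity-check the size of the thread pool (a sketch: this counts every apache2 thread, including the parent process and per-child listener threads, so expect a figure slightly above the worker count):

# ps -C apache2 -L --no-headers | wc -l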

A few other interesting items:

  • The default connection Timeout should be lower than the default 300 sec if you have a busy server. This timer only runs while connections are idle, thus connections can last much longer than this timeout. A safe and realistic value might sit between 10 and 30 seconds. Note that Slowloris-like attacks are mitigated by the reqtimeout module, which is activated and sanely configured by default on Debian.
  • Use KeepAlive, but with a short timeout: this is beneficial for many clients (fewer RTTs due to fewer TCP handshakes), and is no worse for the server's connection pool availability than turning it off. The server-side TCP establishment overhead is not an important figure nowadays. Ideally, you know how many hits (images, CSS, JS) are needed at most to build your application's heaviest page and set MaxKeepAliveRequests accordingly. E.g.:
    KeepAlive On
    MaxKeepAliveRequests 50
    KeepAliveTimeout 2
    
  • Maybe have a less verbose Server: header in your HTTP replies. If you have an up-to-date Debian, you could proudly display all of your nice version numbers with their latest security fixes, but most people prefer hiding the details. ServerTokens Minor should suit most people (see the snippet after this list).
  • If your application is deployed and maintained via a VCS, don't publish the bookkeeping files:
    <Files ~ "^\.(bzr|cvs|git|hg|svn)">
        Order allow,deny
        Deny from all
    </Files>
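
For the Server header item above, Debian ships a dedicated configuration file where this is usually set; a sketch, assuming the stock /etc/apache2/conf.d/security file of Squeeze:

# /etc/apache2/conf.d/security
ServerTokens Minor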
    

Virtual host configuration

Here is the traditional name-based virtual host setup which handles exactly one hostname (the 'canonical' one) and redirects all other names to it, as every good website should:

# All of these names reach our application via a redirection.
# This is defined first since this is the default vhost for Apache, the
# one which will catch names that we didn't list in ServerName/ServerAlias.
# Which means that if you only have those two vhosts in your server, you can
# skip the "ServerAlias [list...]" and simply make sure that all ancillary
# names for your app have their DNS set to this server.
#
<VirtualHost *>
  ServerName  myapp.com
  ServerAlias myapp.org www.myapp.org myapp.net www.myapp.net

  Redirect / http://www.myapp.com/
</VirtualHost>

# Our application canonical vhost
#
<VirtualHost *>
  ServerName www.myapp.com

  DocumentRoot /var/www/myapp/public
  <Directory   /var/www/myapp/public>
    Options +ExecCGI
    AllowOverride FileInfo AuthConfig
  </Directory>

  # Protect some private area via auth and/or IP whitelisting
  <Location /private>
    AuthName     "Private Area"
    AuthType     Basic

    # Plain user/password
    AuthUserFile /var/www/myapp/htpasswd
    require valid-user

    # Address-based
    Order Deny,Allow
    Deny from all
    Allow from 96.12.0.0/16
    Allow from 124.42.56.78

    # The following means that "valid-user" OR "allowed <ip>" is okay
    Satisfy any
  </Location>

  ErrorLog  /var/log/apache2/myapp/error.log
  CustomLog /var/log/apache2/myapp/access.log combined
</VirtualHost>

Since fcgid needs the ExecCGI option set but you certainly don't want to handle any actual CGI, you will have to make sure there is no other handler active in the same context (RemoveHandler may help you; see the sketch below). Debian does not activate the common AddHandler cgi-script .cgi by default, so with a pristine configuration you are safe.
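
If some other piece of configuration did add such a handler, a minimal counter-measure in the vhost context could look like this (a sketch, the extension list is hypothetical):

<Directory /var/www/myapp/public>
  RemoveHandler .cgi .pl
</Directory>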

The <VirtualHost> section uses a couple of very simple ideas:

  • Use one log folder per application. Sysadmins and many analyzer/stats tools will prefer distinct logs. The important trick is to fix the logrotate invocation in /etc/logrotate.d/apache2, eg.: /var/log/apache2/*.log /var/log/apache2/*/*.log {... (see the sketch after this list). You still end up with a "global error file" /var/log/apache2/error.log (defined in /etc/apache2/apache2.conf) which holds all errors and notices that do not belong to a known virtual host context. This is the actual error.log you'll want to watch night and day as a sysadmin, the other (per-vhost) ones being useful to the application maintainers.
  • Use a single ServerName in your application virtual host (that would be called your application's "canonical name"). All other names should be redirections. The rationale is mainly about sessions, making sure they all end up on a single domain, about SEO and about good hosting practices. Try to catch all alternate names for your application. If Apache does not find a matching ServerName or ServerAlias entry for a given request, it falls back to the first defined virtual host. This is why the redirections are set up first in our example.
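
The per-application log setup mentioned above boils down to creating the folder and widening the glob on the first line of Debian's logrotate stanza (a sketch; keep the rest of the stanza as shipped):

# mkdir /var/log/apache2/myapp

# /etc/logrotate.d/apache2
/var/log/apache2/*.log /var/log/apache2/*/*.log {
        # ... keep the directives shipped by Debian unchanged ...
}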

Limits

All processes have a default limit of 1024 file handles (see /etc/security/limits.conf). Apache will need at least two file descriptors per connection (one for the client socket, one for the served file or the fcgid socket), plus a bunch for log files, etc. Although the Apache parent process runs as root and could raise this barrier itself, it relies on you doing it. Append to your /etc/default/apache2 file:

# Raise default 1024 file descriptor limit per process for Apache
ulimit -n 64000
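
After restarting Apache, a quick check that the new limit is effective (assumes the /proc/<pid>/limits interface, available on Squeeze's 2.6.32 kernel):

# grep 'open files' /proc/$(pidof apache2 | awk '{print $1}')/limits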

See your favourite kernel documentation about the per-connection memory cost; most of the time it's negligible, even with a firewall. As a very rough estimation to help you choose your hardware, Apache 2.2 under 64-bit GNU/Linux with the default module set has been reported to "cost as a whole" 100MB for 1000 concurrent connections (see this bench).

PHP

Installation

You have the choice between the module (libapache2-mod-php5) and the CGI/FastCGI flavour (php5-cgi). Here we focus on the FastCGI setup, see DebianLamp#ModuleorFastCGI for an explanation.

It is a good idea to always install the PHP backend (here -cgi) and the -cli flavour together, as most applications require some cronjobs or other background tasks to run independently from the HTTP context. Keep the -cgi and -cli configurations in sync, they should only differ on a few parameters, for instance memory_limit and max_execution_time: in CLI mode you will run heavy, isolated batch jobs which might require more memory than the average HTTP request handling.

Module loading and configuration are shared by the CGI and CLI SAPIs via the /etc/php5/conf.d/<module>.ini files; embrace this.

At a minimum we install Suhosin and the php5-mysql extension (using the mysqli API it provides is an even better idea, see below).

# aptitude install php5-cgi php5-cli php5-suhosin php5-mysql
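
A quick check that both SAPIs are installed and which configuration file each one reads (php5 -i prints the same information as phpinfo()):

# php5-cgi -v
# php5 -i | grep 'Loaded Configuration File'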

Configuration

Reference configuration: source:/lamp/php-cgi.ini

Have a good configuration file in /etc/php5/cgi/php.ini. The outstanding parameters (a condensed excerpt follows the list):

  • display_errors: this must always be off. Even in a test/dev context, it should always be off. It's totally impractical to have your output munged with random debug statements, and the error.log is there for a reason. Either the application calls ini_set('display_errors', 'on') and you can bash the developer, or you have a sane framework which properly handles frontend debug output by itself.
  • log_errors: this must always be on. See above.
  • memory_limit: should be low (16..32M is nice), and only raised with ini_set('memory_limit', $bytes) in very specific scripts (or not at all). Suhosin will turn this limit into a hard limit from the programmer's point of view, which is a good thing. The maximum memory usage per process must be guaranteed in order for the sysadmin to properly configure the server (see below), which is impossible if the programmer may raise the bar at any time to any value.
  • register_globals: this must be off. Applications which rely on this setting being on are very old or totally insecure, most of them both.
  • post_max_size, upload_max_filesize: are better kept in sync (effective upload limit is the minimum of these two values), any value will do. It's okay to have 100M here.
  • upload_tmp_dir: should be on the partition where the data will end up; a sane default is often /var/tmp (instead of /tmp)
  • allow_url_fopen: should be off (if turned on, a simple directory traversal attack turns into a remote exploit). If you must enable it, at least keep allow_url_include = off; nonetheless cURL is a far better match if the application needs to be an FTP/HTTP client (safer, proper error handling, good documentation).
  • default_socket_timeout: keep it low, something around 10 seconds instead of the default 60. Most of the time you don't want a PHP request to wait more than a few seconds on an API call or an RSS feed fetch (it's server-to-server communication, timeouts should be an order of magnitude lower than server-to-client ones).
  • session.gc_maxlifetime: the default session expiration, often left as is by many applications. You'll want something like 30 days (2592000 seconds) if you know your application is quite sane with session handling (e.g. file-based session storage with 1000+ new sessions/day is quite a limit). With Debian, you especially need to know what /etc/cron.d/php5 is for.
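
A condensed excerpt reflecting the advice above might look like this (a sketch: the values are the ones discussed and should be adapted to your context, the full reference file is linked above):

display_errors         = Off
log_errors             = On
memory_limit           = 32M
register_globals       = Off
post_max_size          = 100M
upload_max_filesize    = 100M
upload_tmp_dir         = /var/tmp
allow_url_fopen        = Off
default_socket_timeout = 10
session.gc_maxlifetime = 2592000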

Notes on some PHP modules:

  • If your application prefers mysqli over mysql, go for it. Your best hope is that it will use parameter bindings (safer) and prepared statements (faster).
  • If your application only transforms images or modifies them in a trivial way (say adding a watermark), don't use gd, use ImageMagick as an external program (NOT the module); see the sketch after this list. gd is bad at image transformation (poor algorithms, memory hungry, no streaming), and it is neither safe nor efficient to hold the image data in PHP's memory space. On the other hand you might end up with additional (unbounded) resource usage by ImageMagick's own processes.
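
For the external ImageMagick approach, the kind of command the application would shell out to looks like this (a sketch with hypothetical file names, assuming the imagemagick package is installed):

$ convert upload.jpg -resize 800x600 -quality 85 thumbnail.jpg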

Module or FastCGI?

PHP is traditionally run as an Apache module. This has some advantages, such as benefiting from the process spawning/handling of Apache (which is efficient, predictable, easy to tune and very resilient). In other words, it has no maintenance overhead.

On the other hand, it may bear some criticisms:

  • Lack of isolation: PHP applications run in Apache's runtime context and they share the same credentials (namely all run with www-data user privilege). Although there are some solutions with suExec (alone) and suPHP, those are not mainstream and have problems of their own.
  • Shallow Apache embedding: PHP as a module barely uses the Apache API. Other languages and frameworks make much richer and more sophisticated use of Apache resources. PHP and its developers don't: not a big deal, but it severely undermines the reason to live as an Apache module.

The lack of isolation can be detailed with several consequences:

  • Security: we would obviously like to run every application with its own Unix account, maybe in its own chroot. With the PHP module, several applications may (and will) end up in the same process context.
  • Stability: for the same reason quoted above, applications which rely on some global process behaviour might step on each other's toes (a painful bug about mbstring.func_overload comes to mind)
  • Resilience: if PHP crashes, an Apache process crashes. Fortunately not a real issue since an Apache prefork process handles one request at a time and will be respawned as necessary.
  • Scalability: since PHP is not thread-safe (to be correct: many PHP modules are not), it forces the prefork model on Apache. This does not scale well since it limits the number of simultaneous HTTP connections to MaxClients, which is itself bounded by PHP memory constraints (and a default Apache internal limit of 256).

Now that the C10K problem has been identified and tackled in all modern kernels and HTTP servers, it's time to move forward. For Apache, that means switching to the worker MPM. And throwing PHP out of Apache's nest.

FastCGI comes as a natural answer to this problem: it has been specifically designed to interface HTTP servers with web applications through a very simple protocol which preserves most if not all semantics of the famous CGI 1.1 specs. You get all the benefits:

  • Apache does the HTTP job and the static serving, lives and scales independently of your web applications
  • Every application may live in its own context (own process, own Unix account, own chroot, own server, etc.) and have its own resource set (number of processes, memory, etc.)

Testing

This is a very rough and unfinished chapter, but it might be useful as a quick reminder of basic things to check before going "live".

  • Check the virtual host setup:
    URL                        Result
    http://www.myapp.com/      Should "work", obviously
    http://myapp.com/aaa       Redirects to http://www.myapp.com/aaa
    http://11.22.33.44/bbb     Redirects to http://www.myapp.com/bbb (use your server IP address)
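    A quick header-only way to verify those redirects, if curl is installed (hostnames as in the table above):
    $ curl -sI http://myapp.com/aaa | grep -i '^Location'
    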
  • Check the output compression:
    $ wget -S --header='Accept-Encoding: gzip' -O - http://www.myapp.com/ >/dev/null
    ...
      Content-Encoding: gzip
    ...
    
  • Check the peak static throughput. You should get a number which is bound by your physical network with a remote test (more exactly the slowest link between your server and your test machine), and bound by your CPU when run locally (I/O is not relevant since the static file is in the memory cache after the first request).
    $ /usr/sbin/ab -c1 -n100 http://www.myapp.com/
    
  • Check the "parallel" static throughput. You should get the same total throughput as with the peak measure above (in other words, "it scales"), maybe slightly lower with a lot of concurrent connections. Try 2, 5, 10 and more concurrent connections and check that 1/ it scales up to the number of CPUs, 2/ it does not slow down above the CPU limit. E.g.:
    $ /usr/sbin/ab -c100 -n1000 http://www.myapp.com/
    
  • Run a real test, with simulated users, sessions, form submissions, constrained arrival rates, etc. See Tsung.