Monday, September 27, 2010

Web Server Basics: the LAMP Stack


I am planning to take a look at Go Daddy's hosting service, and I'll be writing about it here on the blog. They are offering features that they didn't have when I looked at them a few years ago. Since this is shared hosting, we won't be setting up a server from scratch, but we will go through the process of setting up a shared hosting account and putting up a web application written in PHP. Part of my reason for doing this is to evaluate Go Daddy's hosting service, and part is to show you how to put up a simple website from start to finish. Before I get into describing Go Daddy hosting, here's some general background on web server basics, including how I host my own sites.

I use SliceHost, a VPS (Virtual Private Server) provider, for hosting my web sites other than this blog, which is on Blogger. I've been with SliceHost for several years and my "slice" (my virtual server) has rebooted maybe once during that time, so they are very reliable, and network I/O always feels speedy, although I haven't actually measured it. I've seriously considered moving over to Linode, as they provide more memory and disk space for the same money and also seem to have a good reputation, but I haven't had time to make that move yet.

Some of the differences between shared hosting and having a virtual server have to do with the ability to schedule cron jobs, secure shell (SSH) access to a Linux shell, more control over security such as firewall settings, and the ability to install any software you want on the server.

On my virtual server, I run Linux (Ubuntu Server), Apache, MySQL, and PHP. This combination is known as the "LAMP stack". I use the Linux iptables firewall to block all ports except 80 (HTTP), 443 (HTTPS), and a port for SSH (Secure Shell). The term "stack" is used for protocols because of the way one protocol uses the services of another, lower-layer protocol. You end up with a "layer cake" of protocols like this:

------------
| TCP  UDP |
|    IP    |
| Ethernet |
------------

and thus it's called a "stack". This term can also be applied to application software, so you have an "application stack" such as the LAMP stack:

-------------
| PHP/MySQL |
|  Apache   |
|   Linux   |
-------------

Linux is, of course, an operating system analogous to Microsoft Windows, Mac OS X, or Sun Solaris. There are many flavors of Linux, all with slightly different feature sets, with Ubuntu being a popular choice. The job of the OS is to provide a layer of software on top of the raw hardware: it runs multiple software processes, manages system resources such as memory, and supplies a set of APIs (Application Programming Interfaces) for things such as reading and writing files on the disk or performing network I/O.
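To make the idea of OS-supplied APIs a little more concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of the LAMP stack described in this post). Python's `os.open`, `os.write`, and `os.read` are thin wrappers around the operating system's own file calls. The function names and the file name `hello.txt` are made up for the example.

```python
import os
import tempfile


def write_then_read(path: str, data: bytes) -> bytes:
    """Write data to path and read it back using low-level OS calls."""
    # os.open returns a raw file descriptor, the handle the OS gives a process.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(data))
    finally:
        os.close(fd)


def demo_file_io() -> bytes:
    """Round-trip a short message through a file in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return write_then_read(os.path.join(tmp, "hello.txt"), b"hello from the OS\n")
```

Higher layers such as Apache and PHP ultimately rely on calls of this kind whenever they touch the disk.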

Linux by itself does not give you the functionality of a web server. It has all the low-level APIs, such as IP networking and file I/O, but the intelligence to handle the HTTP protocol (the protocol of the web) requires a higher layer of software. That's where Apache comes in.

Apache is web server software. It runs as a process (actually multiple processes) under Linux, although you can also run builds of Apache on other OSes, including Windows. Its most basic job is to listen for incoming TCP/IP (*) connection requests arriving from browsers on port 80 (or 443 for HTTPS). When it sees such a request, it accepts the TCP/IP connection and reads the contents, which most often are an HTTP GET request. The HTTP GET message will (usually) identify a file on the disk that the browser is trying to retrieve. Apache then opens the requested file, reads the contents, and sends them over the TCP/IP connection to the browser.
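The accept, read, look up, and send sequence described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Apache is implemented: it handles a single connection, speaks only a bare-bones HTTP/1.0, and the function names, the document root argument, and port 8080 are all invented for the example.

```python
import socket
from pathlib import Path


def parse_request_line(request: bytes) -> tuple[str, str]:
    """Return (method, path) from the first line of an HTTP request."""
    first_line = request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = first_line.split(" ", 2)
    return method, path


def build_response(doc_root: Path, method: str, path: str) -> bytes:
    """Map a GET path to a file under doc_root and wrap it in an HTTP response."""
    if method != "GET":
        return b"HTTP/1.0 405 Method Not Allowed\r\n\r\n"
    root = doc_root.resolve()
    target = (root / path.lstrip("/")).resolve()
    # Refuse paths that escape the document root (e.g. "/../etc/passwd").
    if root not in target.parents or not target.is_file():
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    body = target.read_bytes()
    header = f"HTTP/1.0 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    return header.encode("ascii") + body


def serve_once(doc_root: Path, port: int = 8080) -> None:
    """Accept one TCP connection, answer one request, then exit."""
    with socket.create_server(("127.0.0.1", port)) as listener:
        conn, _addr = listener.accept()
        with conn:
            request = conn.recv(65536)
            method, path = parse_request_line(request)
            conn.sendall(build_response(doc_root, method, path))
```

Calling `serve_once(Path("."))` and then visiting `http://127.0.0.1:8080/` followed by a file name from the current directory should return that file. A real server adds content types, concurrency, logging, and much more.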

It's as simple as that for static HTML files, CSS files, images, etc. But if the request is for a PHP file, things get more interesting. Apache sees that it's a request for a PHP file, so instead of reading the file and sending it to the browser, it passes the request to the PHP interpreter, which usually runs as an Apache module on the server. The PHP interpreter reads the file, parses the PHP sections, and executes them. (Note that a PHP file can have parts that are just HTML and other parts that are PHP code. The interpreter passes the HTML parts through to the browser unchanged, but the PHP parts are executed.)
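The "pass the HTML through, execute the code parts" idea can be imitated with a toy renderer. The sketch below is not PHP and is not how the PHP interpreter works internally; it only borrows the shape of PHP's `<?= ... ?>` echo tag and evaluates Python expressions inside it. The `render` function and the sample page are invented for the example.

```python
import re

# Tag syntax for this sketch only: <?= expression ?>
TAG = re.compile(r"<\?=(.*?)\?>", re.DOTALL)


def render(template: str, variables: dict) -> str:
    """Copy literal text through unchanged; evaluate each tagged expression."""
    output = []
    position = 0
    for match in TAG.finditer(template):
        # Literal HTML before the tag passes straight through.
        output.append(template[position:match.start()])
        # The tagged part is code: evaluate it and emit the result.
        # eval is tolerable only in a toy; never use it on untrusted input.
        output.append(str(eval(match.group(1).strip(), {}, variables)))
        position = match.end()
    # Whatever follows the last tag is literal text as well.
    output.append(template[position:])
    return "".join(output)


page = "<p>Hello, <?= name.upper() ?>! 2 + 3 = <?= 2 + 3 ?></p>"
print(render(page, {"name": "world"}))
# prints: <p>Hello, WORLD! 2 + 3 = 5</p>
```

The browser only ever sees the rendered output; the code between the tags stays on the server.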

If the PHP code needs to access a database, then we start to involve MySQL. MySQL runs in its own Linux process, so inter-process communication (IPC) is used between the Apache PHP module and the MySQL server process. PHP sends a SQL statement to MySQL over IPC. MySQL executes the statement and passes the result back to PHP, again over IPC. PHP then processes those results, perhaps looping through an array of records, and dynamically builds an HTML result to send down to the browser over the TCP connection that Apache established with the browser.
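Here is a rough Python sketch of the "send SQL, loop over the records, build HTML" pattern. One deliberate substitution: it uses SQLite from Python's standard library instead of MySQL, so the database runs inside the same process and no IPC is involved. The `posts` table, its columns, and the function names are hypothetical.

```python
import sqlite3
from html import escape


def render_rows(conn: sqlite3.Connection) -> str:
    """Run a query and turn each returned record into an HTML list item."""
    rows = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    # Escape the text so characters like "<" in the data cannot break the page.
    items = [f"<li>{post_id}: {escape(title)}</li>" for post_id, title in rows]
    return "<ul>" + "".join(items) + "</ul>"


def demo_query() -> str:
    """Create a throwaway in-memory database, fill it, and render it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO posts (title) VALUES (?)",
        [("Web Server Basics",), ("Shared Hosting <notes>",)],
    )
    html = render_rows(conn)
    conn.close()
    return html


print(demo_query())
# prints: <ul><li>1: Web Server Basics</li><li>2: Shared Hosting &lt;notes&gt;</li></ul>
```

With MySQL the overall shape is the same, except that the connection goes to a separate server process.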

So that's the basic operation of the LAMP stack. Of course there are other server-side technologies such as Java/Tomcat, Microsoft IIS/.NET, Ruby on Rails, Python/Django, etc., but at some level they all more or less follow the same basic processing pattern: accepting connections from the browser, passing the request off to some code written in a higher-level language, accessing a database, and sending HTML/CSS/JavaScript down to the browser.

(*) TCP/IP is an overloaded term with multiple meanings depending on the context. In the broadest definition, it refers to the whole suite of protocols and management software that we commonly think of as the "Internet Protocol" or IP. In this sense the terms TCP/IP and IP really mean the same thing. In the narrower definition, TCP/IP refers to TCP itself, a specific connection-oriented protocol that runs over IP.


Browsers and web servers talk to each other using HTTP (Hypertext Transfer Protocol) delivered over TCP/IP connections, sometimes just called TCP connections. TCP is a protocol that provides a reliable, point-to-point byte stream between the browser client and the web server. TCP runs on top of IP, which delivers "packets" of data between the two endpoints. TCP breaks up the stream of bytes being sent into "chunks" that are then sent in individual IP packets.

If an IP packet is lost in the network somewhere, TCP is smart enough to retransmit it. If IP packets arrive at the destination out of order, TCP puts them back in order to reconstruct the original byte stream that was sent.
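The chunking and reordering can be illustrated with a small Python simulation. It is a sketch only: real TCP also involves acknowledgements, timers, windows, and checksums, none of which appear here. The function names and the chunk size of 8 bytes are made up; the one detail borrowed from real TCP is that each chunk is labeled with its byte offset in the stream.

```python
import random


def segment(stream: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a byte stream into (sequence_number, chunk) pairs."""
    return [
        (offset, stream[offset:offset + size])
        for offset in range(0, len(stream), size)
    ]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the stream: drop duplicates, sort by sequence number, join."""
    unique = dict(segments)
    return b"".join(unique[seq] for seq in sorted(unique))


message = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
packets = segment(message, 8)
packets.append(packets[2])  # simulate a retransmitted duplicate arriving too
random.shuffle(packets)     # simulate packets arriving out of order
assert reassemble(packets) == message
```

Because every chunk carries its position in the stream, the receiver can put the pieces back together no matter what order they arrive in.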

TCP/IP stacks are typically written in C (or maybe C++ in some cases). The protocols follow a large set of standards defined by the Internet Engineering Task Force (IETF). The IETF documents the protocols in RFCs (Requests for Comments), although that name isn't really accurate for RFCs that have become established standards. When an RFC is written and is in draft form, it really is a request for comments, but once it's standardized, it doesn't change. However, a new RFC can be written to make it obsolete.

All the RFCs can be viewed online at IETF.org. These standards define what the TCP/IP stacks do, and ensure that a Windows IP stack can talk to a Linux or Android IP stack.

2 comments:

PHP programming said...

For the future planning this blog is very useful and supportive to php programmer. Growth of business is very fast to use this blog ideas in the future. Give more ideas for the future in the next article in this blog.

Montana Flynn said...

For your own good, stay away from GoDaddy! They use external MySQL DB's that will bring your site to a crawl.
