Developing Great Software
Friday, July 17, 2009
HTTP Is Stateless

Desktop application programmers, as well as programmers developing in a traditional multi-user, client/server environment, don't normally have to worry about maintaining the state of an application's objects. After all, programming languages like C, C++, Java and C# have built-in mechanisms that maintain a variable's state during the lifetime of the application, ensuring that the value of a variable is maintained and that it is 'reachable' upon referencing that variable.

The stateless nature of HTTP, however, creates artificial workflows for web applications that do not exist in applications developed for the desktop or for a traditional multi-user, client/server environment. Because HTTP doesn't persist an application's state across multiple HTTP requests, application servers and frameworks have to provide some mechanism that allows the programmer to do this. A common method uses hash tables that are kept in memory or written to a database or file system at the end of processing an HTTP request. The ability to persist these values so that they can be read back when the next HTTP request hits the server for the same client session provides the glue, or the narrative thread, that web applications need in order to bind multiple HTTP requests into something that simulates an unbroken and contiguous workflow, even though it is in fact broken at every HTTP request boundary.

Managing this session state is the responsibility of the application programmer. Every application and workflow imposes its own requirements as to what must be persisted and when, and herein lies the problem: getting this right 100% of the time is often the most difficult challenge a developer faces when developing a web application that is maintainable, responsive and scalable. To make matters worse, the difficulty grows proportionately as the application grows in complexity and as more maintenance is done.
With tight schedules and dwindling budgets there is often no time to conduct a thorough regression test to ensure that new bugs aren't being introduced.

My own experiences working on web applications have led me to believe that this isn't the best approach to dealing with the stateless nature of HTTP and that there has to be a better way. I have spent far too many nights and weekends debugging and fixing problems caused by changes to application logic, and by the subsequent mismanagement of the session state needed to support workflows across multiple HTTP requests, to think that this is an optimal solution.

Using The Session Hash Table

Perhaps now is a good time to provide an example of how hash-table-based session state frameworks attempt to overcome the stateless nature of HTTP, and of the types of problems associated with this technique. I will then contrast this with Wicket's approach.

Let's consider what appears on the surface to be a rather simple scenario: maintaining and displaying a counter whose value represents the number of times the web page has been processed on the server and rendered in the browser. When the user requests the web page for the first time, it should be rendered displaying a value of 1 for the counter. When the user clicks the button on the web page, the browser should issue a GET request back to the server. The application running on the server should process the GET request by incrementing the counter and rendering the page back to the browser with the current value of the counter.

The web page itself is simple and looks like the following:

The first time the user requests the page, it will be rendered in the browser with a value of 1 for the counter. Each subsequent click of the button labeled 'Click Here' will result in the browser issuing a GET back to the server and the page being rendered in the browser with an incremented counter.
The image below shows what the page would look like after being rendered for the second time:

Notice how the counter's value in the above image is 2. This is in response to it having been incremented on the server.

So what does it take code-wise on the server to provide this functionality? Using a web framework such as ASP.NET with ASPX Web Forms, the object associated with an ASPX page would be created in response to the HTTP request and the object's load method would be called. In that method the page would retrieve the value from the session state hash table, using a string key to identify it. It would then convert the value to an integer, increment it by 1, convert it back to a string, assign it to the label on the web page, save it to the session hash table and lastly render the page back to the browser.

But wait! What would happen if this is the first time the page is being processed? If you didn't compensate for this with an alternate branch of logic, the named value wouldn't exist in the session hash table and the program would abort because it was attempting to manipulate a null string reference.

Couldn't you instead, you ask, store the value in a field of the ASPX page object and just increment it every time the page processes a request? The answer is no, because ASP.NET, like many other frameworks, does not persist the state of the objects that represent web pages and HTML controls on the server between HTTP requests. Instead, ASP.NET, as do many other frameworks, instantiates the object associated with the web page every time a new request arrives. Instantiating the object causes its fields to be assigned their initial values, either by the object's constructor or according to the default initial values for their types.

While this is a trivial example, it isn't hard to see how this way of persisting state between HTTP requests can quickly snowball out of control.
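The session-based counter described above can be sketched in plain Java. This is a hedged illustration rather than ASP.NET code: an ordinary Map stands in for the framework's session hash table, and the names SessionCounterPage, handleRequest and COUNTER_KEY are hypothetical. It shows both the string-to-integer round trip and the extra first-request branch the text mentions.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java stand-in for a page handler that keeps its counter in a session hash table. */
class SessionCounterPage {

    /** Hypothetical key under which the counter is stored in the session. */
    static final String COUNTER_KEY = "counter";

    /**
     * Handles one request. The handler itself keeps no state between calls,
     * so everything must be read from and written back to the session map.
     * Returns the text that would be assigned to the page's label.
     */
    static String handleRequest(Map<String, Object> session) {
        Object stored = session.get(COUNTER_KEY);
        int counter;
        if (stored == null) {
            // First request: the key is not in the session yet, so start at 1.
            counter = 1;
        } else {
            // Later requests: convert the stored string to an int and increment it.
            counter = Integer.parseInt((String) stored) + 1;
        }
        // Convert back to a string and save it for the next request.
        String labelText = Integer.toString(counter);
        session.put(COUNTER_KEY, labelText);
        return labelText;
    }

    public static void main(String[] args) {
        // One map per client session; here a single simulated session.
        Map<String, Object> session = new HashMap<>();
        for (int request = 1; request <= 3; request++) {
            System.out.println("Request " + request + " renders counter = " + handleRequest(session));
        }
    }
}
```

Removing the null check makes the very first request fail, which is the kind of easily forgotten branch that multiplies as an application accumulates more session values.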
Now imagine that there is a web framework out there that actually does persist the objects associated with web pages and HTML controls across HTTP requests. Suddenly, schemes such as saving application state in hash tables and persisting them across HTTP requests are no longer necessary. Now your favorite object-oriented programming language behaves almost the same way in web applications as it does in desktop or client/server applications. Imagine that the fact that the web application is running in a stateless HTTP environment is hidden from the developer, and that the framework takes care of persisting and restoring the objects associated with your application's web pages and HTML controls for you.

Enter Wicket, Simply Eloquent and Eloquently Simple

Well, you are not left to your imagination, because there actually is a web application framework that does this and a whole lot more for you. It is the Java-based Apache Wicket web framework. Using the example from above, here is the actual code written to the Wicket web framework that renders the web page pictured above:

That's it. Nowhere is there code to save or retrieve string values from a session-based hash table. Nor are there alternate paths of logic needed to differentiate the processing that should happen the first time a web page is requested from all other requests. There is just "normal" Java code: i is initialized in the HomePage constructor and i is incremented in response to the button's onClick method. The current value of i is rendered back to the browser in response to each request. This style will seem very familiar to those who have coded Java SE Swing based applications, or to anyone who has coded desktop applications in other programming languages such as C# or even Basic.

And if you are thinking that there must be some magic happening in the HTML page, well, there isn't, and here it is:

Nothing but pure HTML. Wicket is a Model/View/Controller based framework.
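The idea of a framework that saves a page object at the end of one request and restores it at the start of the next can be sketched in plain Java. This is only an illustration of the general technique and not Wicket's actual implementation: the names PersistentPageDemo, CounterPage and PageStore are hypothetical, and the use of Java serialization into an in-memory map is an assumption made to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Simulates three requests from one client session. */
class PersistentPageDemo {
    public static void main(String[] args) throws Exception {
        PageStore store = new PageStore();
        String sessionId = "session-1"; // hypothetical session identifier

        // First request: the page is created and rendered with its initial value.
        CounterPage page = store.load(sessionId);
        System.out.println(page.render());
        store.save(sessionId, page);

        // Two button clicks: each request restores the page, runs the handler, saves again.
        for (int click = 0; click < 2; click++) {
            page = store.load(sessionId);
            page.onClick();
            System.out.println(page.render());
            store.save(sessionId, page);
        }
    }
}

/** Hypothetical page object whose state lives in an ordinary field. */
class CounterPage implements Serializable {
    private static final long serialVersionUID = 1L;

    private int i = 1; // set once, when the page object is first created

    void onClick() {
        i++; // the event handler is a plain field update
    }

    String render() {
        return "Counter: " + i;
    }
}

/** Hypothetical stand-in for the part of a framework that keeps pages between requests. */
class PageStore {
    private final Map<String, byte[]> pages = new HashMap<>();

    /** End of a request: serialize the page and keep the bytes under the session id. */
    void save(String sessionId, CounterPage page) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(page);
        }
        pages.put(sessionId, bytes.toByteArray());
    }

    /** Start of a request: restore the stored page, or create a new one on the first visit. */
    CounterPage load(String sessionId) throws IOException, ClassNotFoundException {
        byte[] stored = pages.get(sessionId);
        if (stored == null) {
            return new CounterPage();
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stored))) {
            return (CounterPage) in.readObject();
        }
    }
}
```

The page class contains no session lookups and no first-request branch. That bookkeeping has moved into the store, which is the division of labor the article attributes to the framework.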
HTML pages never mix markup with code. They are strictly the 'view' part of the MVC equation. The actual magic happens behind the scenes, within the Wicket framework's request cycle processing, which is responsible for persisting the HomePage object at the end of each request cycle and restoring it at the start of the next. This is completely transparent to the application developer, who can just assume that objects behave like... objects. Wicket hides the stateless nature of HTTP and lets you code web applications the way you would code non-web applications.

Of course, the developer must exercise sound judgment when deciding what and how much to persist in the objects associated with Wicket web pages and HTML controls. Persisting a list of 100,000 database records for thousands of concurrent users, for instance, is wasteful and could cause severe degradation or even cause the server to fail altogether. Persisting a database table row's unique identity in a web page object is preferable to persisting the actual row of data, since the database itself already serves as the underlying cache for the data. Having the identity of the row is enough to allow the data to be retrieved through a query to the database when and if it is needed.

Wicket also provides a model architecture that promotes this type of resource usage. Wicket's LoadableDetachableModel class gives the developer the ability to persist only the minimum amount of information across HTTP requests, such as a data row's identity value. That value can be used to query the database for the actual data when and if it is needed, and the data is released at the end of the request.

Summary

The Apache Wicket web framework eliminates one of the greatest challenges faced by developers of web applications: the dreaded persistence problem caused by the stateless nature of HTTP.
Wicket also provides numerous other advantages to developers of web applications, some of which I have already written about and which you can read here in my blog.

I hope you have enjoyed reading this article as much as I have enjoyed presenting it to you. And as always, please feel free to leave your comments.