Saturday, October 29, 2005
The myth of the stored procedure
I just came across an interesting thread on the Ruby on Rails blog. I'm not going to touch the arrogance issue because that's too subjective. What I will address are two things:
1 - the need for stored procedure support
2 - the need for xml configuration
As the reader will no doubt have surmised, I don't believe in stored procedures; however, that wasn't always the case. Having been primarily a developer and secondarily a DBA for nearly 12 years now, I've had the opportunity to work on some massive systems and to lead the design and implementation of some rather large ones. Initially I used stored procs heavily, but over time their glaring problems have become rather obvious.
As I see it, there are three primary problems with stored procedures:
They don't scale. I can practically hear the outrage at that statement. What are you talking about? Stored procs don't scale? You'd have to be crazy! Before you go huffing off, let me explain. When I talk about scalability, I am not referring to machine performance per se. I'm talking about my position that on any system of sufficient size, the most important problem and the least scalable resource will be programmer time and programmers' ability to comprehend the complexity of the system. I'm sure there are exceptions to this rule, but I am talking about the vast majority of systems rather than a few special cases. Whether or not you believe that, how exactly do stored procedures fit into this picture? It's a generally recognized principle in software development that code duplication is bad. Cut-and-paste coding causes problems because logic is duplicated all over the place: when you make one change, you then have to find every other place where that logic resides and duplicate your change there. All software developers know this, and there are a million ways to avoid the issue in application code. But when it comes to databases, somehow that rule no longer applies? Stored procedures are effectively cut-and-paste coding of SQL. It's not a problem if you only have one or two databases to deal with, just as cut and paste isn't a huge issue if you only have one or two code files. Multiply that by a few hundred or a few thousand, though, and you have a serious problem. If you have ever had to propagate a changed stored procedure to a couple of hundred databases, you know what I'm talking about. Now, I'm not saying that stored procedures are evil in all circumstances, but as with all things in computer science, their advantages and disadvantages must be understood and weighed. Generally, I would say that building a system on top of stored procedures is a bad idea.
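To make the duplication point concrete, here is a minimal Python sketch, assuming a hypothetical `customers` table and using throwaway SQLite in-memory databases as stand-ins for the hundreds of client databases. The query lives once in application code, so a change to it reaches every database without redeploying anything inside them. The names `ACTIVE_CUSTOMERS_SQL`, `active_customers`, and `make_db` are made up for illustration.

```python
import sqlite3

# Hypothetical business rule, defined once in application code instead of
# being copied into a stored procedure inside every database.
ACTIVE_CUSTOMERS_SQL = "SELECT name FROM customers WHERE active = 1 ORDER BY name"


def active_customers(conn):
    """Run the shared query against any DB-API connection and return names."""
    cur = conn.cursor()
    cur.execute(ACTIVE_CUSTOMERS_SQL)
    return [row[0] for row in cur.fetchall()]


def make_db(rows):
    """Create an in-memory database standing in for one client database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, active INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    databases = [
        make_db([("alice", 1), ("bob", 0)]),
        make_db([("carol", 1), ("dave", 1)]),
    ]
    # Editing ACTIVE_CUSTOMERS_SQL changes the behavior for every database at
    # once; nothing has to be redeployed inside the databases themselves.
    for conn in databases:
        print(active_customers(conn))
```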
Stored procedures are not portable. This is kind of a no-brainer; typically the database vendors don't even try to argue that their stored procedure syntax is portable to other systems. Usually their argument goes something like "and to port this application, all you have to do is rewrite the stored procedures." I suspect that's not even true in general, but let's assume for a minute it is. Apply that logic to the same scenario above: you have to rewrite your stored proc layer and then propagate those changes out to hundreds or thousands of databases? Hard to imagine how people get locked into a database, isn't it? I can tell you from firsthand experience that it's unlikely any significant system will be ported under those circumstances without a very serious reason. So the basic idea is to build your system in a way that lets you easily use another database if necessary (and that includes new versions of the same database, by the way).
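One way to keep a system portable, sketched here in Python under my own assumptions, is to confine vendor-specific syntax to one small table. The example uses the "first n rows" query, which SQL Server writes with `TOP` and PostgreSQL and SQLite write with `LIMIT`. The `DIALECTS` table and `top_n_query` function are hypothetical; a real system would more likely lean on an existing database abstraction layer.

```python
import sqlite3

# Hypothetical dialect table: the only place vendor-specific syntax lives.
# Table, column, and ORDER BY names are assumed to come from trusted code.
DIALECTS = {
    "sqlserver": "SELECT TOP {n} {cols} FROM {table} ORDER BY {order}",
    "postgresql": "SELECT {cols} FROM {table} ORDER BY {order} LIMIT {n}",
    "sqlite": "SELECT {cols} FROM {table} ORDER BY {order} LIMIT {n}",
}


def top_n_query(dialect, table, cols, order, n):
    """Build a 'first n rows' query in the syntax of the given database."""
    if dialect not in DIALECTS:
        raise ValueError(f"unsupported dialect: {dialect}")
    # n is interpolated into the SQL text, so insist on a real integer.
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    return DIALECTS[dialect].format(n=n, cols=", ".join(cols), table=table, order=order)


if __name__ == "__main__":
    for name in DIALECTS:
        print(name, "->", top_n_query(name, "cases", ["id", "title"], "id", 2))

    # The SQLite variant actually runs against an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (id INTEGER, title TEXT)")
    conn.executemany("INSERT INTO cases VALUES (?, ?)", [(3, "c"), (1, "a"), (2, "b")])
    print(conn.execute(top_n_query("sqlite", "cases", ["id", "title"], "id", 2)).fetchall())
```

Swapping databases then means adding one entry to the table rather than rewriting a layer of procedures.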
Finally, stored procedures don't facilitate reuse. This is related to the first issue, but isn't identical. Here I'm not talking about duplication across a large number of databases; I'm talking about leveraging your codebase in different situations, whether within the same database or in a different system. The point is that anything you put in a stored procedure stays in that stored procedure. There really are no good methods for high-level abstraction within stored procedures. Maybe some specific database offers such a method, but if so, it's not portable to other databases. Through careful use of abstraction in application code, on the other hand, you can reuse your logic in other situations.
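As a rough illustration of reuse through abstraction in code, this hypothetical Python helper builds a parameterized `WHERE` clause once and shares it between a counting query and a listing query. The `docs` table and the function names are invented for the example, and it assumes table and column names come from trusted code, never from user input; only the values are passed as parameters.

```python
import sqlite3


def where_clause(filters):
    """Turn {column: value} into a parameterized WHERE clause.

    Column names are assumed to come from trusted code; values are bound
    as parameters, never pasted into the SQL text.
    """
    if not filters:
        return "", []
    cols = sorted(filters)
    clause = " WHERE " + " AND ".join(f"{col} = ?" for col in cols)
    return clause, [filters[col] for col in cols]


def count_rows(conn, table, **filters):
    """Reuse the shared WHERE logic for a count."""
    where, params = where_clause(filters)
    return conn.execute(f"SELECT COUNT(*) FROM {table}{where}", params).fetchone()[0]


def list_rows(conn, table, cols, **filters):
    """Reuse the same WHERE logic for a listing, ordered by the first column."""
    where, params = where_clause(filters)
    sql = f"SELECT {', '.join(cols)} FROM {table}{where} ORDER BY {cols[0]}"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER, owner TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        [(1, "ann", "open"), (2, "ann", "closed"), (3, "ben", "open")],
    )
    print(count_rows(conn, "docs", owner="ann"))
    print(list_rows(conn, "docs", ["id", "owner"], status="open"))
```

The same helper could just as easily feed a report in a different system, which is the kind of reuse that logic locked inside a stored procedure doesn't get.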
I know there are other issues with stored procedures, but for me those are the big ones. Now to XML tags and configuration.
As with stored procedures, XML is one of those things that should be used at the right time under the right circumstances. XML was designed for marking up data; as far as I'm concerned, it was never meant to be a programming language. I have had the misfortune of using XML-heavy frameworks (J2EE, ASP.NET, Zope). XML definitely makes those systems more configurable, but one thing it definitely doesn't do is make development faster. Wading through the morass of XML configuration has never sped me up; it has always dramatically increased my implementation time.
I'm sure there are situations where XML is useful, but as an entrepreneur I am less interested in those and more interested in situations where I can leverage the power of DRY (Don't Repeat Yourself) and sensible defaults to make the most of my time. That philosophy, more than anything else, is what has really sucked me into Rails. The sheer joy of focusing on the problem and not the damn configuration files cannot be overstated. If you really think XML is the solution, please stay with the XML frameworks. That is one direction I sincerely hope Rails never moves in.
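Here is a small Python sketch of the "sensible defaults" idea, loosely in the spirit of how Rails derives a table name from a class name. It is not Rails' actual implementation: the `Model` base class and its naive pluralization rule are assumptions for illustration. The default covers the common case with zero configuration, and an override is a single line rather than an XML file.

```python
import re


class Model:
    """Base class that derives a table name by convention unless told otherwise."""

    table_name = None  # set only when the convention doesn't fit

    @classmethod
    def table(cls):
        if cls.table_name:
            return cls.table_name
        # CamelCase -> snake_case, then a deliberately naive plural: append "s".
        snake = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", cls.__name__).lower()
        return snake + "s"


class LineItem(Model):
    pass  # no configuration at all: table is "line_items"


class Person(Model):
    table_name = "people"  # one-line override where the naive rule fails


if __name__ == "__main__":
    print(LineItem.table(), Person.table())
```

`LineItem` gets `line_items` for free; only `Person`, where the naive rule would produce `persons`, needs an explicit setting.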
Tuesday, October 18, 2005
The joy of Windows :/
So I took today off from work as an extra day to spend some time with my wife and recuperate from my trip to the Startup School. Of course, I'm typing this message from work. So much for that extra day off.
We do E-discovery here at work, and we use a collection of tools that, imo, are worst of breed. We use Windows as a file server to house hundreds of millions of image and text files. If you think Windows doesn't scale well to those numbers, you're absolutely right. It doesn't; we have all kinds of problems dealing with the size and number of files. Then we use SQL Server as our database backend. Certainly that's a step forward from Access (which they used to use), but it has all kinds of scaling problems for what we are doing, not to mention the horribly designed structures we are torturing it with. Add to that a Visual Basic program that automates loading other programs and extracting data, and you have barely a hint of the problems I deal with at work.
The upside is that I spend most of my time working with Python, and I have a really bright team of people working with me. Our team uses Python to process text and images. We aren't responsible for the platform decisions; we just have to work within them.
So one of our systems runs a C# application designed by another team, and the system is just falling apart. SQL Server 32-bit apparently cannot use more than 1.7 GB of RAM without a hotfix, and the hotfix says not to apply it on a server with only 4 GB of RAM. Great. So we bought a quad-Xeon server with 32 GB of RAM from Dell. After some problems we finally got the 64-bit version of Windows installed. Whoo... time to get the 64-bit version of SQL Server installed. HA! That only runs on Itanium servers, which are some other whacked 64-bit architecture. wtf?!
OK, time to install the 32-bit version of SQL Server; so far, a complete waste of about 6 hours. After getting it installed, we applied Service Pack 4 for SQL Server and the AWE hotfix to allow SQL Server to address more than 2 GB of RAM. Reboot the server (because that's always a good choice with Windows). As a test, we configured SQL Server to use a fixed 10 GB of RAM rather than allocating memory dynamically. Now, one might think the SQL Server process would then start ramping up its memory usage and/or spawn a few processes to take up 10 GB, right? Wrong! It sat there at 128 MB. We tried several different options, none of which worked. At this point my boss suggested we look at Perfmon, because he had heard that's the only place you can see how much memory it's using. Guess what: none of the SQL Server performance counters are there. Recheck the install; hmm, everything's OK there. Reboot the server (because that's a good idea under Windows); crap, they still don't show up. Apparently the 32-bit counters don't work under a 64-bit OS. Unfortunately, the Perfmon counters are critical to our performance evaluation; without them we have no idea what SQL Server is doing.
So now we are down to reinstalling a 32-bit version of Windows on the 64-bit quad-processor server we have. Of course, the night is still young.
Hard to imagine why I refuse to use Windows and SQL Server outside of work, eh? If it were my company, open source products would be the only way to go. Or at the very least, if you want to play in the enterprise, you need real enterprise-level products. As for me and my house, we'll use Rails on Linux.
Monday, October 17, 2005
I just got back from Paul Graham's excellent Startup School. What an incredible experience! If you are a would-be entrepreneur, I would highly encourage you to attend next year. The depth and quality of the speakers were as impressive as at any conference I've ever been to. I won't bother to rehash the content; if you are interested, check out this tag on del.icio.us: Startup School. The summaries there are fantastic and represent an incredible amount of work. I can't wait for the video and audio to be posted.
On Sunday following the conference, Paul held an open house at Y Combinator for people who wanted to talk to him in more depth than the five minutes allowed between speakers. Several of the Summer Founders were there too, including both of the guys from Kiko.com and Aaron Swartz from Infogami.
I had a great opportunity to talk with a lot of really bright people about their ideas. I also had the chance to see several people pitch their ideas to Paul Graham and to others at the event. It was very instructive to see how their presentations went, and which ideas people liked and which flopped. Overall, people were very nice and very helpful to each other, even people who could end up as competitors in the marketplace. Here are some things I noticed about presenting your ideas:
A Working Demo: This was stressed over and over at the Startup School: a working demo is an incredibly powerful and effective way to get your idea across. Why is a demo so important? For one thing, a demo separates the doers from the dreamers. A lot of people have good ideas; not many will even go to the trouble of putting together a demo. I know dozens of people who want to start a business, but the number who have done anything in that direction can be counted on one hand. In the end, it all boils down to action. A demo also provides a visceral, concrete representation of your idea. Explaining your idea really is a poor substitute for showing it.
People don't need the background: Your presentation shouldn't turn into a lesson on the specifics of your industry or field. I saw someone talk to Paul for almost an hour, and he spent a great deal of that time trying to explain the background and specifics of his industry. I suspect this means he hadn't done the next point: his idea wasn't really crystallized yet.
Boil your idea down to its core: It's important to really know what you are trying to do, what problem you are trying to solve, and what differentiates you from your competitors. You should be able to summarize this for someone in less than sixty seconds. That doesn't mean there isn't more to your idea, but it gives someone a high-level context to work within and something very specific to focus on. The tighter and more focused you can make your idea, the more likely you are to actually pull it off. It's much easier to build something focused on very specific ideas than something that is still vague and nebulous in your own mind.
Know the competitive landscape: OK, the other person understands your idea and knows generally what you are trying to do. At this point the natural progression seems to be "Have you seen X?", where X is something they see as a competitor. Clearly it's not possible to know of every competitor out there, but you need to do some real research and understand what your competitors do and don't do, and what you like and don't like about their systems. Ultimately, you need to know why your system is better than theirs and why people should use yours instead.
Know what you want to do: This might sound simple, but I'm not talking about having a general idea of how the site will work. I mean, you really need to have a pretty solid idea of how you are going to solve the problems you are planning to solve. How do you do this? Build a demo or a prototype.
I would like to extend a heartfelt thanks to Paul Graham and to Jessica for putting together the Startup School. It was a wonderful opportunity and it was really well done. I would attend again in a heartbeat.
Wednesday, October 12, 2005
File Server update
Remember that whacked-out file server I mentioned in April? It turns out that my random file checker script magically fixes some problem with the server. When the script isn't running, we start getting these weird "file not found" messages. For some reason, every time we fire up the script, the messages miraculously stop. If we stop the script, the errors start happening again anywhere from several hours to several days later, but they always recur. If the script is running, though, they never come back. Very bizarre.
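The post doesn't show the checker script, so the following is only a hypothetical Python sketch of what a random file checker could look like: it samples paths from a list, tries to read one byte from each, and reports the ones that fail. `check_random_files` and `run_forever` are illustrative names, and the sample size and pause are arbitrary assumptions.

```python
import os
import random
import tempfile
import time


def check_random_files(paths, sample_size, rng=random):
    """Read one byte from a random sample of paths; return those that failed."""
    failures = []
    for path in rng.sample(paths, min(sample_size, len(paths))):
        try:
            with open(path, "rb") as fh:
                fh.read(1)  # force the file server to actually serve the file
        except OSError:
            failures.append(path)
    return failures


def run_forever(paths, sample_size=100, pause_seconds=60):
    """Check a fresh random sample every pause_seconds and log any failures."""
    while True:
        for path in check_random_files(paths, sample_size):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "cannot read", path)
        time.sleep(pause_seconds)


if __name__ == "__main__":
    # Self-contained demo: one real file and one path that does not exist.
    with tempfile.TemporaryDirectory() as d:
        good = os.path.join(d, "ok.txt")
        with open(good, "w") as fh:
            fh.write("x")
        missing = os.path.join(d, "missing.txt")
        print(check_random_files([good, missing], sample_size=2))
```

In practice `run_forever` would be fed the full list of paths from wherever the file inventory lives, which is also an assumption here.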
Rooming with strangers
I'm headed to the Startup School this weekend. To save money, I've decided to split the cost of the room with three complete strangers. Financially it's a good deal. It turns out that all three of them are from my state, and they live in my area. I was a bit surprised that anyone else from my state would be attending. Funny how things turn out.
I'm getting ready to release version 1 of a project I've been hacking on for a few weeks now. It's a rule-based random name generator (Rubarang). More details to come. My last site, skillfulstudent, has stalled to some degree: the partner I'm working with moved to another city and we rarely see each other anymore. We talk on the phone fairly often, but it hasn't been enough to keep the project rolling. Hopefully we'll get some of these kinks worked out and get it going again. Either way, I have several other projects in mind, and after I get this first version of Rubarang out, it's on to other things.