Saturday, November 29, 2008

Website SSL Certificates

Understanding Web Site Certificates [1] has a nice, succinct description of website certificates. In summary, a website certificate is used to identify a secure web site: one that is trusted (e.g. not a phishing site) and whose data is transmitted to and from your browser securely (e.g. encrypted using SSL). Trusting a certificate means you are trusting that one authority, from a list of certificate authorities known to your browser, has verified that the web site you are visiting is legitimate and secure. Although rare, this process has been known to fail. Brian Krebs, in The New Face of Phishing [2], described a sophisticated phishing scam that used a valid SSL certificate issued by a "trusted" authority. If you visit a website whose certificate is signed by an organization your browser does not trust, or whose certificate contains an error (e.g. it has expired), the browser displays a dialog prompting you to decide whether to accept the certificate [1]. Before accepting a certificate, ensure it
  • has a valid and trusted issuer, such as Verisign,
  • has not expired, and
  • has been assigned to the web site organization you are visiting.
If this dialog is not displayed, say because your browser accepts the certificate, you can still examine the certificate manually if you wish. Normally this can be done by clicking on a visual indicator in your browser while you are on the protected site.

Don't assume that because a website is protected by a certificate the site must be legitimate. Some phishing sites have used self-signed certificates to create the illusion of legitimacy [3]. It is the site author's hope that the unwary visitor will be tricked into believing that, because they have been presented with a certificate, the site is secure and they can safely submit their personal information. A web site that issues a certificate to itself should always be viewed with some suspicion [4].

Do you need SSL certificates for intranet (internal only) websites? If you are transmitting sensitive information between browsers and servers that some employees should not see (e.g. passwords), then yes. This assumes you believe some of your employees are malicious enough to start snooping for such confidential information. What about phishing? This may be less of an issue because the phisher would need to know the look and feel of your internal website in order to mimic it convincingly. But if such information can be obtained, then SSL certificates would be useful.
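If you want to inspect a site's certificate programmatically rather than through the browser, the standard Java networking API can do it. Below is a minimal sketch of my own (the URL is just a placeholder): it opens an HTTPS connection, prints the subject, issuer, and expiry date of each certificate the server presents, and fails with an exception if the chain is untrusted or a certificate is expired.

import java.net.URL;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;

public class CertificateCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; substitute the HTTPS site you want to inspect.
        URL url = new URL("https://www.example.com/");
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.connect(); // performs the SSL handshake; fails if the issuer is untrusted
        for (Certificate cert : conn.getServerCertificates()) {
            if (cert instanceof X509Certificate) {
                X509Certificate x509 = (X509Certificate) cert;
                System.out.println("Subject: " + x509.getSubjectX500Principal());
                System.out.println("Issuer:  " + x509.getIssuerX500Principal());
                System.out.println("Expires: " + x509.getNotAfter());
                x509.checkValidity(); // throws if expired or not yet valid
            }
        }
        conn.disconnect();
    }
}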
References

[1] Mindi McDowell and Matt Lytle, Understanding Web Site Certificates, National Cyber Alert System, Cyber Security Tip ST05-010, Carnegie Mellon University, 2008
[2] Brian Krebs, The New Face of Phishing, The Washington Post, 13 Feb 2006
[3] Bill Brenner, Phishers' latest hook: SSL certificates, The New Sendmail, 27 Sep 2005
[4] Jack Schofield, Website certificates -- don't go there?, 2007

Thursday, November 27, 2008

Robert Glass' Fundamental Facts: A Reminder

IEEE Computer Society has made public an article entitled Frequently Forgotten Fundamental Facts about Software Engineering [1] by Robert Glass, author of the excellent software engineering book Facts and Fallacies of Software Engineering. Although I believe all software developers could benefit from reading this article, it seems especially relevant to team leaders and managers. I'm sure most seasoned professionals are aware of these facts already, but as the title suggests, it's good to be reminded every now and again. A few juicy tidbits inspired me to rant a little.

For those who want to build a great development team, remember that "good programmers are up to 30 times better than mediocre programmers"; moreover, good programmers are far more important to building great software than tools and techniques [1]. I have witnessed people in upper management who believe that all that is needed is a bunch of code monkeys, working the longest hours possible for the lowest pay possible, coupled with the latest fad in development tools. If you throw enough programmers at the problem you will probably get the job done, but I'm willing to bet the product will be over budget, hard or impossible to maintain, and buggy, and it will certainly not satisfy the customer. In other words, it will not meet the definition of quality software outlined by Glass: portability, reliability, efficiency, human engineering, understandability, and modifiability [1]. In the end, the product will cost your company more than if you had started with a great team up front. Hire the best and brightest, not the cheapest.

Regarding estimation, Glass seems a bit pessimistic, but he is so funny because of his brutal truth. In a nutshell, estimates are "done at the wrong time ... (at the beginning of the life cycle ... before the requirements) ..." and "by the wrong people ... (upper management and marketing)", thus "software projects do not meet cost or schedule targets. But everyone is concerned anyway" [1]. Priceless! I generally suggest to clients and management that estimates should be made after the requirements are done, when the problem is better understood. But as Glass points out, and my experience corroborates, this rarely occurs. Generally a client wants product X by date Y within budget Z. Madness.

References

[1] Robert L. Glass, Frequently Forgotten Fundamental Facts about Software Engineering, IEEE Computer Society, 2008
[2] Robert L. Glass, Frequently Forgotten Fundamental Facts about Software Engineering, IEEE Software, vol. 18, no. 3, 2001, pp. 112, 110–111 (the original publication)

Tuesday, November 25, 2008

Top 10 Web Application Security Vulnerabilities

If you are developing web applications and you don't know what the following 10 security threats are, or how to prevent them, the OWASP Top 10 is good reading material. (A small sketch of defending against one of them follows the list.)
  • Cross Site Scripting (XSS)
  • Injection Flaws
  • Malicious File Execution
  • Insecure Direct Object Reference
  • Cross Site Request Forgery
  • Information Leakage and Improper Error Handling
  • Broken Authentication and Session Management
  • Insecure Cryptographic Storage
  • Insecure Communications
  • Failure to Restrict URL Access
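As a small illustration for the injection flaws item, the usual defence is to keep user input out of the query text entirely. The sketch below is my own example (the table and column names are made up); it uses a JDBC parameterized query, so the driver treats the user-supplied value strictly as data:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserLookup {
    // The '?' placeholder keeps the value out of the SQL text, so input such as
    // "' OR '1'='1" cannot change the meaning of the query.
    public static ResultSet findByName(Connection conn, String name) throws SQLException {
        PreparedStatement ps =
            conn.prepareStatement("SELECT id, email FROM users WHERE name = ?");
        ps.setString(1, name);
        return ps.executeQuery();
    }
}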

Sunday, November 23, 2008

Steve Yegge's Property List Pattern Summary

Steve Yegge has written an article entitled The Universal Design Pattern in which he describes in detail the Property List Pattern [1]. The key design elements of this pattern (as he describes them) are listed below; a minimal code sketch follows the list.
  • It has the basic methods of a Map (using Java's terminology): get, put, has, and remove.
  • Keys are generally strings.
  • It has a pointer to a parent property list so properties can be inherited and overridden. In particular, certain operations, such as get, are applied on a child, but if the property is not found there, it is applied to the parent.
  • Reading a property returns the first value encountered for that property, from the child if it exists, otherwise from its ancestor.
  • Writing (and deleting) a property on the child changes the property list for that child only. When deleting a property that is inherited, the child must record the deletion with a flag rather than actually removing the property from the parent; otherwise the parent and all of the child's siblings would lose the property too.
  • Properties can have meta-properties. Common ones include information governing types and access control (such as "read-only").
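As a sketch of these elements (my own minimal illustration, not Yegge's code; the class and member names are made up), a property list in Java might look like this:

import java.util.HashMap;
import java.util.Map;

public class PropertyList {
    // Tombstone marking an inherited property as deleted in this child only.
    private static final Object DELETED = new Object();

    private final Map<String, Object> props = new HashMap<String, Object>();
    private final PropertyList parent;

    public PropertyList(PropertyList parent) {
        this.parent = parent;
    }

    public Object get(String key) {
        Object value = props.get(key);
        if (value == DELETED) {
            return null;                 // masked: treated as absent in this child
        }
        if (value != null) {
            return value;                // found locally
        }
        return parent != null ? parent.get(key) : null; // inherit from ancestors
    }

    public boolean has(String key) {
        return get(key) != null;
    }

    public void put(String key, Object value) {
        props.put(key, value);           // writes affect only this property list
    }

    public void remove(String key) {
        props.put(key, DELETED);         // mask it; never touch the parent or siblings
    }
}

For example, if a parent list has put("color", "red"), a child returns "red" from get("color") until it either overrides the value with its own put or masks it with remove; neither operation affects the parent or the child's siblings.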
Why would you want to use this pattern? Yegge mentions several reasons, including (1) it is very scalable, so it can be applied to single classes or used as part of a larger framework, and (2) it enables extensible systems. Yegge also describes several issues with this pattern, two of which struck home with me. The first is that its performance may be unacceptable for some applications, although Yegge describes many optimizations that can be made. The second is that it is subject to data corruption, for example, by misspelling a key and then adding data under it.

References

[1] Steve Yegge, The Universal Design Pattern, 2008

Friday, November 21, 2008

Recompiling JSPs During Development in WebSphere/Eclipse

Here's the scenario. You change a public static final variable, say a version number, which you use in your JSPs. You clean the project and rebuild it, expecting the new version number to show up in your rendered JSP page, but it still shows the old version number, completely ignoring the change you just made. Or, when you are using JSP includes, the included file changes, but the file including it is not recompiled and never picks up the change. What!?

OK, so maybe there is some special command in WebSphere (WS) I need to use --- none that I can find. So what do you do? A Google search revealed one method (suggested on the JavaRanch forum [1]): delete the compiled JSPs from the server's cache. On WS 5.0, this cache is located in the directory Workspace\.metadata\.plugins\com.ibm.wtp.server.core\tmp0\cache\localhost\server1\EAR\war. However, I'm using WS 7.0 and the cache is no longer located in this directory, at least I couldn't find it.

The other method is to manually save ('touch') each JSP file so its modification time changes; if the files are modified, they will be recompiled. Of course, you can open each JSP file, make a change, and save it. Instead, I added a "touchjsp" target to my Ant build script to automate this for me. The target is something like the following:
<target name="touchjsp">
 <touch>
   <fileset dir="WebContent" includes="**/*.jsp" />
 </touch>
</target>
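An equivalent stand-alone approach (a sketch of my own, not tied to the build script; adjust the directory name to your project) is to bump each JSP's timestamp with a few lines of Java:

import java.io.File;

public class TouchJsps {
    public static void main(String[] args) {
        touch(new File("WebContent")); // adjust to your web content directory
    }

    private static void touch(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return;
        }
        for (File f : files) {
            if (f.isDirectory()) {
                touch(f);                                      // recurse into subdirectories
            } else if (f.getName().endsWith(".jsp")) {
                f.setLastModified(System.currentTimeMillis()); // 'touch' the JSP
            }
        }
    }
}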
The Ant target can be called as part of the build process, or run directly to simply touch the JSP files. Depending on how the target is executed, you may or may not have to refresh the project in WS for it to detect and compile the files. Having to do this manually in an IDE as mature as Eclipse/WS is ridiculous. At minimum, a command should exist to "recompile all JSP files". If there is an easier way to do this than the methods described above, please enlighten me.

References

[1] JSP changes does not reflect, JavaRanch Forum, Jan 2008

Wednesday, November 19, 2008

Flying Saucer, XHTML Rendering, and Local XML Entities

I have been using Flying Saucer's (FS) XHTML Renderer [1] for at least 5 months now. It's an excellent library for rendering XHTML for display in Java Swing (using FS's XHTMLPanel) and for converting XHTML content to PDF files (see Generating PDFs for Fun and Profit with Flying Saucer and iText [2]). In my latest project, I create a report in XHTML and use FS to create a PDF version as follows:

String baseUrl ... // root URL for resources
InputStream xhtml = ... // XHTML content
DocumentBuilder builder =
  DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(xhtml);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, baseUrl);
OutputStream os = new FileOutputStream("Out.pdf");
renderer.layout();      // lay out the XHTML before writing
renderer.createPDF(os); // write the PDF to the output stream
os.close();


One benefit of using this method to create a PDF is that it is much easier to lay out the report in XHTML than to use a PDF library like iText directly. A second benefit is that I can produce the report in two different formats (PDF and XHTML) and render it in a GUI with very little extra work.
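For instance, rendering the same parsed document in a Swing UI takes only a few lines using FS's XHTMLPanel. This is a rough sketch; check the setDocument overloads available in your FS version:

import javax.swing.JFrame;
import org.w3c.dom.Document;
import org.xhtmlrenderer.simple.FSScrollPane;
import org.xhtmlrenderer.simple.XHTMLPanel;

public class ReportPreview {
    // 'doc' and 'baseUrl' are the same Document and base URL used for the PDF above.
    public static void show(Document doc, String baseUrl) {
        XHTMLPanel panel = new XHTMLPanel();
        panel.setDocument(doc, baseUrl);

        JFrame frame = new JFrame("Report Preview");
        frame.add(new FSScrollPane(panel));
        frame.setSize(800, 600);
        frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
        frame.setVisible(true);
    }
}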

Everything worked fine until yesterday.

The problem I ran into today was FS (more correctly the libraries it depends on) would randomly fail with errors such as
java.net.SocketException: Connection reset
and
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

With help from the very responsive FS users group [3], I learned that my XML document builder would sometimes fail when accessing the web to resolve XML entities, even though I was connected to the Internet. I was able to fix the problem by adding one line of code to configure the entity resolver to use the XML entities on the class path (conveniently packaged in FS's core-renderer.jar).

String baseUrl ...
InputStream xhtml = ...
DocumentBuilder builder =
  DocumentBuilderFactory.newInstance().newDocumentBuilder();
// Use FS's local cached XML entities so we don't
// have to hit the web.
builder.setEntityResolver(FSEntityResolver.instance());
Document doc = builder.parse(xhtml);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, baseUrl);
OutputStream os = new FileOutputStream("Out.pdf");
renderer.layout();      // lay out the XHTML before writing
renderer.createPDF(os); // write the PDF to the output stream
os.close();


The moral of the story is that some thought should always be given to how you configure your XML parsers to resolve entities. In many cases it makes more sense to use a local store, not a default remote site such as www.w3.org. This is especially important if the application doing the parsing may run on computers not connected to the Internet. This applies to FS and other libraries that depend on parsing XML.

References

[1] The Flying Saucer Project, 2008
[2] Joshua Marinacci, Generating PDFs for Fun and Profit with Flying Saucer and iText, 2007
[3] Flying Saucer User Mailing List, 2008

Sunday, November 16, 2008

Math Interval Notation

Recently, while writing some documentation, I needed notation to represent a range of numbers. Given my math background, I naturally fell to using mathematical interval notation. And, as has happened many times in the past, I could not remember which type of brackets represents inclusion and which exclusion: was it parentheses or square brackets? It turns out that parentheses ( and ) are used for exclusive endpoints and square brackets [ and ] are used for inclusive endpoints. Let a and b be comparable values (more specifically, members of a totally ordered set) such that a < b. Then for endpoints a and b,
(a,b) is all values > a and < b
[a,b] is all values >= a and <= b
(a,b] is all values > a and <= b
[a,b) is all values >= a and < b
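As a concrete example of the half-open case (my own illustration), Java's Math.random() returns a double in the interval [0, 1): zero is a possible result, one is not.

public class IntervalExample {
    public static void main(String[] args) {
        // Math.random() returns a double in [0, 1): 0.0 is possible, 1.0 is not.
        double r = Math.random();
        System.out.println(0.0 <= r && r < 1.0); // always prints true
    }
}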
References

[1] Interval (mathematics), Wikipedia, 2008
[2] Set-builder & Interval Notation, Oswego City School District Regents Exam Prep Center, 2008
[3] Totally Ordered Set, Wolfram MathWorld, 2008

Friday, November 14, 2008

Sansa c240 Portable Detection in Winamp

I have had the Sansa c240 MP3 player since last Christmas and have always manually dragged and dropped music files onto the device using Windows Explorer. This can be a painful process when you want to transfer a variety of songs and albums distributed throughout several directories. So I decided to try managing the transfer of files using Winamp. No problem according to the documentation: plug the device into the USB port, Winamp detects it, and it displays its contents in the Portables view. Needless to say, this did not happen.

After much research and trial and error, the solution turned out to be rather simple: plug the device into the USB port, then under Preferences -> Plug-ins -> Portables -> Nullsoft USB Device Plug-in -> Configure, select the drive letter of the USB device. If necessary, select to unblock the device. I'm not sure why I needed to do this, but I'm guessing that sometime in the past I must have chosen to block my device.

Now if I could only stop this silly error from occurring every time I insert the player into the USB port. At least everything seems to function correctly after pressing Continue to ignore the error.

Monday, November 10, 2008

DB Record Insertion Rate: A Trivial Experiment

I was recently involved in a preliminary investigation for a new project that requires (at least what I thought was) a very high rate of record insertion into a database. In a nutshell, the ability to insert a minimum of 200 records per second, up to a maximum of 2000 records per second, is required. Even though the records are small, at 80 bytes each spread across 4 fields, I wasn't confident the DB I had at my disposal would meet these requirements. So I ran a quick experiment.

My experiment involved writing a Groovy script to insert 17 million records into a Microsoft SQL Server 2000 DB, both running on my laptop. My laptop has an Intel Core 2 Duo (1.8 GHz) CPU and 3 GB of RAM. Each record consisted of 120 bytes, split into 5 fields when inserted into the DB. The script inserted the records in batches of 1000 using SQL similar to the following, with 'ValueX' replaced by content to fill out the required 120 bytes [see Note 1]; a rough JDBC sketch of the same batching approach appears after the results.

INSERT INTO Messages (Col1, Col2, Col3, Col4, Col5)
SELECT 'Value1', 'Value2', 'Value3', 'Value4', 'Value5' UNION ALL
SELECT 'Value1', 'Value2', 'Value3', 'Value4', 'Value5' UNION ALL
SELECT 'Value1', 'Value2', 'Value3', 'Value4', 'Value5' UNION ALL
SELECT 'Value1', 'Value2', 'Value3', 'Value4', 'Value5'
....

Results:
  • Approximately 1400 records per second were inserted.
  • The size of the DB did not affect the insertion rate; i.e. the insertion rate did not drop as the DB got larger.
  • Maximum memory usage was about 2 GB.
  • The DB required about 6GB of disk space (includes index files).
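The JDBC sketch mentioned above is given here for reference (the original test was a Groovy script; the driver, connection URL, table, and column names below are illustrative only):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertRateTest {
    public static void main(String[] args) throws Exception {
        // Illustrative driver and connection string; use whatever your environment needs.
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:jtds:sqlserver://localhost/TestDb", "user", "password");
        conn.setAutoCommit(false);
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO Messages (Col1, Col2, Col3, Col4, Col5) VALUES (?, ?, ?, ?, ?)");

        int total = 100000;
        long start = System.currentTimeMillis();
        for (int i = 1; i <= total; i++) {
            for (int col = 1; col <= 5; col++) {
                ps.setString(col, "Value" + col); // pad these to the record size under test
            }
            ps.addBatch();
            if (i % 1000 == 0) {      // send to the server in batches of 1000
                ps.executeBatch();
                conn.commit();
            }
        }
        ps.executeBatch();            // flush any remaining rows
        conn.commit();
        long elapsed = System.currentTimeMillis() - start;
        System.out.printf("%d rows in %d ms (%.0f rows/sec)%n",
                total, elapsed, total * 1000.0 / elapsed);
        conn.close();
    }
}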
The insertion rate does not meet my upper bound, but it is not too bad given that the experiment was executed on my underpowered laptop.

Note 1

I could not use the more sensible row value constructor syntax [1]

INSERT INTO table (column1, [column2, ...])
VALUES (value1a, [value1b, ...]),
       (value2a, [value2b, ...]),
       ...

since it is not supported by MS SQL Server 2000. Thank goodness it is supported in SQL Server 2008.

References

[1] Insert (SQL), Wikipedia

Friday, November 7, 2008

The Why and What of This Blog

Steve Yegge in You Should Write Blogs goes into detail as to why I (and perhaps you) should be blogging. I'm under no illusion that my blog will ever approach the substance, quality, and quantity of Yegge's. That's not my intent. My goals are to:
  • Document new things I have learned, information obtained, and experiences gained in my day to day work as a software developer.
  • Write short summaries (the main take away points) of articles I read.
  • Record anything that is interesting to me, and thus maybe interesting to someone else.
  • Force me to write to practice my writing skills.
  • Allow me to contribute something (hopefully useful) to the software development profession.
  • Give me a chance to rant every now and again, or share my opinion.
It is my hope that writing short blog entries will help me organize my thoughts, help me retain information, and act as a reference source for me (and maybe others). The blog will generally be in the form of short notes and summaries, tidbits of software development knowledge if you will. I will not write many essays here. Although I may write off topic sometimes, nearly all blog posts will be related to some aspect of software development. At the present time, I expect the content to be dominated by the more technical aspects of the profession. And if I achieve none of my goals above, at least I have an OpenID account I can use on Stack Overflow.