MARTIN HUNTER'S BLOG
Custom Web Services with Microsoft SharePoint 2003
17th JANUARY 2007
I had reason recently to go about registering a custom .NET .asmx Web Service with Microsoft SharePoint Portal 2003. This turned out to be a rollercoaster ride of new experiences and learning; learning that one always welcomes, just not when you've got a 2 hour deadline to achieve! Anyway, it's all done and working well, so I thought I'd impart some of what I'd learned in the form of some "gotchas" in my blog for the future benefit of myself and my colleagues in the industry. There are plenty of good instructions on the overall process of registration elsewhere on the Internet (try links below), so I'm choosing not to go into detail on that here, and will focus instead on some of the major traps.
Background
The reason that you need to register a custom .asmx Web Service with SharePoint 2003 is that SharePoint has some funky functionality that allows your Web Service to be executed in any context in SharePoint (i.e. any web, site, document library etc.), regardless of the fact that it's physically located in just one context (the "_vti_bin" folder of your Portal web site or "virtual server"). For example, your Web Service will actually reside in the Portal web site's "_vti_bin" folder as follows:
http://<server>:<port>/_vti_bin/<webservice>.asmx
But, once correctly registered in SharePoint 2003, your Web Service will be accessible from other contexts, such as:
http://<server>:<port>/sites/MySite1/_vti_bin/<webservice>.asmx, or
http://<server>:<port>/sites/MySite2/_vti_bin/<webservice>.asmx
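To make the addressing scheme concrete, here's a minimal Python sketch that builds the URL of one service as seen from several contexts. The function name `service_url` and the server, port and service values are made up for illustration; only the shape of the URLs comes from the examples above.

```python
def service_url(server, port, service, context=""):
    """Build the URL of an .asmx service as seen from a SharePoint context.

    `context` is the site path (e.g. "sites/MySite1"); an empty context
    gives the physical location under the virtual server root.
    """
    # Normalise the context so that "", "/sites/A/" and "sites/A" all work.
    context = context.strip("/")
    prefix = f"/{context}" if context else ""
    return f"http://{server}:{port}{prefix}/_vti_bin/{service}.asmx"


# Hypothetical values, for illustration only.
physical = service_url("myserver", 8080, "MyService")
site1 = service_url("myserver", 8080, "MyService", "sites/MySite1")
site2 = service_url("myserver", 8080, "MyService", "/sites/MySite2/")
```

All three URLs resolve to the same physical .asmx file; only the context segment in front of "_vti_bin" differs.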
In order to get this functionality, you need to register the .asmx Web Service in a non-intuitive but very simple way (well, simple once you know how... as I said above, please use the links below for detailed instructions on the process!).
To get an idea of what you're aiming to do in registering your custom .asmx Web Service, I suggest you start by having a look at the default .asmx Web Services that come registered out-of-the-box with SharePoint. This way you can build a picture of what you're trying to get to.
To do this, open the IIS mmc manager and locate the Web Site that you've set up with the SharePoint Portal that you want to register your Web Service with. In this Web Site, you will notice a folder called "_vti_bin", which contains a number of files, including a whole pile of .asmx files: the .asmx Web Services that SharePoint has out-of-the-box. For each .asmx file, you will also find two other files, one called "<web service>wsdl.aspx" and one called "<web service>disco.aspx". These are the .aspx files that SharePoint uses to generate the associated Web Service's WSDL and DISCO descriptors when the URLs "http://<server>:<port>/...context.../<webservice>.asmx?wsdl" and "http://<server>:<port>/...context.../<webservice>.asmx?disco" are queried (respectively) from a browser.
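As a quick way of checking that naming convention against a folder listing, here's a small Python sketch. The helper names and the file names in the listing are hypothetical; the only thing taken from the description above is the "<web service>wsdl.aspx" / "<web service>disco.aspx" pattern.

```python
def companion_files(asmx_name):
    """Return the WSDL and DISCO .aspx file names expected beside an .asmx file."""
    if not asmx_name.lower().endswith(".asmx"):
        raise ValueError(f"not an .asmx file: {asmx_name}")
    stem = asmx_name[: -len(".asmx")]
    return f"{stem}wsdl.aspx", f"{stem}disco.aspx"


def missing_companions(file_names):
    """List the companion files that are absent from a folder listing."""
    # Compare case-insensitively, as Windows file names are not case-sensitive.
    present = {name.lower() for name in file_names}
    missing = []
    for name in file_names:
        if name.lower().endswith(".asmx"):
            for companion in companion_files(name):
                if companion.lower() not in present:
                    missing.append(companion)
    return missing


# Hypothetical folder listing: MyService has no DISCO page yet.
listing = [
    "Orders.asmx", "Orderswsdl.aspx", "Ordersdisco.aspx",
    "MyService.asmx", "MyServicewsdl.aspx",
]
```

Running `missing_companions(listing)` over a real "_vti_bin" listing would flag any custom service whose WSDL or DISCO page has not been created yet.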
If you open any one of the "<webservice>wsdl.aspx" files in Visual Studio, you'll notice that the majority of the file's content is a regular WSDL descriptor for the related Web Service; but you'll also see that there's some .aspx code too. This code uses the SharePoint libraries to identify the current context in SharePoint of the call, and outputs this context as URLs in the WSDL. This means that when your Web Service is queried at a specific context in SharePoint, that same context can be dynamically associated with the WSDL for the Web Service... simple but clever stuff, eh!
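The idea is easy to mimic outside SharePoint. The Python sketch below is not what the .aspx pages actually do internally (they use the SharePoint libraries); it just illustrates the principle of a mostly static WSDL whose service address is filled in from the URL of the request. The template, the service name and the URL are all invented for illustration.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape

# A heavily trimmed, hypothetical WSDL fragment; a real file carries the
# full descriptor around this element.
WSDL_TEMPLATE = (
    '<service name="MyService">'
    '<port name="MyServiceSoap" binding="tns:MyServiceSoap">'
    '<soap:address location="{location}" />'
    "</port>"
    "</service>"
)


def render_wsdl(request_url):
    """Fill the service address with the URL the WSDL was requested from."""
    parts = urlsplit(request_url)
    # Drop the "?wsdl" query so the address points at the service itself.
    location = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # Escape the value so it is safe inside a double-quoted XML attribute.
    return WSDL_TEMPLATE.format(location=escape(location, {'"': "&quot;"}))


rendered = render_wsdl(
    "http://myserver:8080/sites/MySite1/_vti_bin/MyService.asmx?wsdl"
)
```

Requesting the WSDL from a different site would produce the same descriptor with a different address, which is exactly the behaviour described above.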
Inside the "_vti_bin" folder, you'll also see that there's a "bin" folder. This is where the assembly .dll of your custom .asmx Web Service will need to be copied to so that SharePoint can load your Web Service class(es) and execute them.
The Gotchas
SharePoint 2003 won't work with .NET code compiled for the .NET Framework version 2.0 or above...the latest .NET version it works with is 1.1. This means that the custom .asmx Web Service that you want to have working in SharePoint 2003 needs to have an assembly .dll compiled to .NET 1.1 or below.
If you try using a .NET 2.0 assembly, you'll find that SharePoint either won't recognise that the file is there at all (you will receive "Cannot create object of type [X]" kinds of errors), or (if you specify the type in the .asmx in the form "<namespace>.<class>,<assembly>") you'll find that "Bad image" or "File is of an unrecognised format" kinds of errors occur. In the worst case, I've found that trying to install a .NET 2.0 .dll will actually corrupt your Portal site, requiring you to delete it and start again.
The easiest way to create a .NET 1.1 .dll is to build the .asmx Web Service using Visual Studio 2003 (or equivalent). If you use Visual Studio 2005 to develop the Web Service you will need to use either the command-line compiler for .NET 1.1 or the "MSBee" open source add-in for the MSBuild utility.
The Web Service Namespace of your .asmx Web Service needs to have a trailing '/' character. Also make sure that your Web Service Namespaces in the WSDL and DISCO files for your Web Service (<webservice>wsdl.aspx and <webservice>disco.aspx) reference this Namespace correctly. If you don't use a trailing '/' character for the Namespace and/or fail to reference it correctly in the WSDL and DISCO, you'll probably find that your Web Service will return "Value cannot be null" errors when you try to execute it. The reason for these errors is not your Web Service: They're due to the SOAP processors having trouble linking up the called Web Method parameters with the incoming SOAP Message.
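A check like this is simple to automate. Here's a rough Python sketch that compares the namespace you've given your Web Service with the targetNamespace of the WSDL. It assumes you feed it the rendered WSDL (for example, the text returned by the "?wsdl" URL) rather than the raw .aspx file, which contains server-side code and won't parse as XML. The function name and the example namespace are hypothetical.

```python
import xml.etree.ElementTree as ET


def namespace_problems(service_namespace, wsdl_text):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not service_namespace.endswith("/"):
        problems.append("service namespace has no trailing '/'")
    # The root <definitions> element carries the targetNamespace attribute.
    target = ET.fromstring(wsdl_text).get("targetNamespace")
    if target != service_namespace:
        problems.append(
            f"WSDL targetNamespace {target!r} differs from {service_namespace!r}"
        )
    return problems


# Hypothetical namespace and a trimmed WSDL root element.
WSDL = (
    '<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" '
    'targetNamespace="http://example.com/myservice/" />'
)
```

With the WSDL above, "http://example.com/myservice/" passes, while "http://example.com/myservice" is reported twice: once for the missing '/' and once for the mismatch.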
You may need to sign the Web Service assembly in .NET and register the assembly .dll that you copy to the "_vti_bin" folder in the SharePoint Portal in the GAC. I've not had to do this myself, but it is often mentioned as a way of sorting out issues with SharePoint 2003 not finding your assembly .dll. I suggest you try this only if you're sure you've sorted all issues relating to Gotcha #1.
Once you've successfully registered your custom .asmx Web Service with SharePoint 2003, and you want to consume the Web Service in a Visual Studio project, you will need to add a Web Reference to the .asmx Web Service. Make sure that you add the Web Reference to the Web Service in the context in SharePoint that you want your Web Service to execute in.
If you add the Web Reference to your project in the context that the Web Service sits physically (i.e. "http://<server>:<port>/_vti_bin/<webservice>.asmx"), you will find that your Web Service will probably not work correctly (though it will execute without exception). You'll need instead to add the Web Reference to the Web Service where it is to be executed (for example, "http://<server>:<port>/sites/MySite/_vti_bin/<webservice>.asmx") even though this is not where the .asmx file resides physically.
Make sure that the SharePoint Portal site that you've created is in the Trusted Sites list of your browser (e.g. Internet Explorer or Firefox), and make sure that the security settings for these sites are low. SharePoint pages use a considerable amount of script code which, if unable to run for security reasons, will mean that various textboxes and lists won't appear when they should. There are no errors reported by SharePoint when the scripts cannot run; it just doesn't display the fields and leaves you wondering why not. Be warned!
Links
A set of instructions for registration of .asmx Web Services with SharePoint 2003 exists on the MSDN Web Site:
Writing Custom Web Services for SharePoint Products and Technologies
I've also found the following instructions helpful, which go into a bit more detail on problem resolution:
MDSD using DSLs and GPLs
12th DECEMBER 2006
An interesting FAQ white paper from Microsoft regarding Visual Studio DSLs (Domain Specific Languages) resides at http://msdn2.microsoft.com/en-us/library/ms379623(vs.80).aspx. The article describes the route Microsoft have taken with Visual Studio DSLs, why they've taken this approach and how the Visual Studio DSL approach differs from that of the OMG's MDA (Model Driven Architecture) approach.
One of the things that intrigues me is, in general, how much controversy exists around Microsoft's DSL approach, and whether this is consistent with, and (often) "better or worse than", the OMG's MDA approach. I think that two things are for certain: Both approaches are similar in that they attempt to improve productivity in the production of software systems through Model-Driven Software Development (MDSD), and both are also very different in the way they tackle this. I'm not sure that the two are necessarily incompatible, or inconsistent, better or worse, just that they aim to enhance productivity through MDSD, and they do this in different ways.
Microsoft recognise, as do many in the IT industry today, that GPLs (General Purpose Languages) such as the UML are not great at expressing domain-specific structures. The OMG have revised the UML specification on numerous occasions for exactly this reason; as the industry have identified new modelling needs so the UML has been extended. Many have commented on the fact that the UML is, today, very bloated as a result, and is sometimes overly complicated. In addition, regular practitioners of the UML comment (and, as one, I all-too-often find) that the language facilities for stereotyping are very heavily used to extend the basic UML meta-model where it isn't good enough. But, bearing all that in mind, the UML is still the most pervasive modelling language in the IT industry, and is well known and understood by many, many people. In the white paper referenced above, the authors indicate that Microsoft very often use the UML; as do many of us.
It's precisely because the UML is a GPL that it isn't good at expressing domain-specific structures. DSLs (Domain Specific Languages) are, on the other hand, better at doing this precisely because they are by definition designed to express structures in a specific domain. Conversely, DSLs are not good at expressing structures outside the domain to which they relate. GPLs and DSLs are, therefore, targeted at solving different kinds of modelling problems. And, depending on what you're wanting to do, one or other approach is probably going to be better in your context.
And herein lies what I think is the vacuous controversy around Microsoft's approach to implementing MDSD with DSLs and the OMG's approach using the UML. Microsoft have taken the very pragmatic route that most software developers (and I underline this word for good reason) will benefit from the use of MDSD at quite a low level; models are about gaining productivity in generating program code that would otherwise have to be hand-written by developers, and developers can use modelling tools to help them do their job. The marketing from Microsoft often mentions (if not focuses on) how they would like to pull developers out of the dark ages and show them the light through the benefits of abstraction and modelling. Microsoft's Visual Studio DSLs are aimed directly at this: "How can we write a DSL that allows developers to generate code quickly using a model?"; for to do this would be to persuade developers that modelling is a worthwhile exercise. (As an aside, I'm not sure which developers the marketers are talking about here, as most of the developers I've worked with over the years, particularly those of the Java fraternity, have been very savvy to the benefits of modelling and are fully proficient in things like the UML and MDSD.) Take a look at the out-of-the-box designers (such as the Class Designer) that come with Visual Studio 2005, and you quickly realise that there is a high degree of coupling between the model that you produce using a designer and the designer's generation of code behind the scenes. If you download and use the Microsoft DSL Tools to build your own DSL, you will notice that there is a significant emphasis on producing program code.
Microsoft's DSL technology is platform dependent: targeted specifically at the .NET platform and integration with Visual Studio 2005. It's all about writing good quality .NET code quickly and accurately. The OMG's MDA approach, conversely, is more about platform independence. How can we model a system in abstract terms (what MDA calls the PIM (Platform Independent Model)), and then transform this into other models with increasingly higher levels of platform dependency (what MDA calls PSMs (Platform Specific Models)) until we reach executable program code? The principle behind MDA is that, once you have an abstract PIM, you can transform this to PSMs for system databases, program code, user interfaces and so on. The complexity of platform dependence is added by these transforms and, ultimately, actual database creation scripts, and (for example) C# or Java code for Unix or Windows platforms are produced, themselves a model at the end of the PIM to PSM transformation line.
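To show the PIM-to-PSM idea in miniature, here's a toy Python sketch. It is in no way an MDA tool; the entity, its attributes and the type mappings are all invented for illustration. The point is only that one abstract model can be pushed through different transforms, each of which adds the platform detail that the abstract model leaves out.

```python
# A tiny platform-independent model (PIM): entity and attribute names with
# abstract types. The entity and its attributes are invented for illustration.
PIM = {"name": "Customer", "attributes": [("id", "Integer"), ("name", "Text")]}

# Each platform-specific mapping adds detail that the PIM deliberately omits.
SQL_TYPES = {"Integer": "INT", "Text": "VARCHAR(255)"}
JAVA_TYPES = {"Integer": "int", "Text": "String"}


def to_sql_psm(entity):
    """Transform the PIM into a relational PSM (a CREATE TABLE statement)."""
    columns = ", ".join(
        f"{name} {SQL_TYPES[kind]}" for name, kind in entity["attributes"]
    )
    return f"CREATE TABLE {entity['name']} ({columns});"


def to_java_psm(entity):
    """Transform the same PIM into an object-oriented PSM (a Java class)."""
    fields = " ".join(
        f"private {JAVA_TYPES[kind]} {name};" for name, kind in entity["attributes"]
    )
    return f"public class {entity['name']} {{ {fields} }}"
```

Swapping the Java mapping for a C# one would leave the PIM untouched, which is the platform independence that MDA is after.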
The article referred to above suggests that, despite the prevalence of the UML, not much has been done with regard to code generation from UML models. I'm not sure that this is correct, as I (and quite a few other people I know) have been using UML to generate code for at least the last five years, using tools such as SparxSystems Enterprise Architect amongst others. However, I think the article has a point in that early MDSD tools have not accomplished much of what they set out to, and that this relates somewhat to the lack of domain-"specificness" surrounding languages for models (GPLs such as the UML). In my understanding, this is why Microsoft have approached MDSD with DSLs rather than GPLs.
However, in using a DSL you're inherently losing the level of abstraction that a GPL like the UML offers. The abstraction gives you at least two benefits: Firstly, GPLs like the UML are both standard and useful across many domains, which means more people are likely to use and understand them. The larger the user base, the more beneficial a language is for communicating and expressing ideas. Secondly, GPLs tend to be abstracted from specific technology implementations, which means that a single GPL-based model is meaningful and can be used with many different technology implementations and scenarios.
So, as with many things, it's horses-for-courses. I think Microsoft's route is to bring power to the people: Improve developers' productivity with modelling tools that really help them cut code. This is a very pragmatic, down-to-earth, "what happens on the development-room floor" sort of approach. It's also specific to a single platform (.NET) and heavily tied with the aim of producing code. The OMG approach tries to pull the modeller away from the detail of implementation, to an abstract world, in a bid to allow them to think purely on the problem in hand, rather than on the specifics of technology. Both approaches are good, usable and (in my mind at least) just different options on the road to producing a solution.
The Specification Pattern
27th NOVEMBER 2006
It's been a while since I last blogged. Both work and play have been manic! But what better subject to get blogging again than an interesting and powerful design pattern? Well, in researching some ideas for implementing a business rules processing engine, I recently came across the Specification pattern. I've documented an example in C# (let me know if you need one in Java) and made a comment on the pattern here.
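For readers who haven't met the pattern, here's a minimal sketch of the general idea in Python. This is not the C# example mentioned above, just a commonly seen shape for the pattern; the class names, the rules and the order record are all hypothetical.

```python
class Specification:
    """A business rule that can be tested against a candidate object."""

    def is_satisfied_by(self, candidate):
        raise NotImplementedError

    # The operators below build composite rules out of simpler ones.
    def __and__(self, other):
        return Rule(lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c))

    def __or__(self, other):
        return Rule(lambda c: self.is_satisfied_by(c) or other.is_satisfied_by(c))

    def __invert__(self):
        return Rule(lambda c: not self.is_satisfied_by(c))


class Rule(Specification):
    """A specification backed by a plain predicate function."""

    def __init__(self, predicate):
        self._predicate = predicate

    def is_satisfied_by(self, candidate):
        return self._predicate(candidate)


# Hypothetical rules over a hypothetical order record.
large_order = Rule(lambda order: order["total"] >= 1000)
trusted_customer = Rule(lambda order: order["customer_years"] >= 2)
needs_review = large_order & ~trusted_customer
```

The attraction for a rules engine is that each rule stays small and testable, while composites such as `needs_review` are assembled from them without writing any new conditional logic.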
Excellent Acronyms
25th SEPTEMBER 2006
A colleague recently introduced me to some great acronyms that relate to practical development ethics. I thought these quite amusing, so here they are:
...and I've just thought up one of my own:
How about some standards for tender processes?
18th SEPTEMBER 2006
During my career in IT, I've most often worked on projects that start with some kind of formal tender process. Putting aside, for one moment, projects that come about under the auspices of a high-level framework agreement or contract, often the first thing that an IT services vendor will see of a project will be some kind of tender documentation created by a potential customer. This documentation will usually put forward a description of what is required of a product and vendor together with the business and technical context and constraints for the product and/or project.
Tender documentation comes in all forms and sizes: From the very large to the very small; the impossibly vague to the overly prescriptive. The breadth, depth and granularity of information is rarely the same between tenders and is commonly insufficient to convey enough of an understanding to avoid making at least a few assumptions with significant impact on the project and/or product. In the main, quality-of-information risks and issues of this nature are mitigated or overcome in a tender process through thorough questioning sessions and robust conversations between vendors and customers designed to clarify requirements and scope, and through the thorough documentation of questions and assumptions made in assembling tender responses.
If left unmanaged, however, it would seem to me that issues relating to the variable quality of information in tender documentation can have significant impacts:
In response to the challenges brought about by the variety of content and quality of tender documentation, I would like to see the adoption of standards by all parties in a tender process (vendors and customers alike) to help direct the flow of information and to alleviate issues and risks born out of misunderstanding, misinterpretation, and the unrealistic vendor and customer expectations that follow. These standards would need to consider both the information content of tender documentation and responses, together with the processes followed in assembling that information, to ensure that the message conveyed was as full and as unambiguous as possible.
I could be replaced by a child of BizTalk
9th AUGUST 2006
Throughout the history of IT, the computing industry has found ways to make programming easier and less time consuming by adding layers of abstraction: Ones and noughts were replaced with Assembler, Assembler with 3rd generation languages such as C. In the last 50 years, we've seen software technologies develop from the monolithic, to the procedural to the Object Oriented and beyond. Microsoft's BizTalk Server (and similar technologies) take things one step further by allowing developers to graphically assemble workflow orchestrations. These take us one rung higher on the ladder of abstraction and deliver capability to model and work closely with business logic and commercial realities, without worrying so much about technology. BizTalk orchestrations can be functionally decomposed (via the "CallOrchestration" shape): It is not much of a stretch of the imagination to see how future iterations of BizTalk might adopt OO principles and techniques in this context.
Component-ware in general appears to offer significant development productivity gains over traditional development techniques and projects. With the advent of interoperability and communication standards and frameworks (such as XML, SOAP and WS-I standards, SOA, ESB and so on) the potency and availability of such solutions can only be enhanced.
With gains such as these, one might project that the jobs of our current IT development work force may be radically different in 10 or 20 years' time. Indeed, the solutions and project methodologies of 10 or 20 years ago are considerably different to those developed and used today. In future we may not need the same number of development staff to deliver fully featured software with a higher degree of accuracy, reliability and in less time than is currently possible. An army of Java or .NET developers may well be replaced by an assembly of component-ware derivatives: The children of products like Microsoft BizTalk server. Developers of today may be working with graphical modelling tools, executable model frameworks and standards-based commercial products to assemble the business solutions of tomorrow.
On A Soap Box...Microsoft Application Architectures
8th AUGUST 2006
I've recently been using a number of technologies from Microsoft to prototype a solution architecture. We've needed to prove principles, identify strengths and weaknesses and so on for a solution architecture that will shortly be under development.
I've worked with Microsoft technologies since the mid 1990s. One thing that has not changed much in that time is how highly dependent many of Microsoft's technologies are on each other. In fact, the situation seems in some cases to have become worse recently. BizTalk, for example, is dependent on SharePoint, SQL Server and Visual Studio. It's not that BizTalk won't function correctly without these, it's that it won't function at all. If you want BizTalk Business Activity Monitoring (BAM) you also need to add Excel to this list. Dependencies such as these are prevalent amongst Microsoft solution architectures and often have the effect of dramatically inflating the license cost of a solution.
So, what happened to the core architectural principles of low-coupling and high-cohesion? Whilst Microsoft's core technologies have come along leaps and bounds in recent years with regard to their adoption of open and structurally sound architectures, why do we still need SQL Server for BizTalk to operate? Indeed, why do we need to install SQL Server before BizTalk? Why do I need to install SQL Server before Visual Studio? Surely the SQL (and I mean the actual Structured Query Language here, not Microsoft SQL Server; for some the term appears to be synonymous) and related technologies are intended to promote interoperability, low coupling and high cohesion? Not to demote these.
Microsoft have shown signs of promoting interoperability between systems recently with their introduction of XML as a base format for Office 2007 documents. It seems Microsoft are catching up with office systems such as OpenOffice.org in this regard, which have been using standard XML and .zip files as a document format since the year dot. Let's hope the good work is kept up!
Testing Assumptions
31st JULY 2006
I was watching a documentary on Sky's National Geographic channel recently that investigated how an Airbus A310 had tragically crashed in the 1990s, killing all on board. I'll spare the gruesome details. Suffice to say that the cause was reported as being, in part, related to the assumptions that the crew, who had been used to flying aircraft from different manufacturers, had made about the operation of the aircraft. In particular, it was reported that they had expected an alarm to alert them by producing a sound when in fact it didn't: It was a red light. The story got me thinking about how important it is that the assumptions we make in the course of our duties, whatever they may be and whatever the consequences, are properly identified and tested. The documentary claimed that training was the issue. Indeed, it seems that a simple bit of information regarding the aircraft may have averted disaster and that this could be conveyed with training. However, if we rely on the comprehensiveness of training, would we ever catch everything that might happen in our jobs? I would argue not. Whilst training is fundamental to the successful operation of our duties, it can't teach us everything. We need to supplement this with an ability and willingness to think, to spot our assumptions and test them in order to better understand the situation. Now, this may be hard for people operating in the heat of a moment. But, in general, most of us don't operate in such conditions and we do have the time and wherewithal to identify and test our assumptions before acting.
A little while ago I wrote a paper on finding assumptions in relation to my own job in IT. You can find it here.
Rich Web-Client Framework beta 4.0 released
26th JULY 2006
The RWF beta 4.0 is now available for download here. A white paper covering the RWF architecture and programming techniques will be added to the articles section of this web site shortly.
Fowler retires the Model / View / Presenter pattern
24th JULY 2006
Please note that Martin Fowler has retired the MVP pattern from his web site. Martin has split the pattern into two others: Supervising Controller and Passive View. See MVP for further information.
Model / View / Presenter / Controller Pattern paper
14th JULY 2006
I've released a paper which looks at the MVPC design pattern. MVPC is an extension of the MVP design pattern in which we attempt to abstract the presentation logic relating to the objectives of the presentation (the "why"), from the code associated with the presentation framework used in implementation (the "how") such as Microsoft .NET WinForms or Java Swing. You can find the paper here.
Methodological Patterns
12th JULY 2006
I sometimes come across scenarios in my role as a Solution Architect in which projects have established architectural designs, and may even have started development work on a "solution", when a clear understanding of the purpose of the project has not been formed. Requirements are not properly understood before work on producing something begins. In my experience, projects like these often fail, sometimes catastrophically. It stands to reason that if you don't know what you're aiming for, then you're not likely to hit the target. It's common sense. So why, then, do we continue to participate in such projects? Why do organisations willingly invest thousands, if not millions, of dollars of hard cash in trying to realise something that's still a glint in the eye? And how, importantly, can we learn to avoid such situations in future?
As a result of poor implementation practices, even when the requirements are very well known and understood by all, the IT industry has over the years established a series of project methodologies which aim to allow organisations to follow a structured and proven approach to identifying needs, defining requirements, and implementing, deploying, maintaining, operating and retiring solutions. Some methodologies focus on small parts of the overall life of a project, whilst others try to tackle the project lifecycle from end to end. The Rational Unified Process (RUP), for example, defines phases (Inception, Elaboration, Construction and Transition) which tackle identification through to deployment, but not continued operational aspects. As far as RUP is concerned, the end of the project comes when the business receives the product, not when the product is retired. To alleviate this, Scott W. Ambler elaborated the RUP with his Enterprise Unified Process (EUP). This extends the RUP by defining production (operational) and retirement project phases.
However, many of today's popular methodologies (such as XP, MSF and the RUP) still focus largely on problem definition, solution development and deployment. They don't take a balanced approach to the full life-span of the project or product. In many respects, this may be due to the fact that many software development vendors are hired only to develop a system, not to manage its life from end-to-end. The market for solution development-focused methodologies may therefore be larger than for end-to-end methodologies, simply because project scope is often development-focused and lies along contractual boundaries between vendors and customers.
Either way, when you take stock of the big picture, it strikes me that we may have an incomplete set of methodological tools when it comes to running projects. How can we generalise the practices in the methodologies we use to a form which is abstracted from those particular methodologies? Doing so would mean that we could consider different aspects (phases, deliverables, processes etc.) of projects independently, and configure a methodological solution to ensure that we meet all of the important or critical aspects of a project. Can we abstract the principles of the well-proven practices in RUP that identify the business case for a project and define requirements adequately? We could then combine these with abstracted principles from other proven practices for the operation and retirement of a product, in a logical, structured and coherent manner. Can we identify and define a set of methodological patterns that will ensure that a project works well and is successful?
Returning to our earlier observation (in which we cited the practice of embarking on architectural decision making and development before requirements are properly identified and understood), can we author a pattern which will identify and define the scenario in an abstract and standard way? Can we author a corresponding anti-pattern, so that we can understand the scenario and spot it early before it wreaks havoc on our project?
Because patterns such as these would be patterns in process, we might want to include a set of pre- and post-conditions, together with process documentation, milestones (points that need to be met along the way) and so on in the pattern definition. In RUP, the finalisation of the identification of business need is met by the publishing of a Vision document. The finalisation of business requirements understanding is met by a complete set of Business Use Cases, and of system requirements by System Use Cases. The completion of an Architectural Design (and prototype) then establishes the readiness of the project to begin work on developing the solution. These are all phases in the RUP project lifecycle, and are defined by pre- and post-conditions together with information about what needs to happen in between.
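To make that a little more tangible, here's one way such a pattern definition might be captured as a data structure, sketched in Python. The class, its field names and the pattern name are my own assumptions for illustration; the milestones are simply the RUP artefacts mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessPattern:
    """A methodological pattern described by conditions and milestones."""

    name: str
    kind: str  # "pattern" or "anti-pattern"
    preconditions: list = field(default_factory=list)
    milestones: list = field(default_factory=list)
    postconditions: list = field(default_factory=list)

    def unmet(self, completed):
        """Return the milestones not yet completed, in their defined order."""
        return [m for m in self.milestones if m not in completed]

    def satisfied(self, completed):
        """True once every milestone has been completed."""
        return not self.unmet(completed)


# Hypothetical pattern built from the RUP artefacts described above.
requirements_first = ProcessPattern(
    name="Requirements before architecture",
    kind="pattern",
    preconditions=["Business need identified"],
    milestones=[
        "Vision document published",
        "Business Use Cases complete",
        "System Use Cases complete",
    ],
    postconditions=["Architectural design may begin"],
)
```

A project office could then ask `requirements_first.unmet(...)` at any point to see what still stands between the project and its next phase.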
Now, you might be thinking "but, we have agile patterns which dictate methodological practice". It's true, we do. However, agile patterns focus on patterns of behaviour that ensure or seek adoption of best-practices in development teams. Though this is part of the story, it's not everything. Indeed, what we need to do is document patterns like these on a scale which encompasses the best practices of many more aspects of IT projects than just development.
So here's an idea for a name for a pattern relating to the best practice that good requirements need to come before architectural decisions can be made: "The Horse Comes Before the Cart". I think I'll put an article together to propose the pattern, and others, shortly!
For now, here are a few that I've thought of with a brief description of what they mean:
| "Technology for Technology's Sake" (anti-pattern) | Don't be coerced into using technologies just because they're new and fabulous when they don't give you any real business benefit. Organisations spend a great deal of money each year undertaking "technology upgrade" projects that deliver nothing more than what they had already, and sometimes less. Such projects are often started with irrational fears relating to existing technologies being retired from support by vendors. Before you engage in an upgrade, ask yourself these things: |
| "Appropriate Skills" (pattern) | Effective people will always overcome ineffective people, processes and tools. Any project that hopes to be successful needs an appropriately skilled, integrated and well-oiled team. |
| "A Bird in the Hand" (pattern) | Leverage proven technology. What can you re-use from your existing, proven technology set in your new solution? You've already invested in these and developed them to a point where they are stable and fit nicely with the business. Don't blow them out because you've dubbed them "legacy" systems. I've seen companies spend millions replacing entire tracts of core system when all that was required was a new Graphical User Interface. |
| "Horses for Courses" (pattern) | Recognise that tools, processes and people are good at some things and bad at others. Don't use a technology just because it's new when, in reality, it isn't appropriate for the situation. |
| "The Horse Comes Before the Cart" (pattern) | Make sure you know fully what you want to do before you engage a project to do it. |
| "Divide and Conquer" (pattern) | Break the problems up into manageable chunks. Methodologies like RUP deal with vertical (time based iterations) and horizontal (modularisation of tasks and product components) chunks of project. |
| "Feedback Loop" (pattern) | To keep a tight rein, you need to be regularly and frequently updated with feedback on project progress in terms of input, output and quality. Again, RUP and other methodologies like XP use an iterative approach to allow adequate feedback to occur and influence the course of the project. |
Forthcoming changes to the Authoring Efficient XML presentation
12th JULY 2006
I received some great feedback regarding my Web Presentation titled Authoring Efficient XML last night. I'll be considering the comments shortly and will probably update the presentation over the next few days.
Code Generation Tools, MDA and DSLs
9th JULY 2006
A plethora of code generation tools have appeared in recent years, each of which promises hikes in productivity and output. And yet none of them ever seem to suit the purposes I have for them. How useful are these tools really?
In general, code generators seem to fall into three camps (documented here in order of maturity):
Of these, model driven generators certainly seem the most flexible. Importantly, they do not prescribe an architecture. As such, they can be used to generate code in situations where the prescribed architectures of other generators are not appropriate (which I find is often the case!). Many model driven generators also allow the end user to express the model using the UML, which is helpful as it means we don't need to learn a new modelling language before we can start taking advantage of code generation in projects.
However, it strikes me that code generation can only take us part of the way when it comes to producing a working solution. This is largely because the modelling tools that we use (e.g. the UML) are not sufficient to express all of the information required to achieve complete generation. 100% generation of the code base of a solution would mean that all of the rules and logic normally captured in code (Java, C# etc.) would instead need to be captured in a model. Though things are gradually changing in this arena, it is a significant challenge to produce a modelling language which allows this level of information to be captured, which is platform-independent, and which doesn't resemble a third-generation programming language in either complexity or syntax.
The Object Management Group (OMG) have responded to this challenge by developing Model Driven Architecture (MDA), a technology that uses successive model transforms to take a highly abstracted and simplified model and, with each transform, add platform implementation information until, eventually, program code can be generated. The abstract model is known as the Platform Independent Model (PIM). This is transformed into a Platform Specific Model (PSM), which can, in turn, be transformed into program code. A transform applies logic and information not captured in the source model to create another model which is more "specific" to the purpose of that transform. Thus, a PIM which is devoid of any detail relating to which implementation platform is to be used (for example, Java or .NET) is transformed into a PSM which does contain that information (or, at least, is structured accordingly). Similarly, the PSM can then be transformed into working C# code and/or into a database schema for Microsoft SQL Server or DB2.
The great thing about this is that a variety of platforms can be targeted simply by modifying the transformation logic rather than the domain logic (i.e. the abstract model). It doesn't matter whether you're using Oracle or DB2, .NET or Java, because it's not hard or costly to generate code for all of these platforms from the same source model. On the downside, MDA has been around now for a number of years, and unsurprisingly people have found that it is not an insignificant task to produce tools that do what MDA aims to achieve.
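To make the PIM to PSM to code chain a little more concrete, here is a minimal sketch in Python. It is not the API of any real MDA tool: the class names (PimEntity, PsmTable), the two type maps and the "Customer" entity are all hypothetical, chosen only to show that the platform detail lives in the transform rather than in the abstract model.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified stand-ins for MDA models.
# None of these names come from a real MDA tool.


@dataclass
class PimEntity:
    """Platform Independent Model: an entity with abstract attribute types."""
    name: str
    attributes: dict  # attribute name -> abstract type, e.g. "string"


@dataclass
class PsmTable:
    """Platform Specific Model: the same entity mapped onto one platform."""
    name: str
    columns: dict  # column name -> platform-specific column type


# The platform knowledge sits in the transform inputs, not in the PIM.
# These type mappings are illustrative assumptions only.
SQL_SERVER_TYPES = {"string": "NVARCHAR(255)", "integer": "INT"}
DB2_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER"}


def pim_to_psm(entity, type_map):
    """First transform: add platform-specific type information."""
    columns = {name: type_map[kind] for name, kind in entity.attributes.items()}
    return PsmTable(name=entity.name, columns=columns)


def psm_to_ddl(table):
    """Second transform: turn the platform-specific model into schema code."""
    cols = ", ".join(f"{name} {kind}" for name, kind in table.columns.items())
    return f"CREATE TABLE {table.name} ({cols});"


# One abstract model, two target platforms.
customer = PimEntity("Customer", {"name": "string", "age": "integer"})
print(psm_to_ddl(pim_to_psm(customer, SQL_SERVER_TYPES)))
print(psm_to_ddl(pim_to_psm(customer, DB2_TYPES)))
```

Switching target platform here means passing a different type map; the customer model itself is untouched. Real transforms are, of course, far richer than a dictionary lookup.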
Recently, Microsoft have introduced Domain-Specific Languages (DSLs) to their latest development offering (Visual Studio 2005), which tries to tackle the problem by allowing end-users to design their own languages for their own purposes. Microsoft argue that the domains of different user groups are so disparate and varied that no one modelling language can suit all of the purposes out there. Microsoft's DSL technology provides users with a modelling language for expressing modelling languages, thus positioning the burden of capturing appropriate information squarely on the shoulders of the user. With appropriate transform templates, it seems to me that this mechanism would allow 100% code generation. The only problem is that it will take considerable time and effort to establish and standardise new DSLs to the point that they are actually useful, because each "domain" (e.g. industry, business) is undoubtedly not defined or understood well enough currently. Members of many industrial sectors are still fighting amongst themselves today about XML schema standards, even though XML has now been in common use for a decade. I cannot see that it will be any better for DSLs, which are far more complex as an idea for the average guy (and committee) to deal with than hierarchical data sets!
Spotting the "simple things" in implementing features with XP
4th JULY 2006
Practitioners of Extreme Programming (XP) are encouraged to use “the simplest thing that could possibly work” in providing a solution to a requirement or implementation of a feature. This is a noble aim with which I agree entirely. However, I think that the principle can get a little misconstrued by developers occasionally, and I’d like to take stock of what it really means.
I worked recently on an XP project with a group of developers, all of whom had different ideas about how different features should be developed (as is usually the case!). A particular issue arose around how a group of functionally similar Forms should be implemented. One idea that was tabled as being in accordance with the “simplest thing” principle was that each Form should be a functionally autonomous unit, each inheriting from an abstract Form which contained common interface and implementation detail. Another idea was to develop each Form as an entirely independent unit, such that the inheritance structure was removed, thereby simplifying the implementation further.
Both were solutions to the problem in hand. However, I would argue that a third, somewhat less obvious, way exists which is in fact more simple than both. That is, to implement just one Form which composes itself slightly differently depending on its state (i.e. which of the actual Forms it needs to be).
Let's have a look at some class diagrams to compare the models.
Implementation using Independent Forms
Implementation using Inherited Forms
Implementation using Composition
We can see from the above diagrams that the Independent Forms model looks simpler because there are fewer classes involved. In terms of development and use, however, the model is the least simple of the three: each of the CommonOp methods has to be implemented and maintained separately even though they are duplicates of one another. The Forms are also not interchangeable (they are not derived from a common type) so the code that uses them needs to be aware of what they’re about and which Form to create and use in different scenarios.
Though the Inherited Forms model uses more classes, it is in fact simpler to develop, maintain and use since the CommonOp method is implemented in one place. The duplication of methods in the previous model is removed. However, though the Forms are now interchangeable, inheritance builds a degree of inflexibility into the model which might better be solved using composition.
Let’s say you wanted to create an instance of ConcreteFormA in the Inherited Forms model. You would need to write some code which declared a variable of type AbstractForm (or ConcreteFormA) and then constructed a new ConcreteFormA object. This means you need to write some code that knows which type of Form to create. To do this you may decide to implement the Factory pattern, which would in itself require maybe another two or three classes. Either way, the code that decides is not simple.
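As a rough illustration of the Inherited Forms model, here is a minimal Python sketch. AbstractForm, ConcreteFormA and CommonOp (written common_op) come from the discussion above; ConcreteFormB, the title method and the create_form factory function are hypothetical names added purely for the example, and a single function stands in for a full Factory pattern.

```python
from abc import ABC, abstractmethod


class AbstractForm(ABC):
    """Common interface and implementation shared by every concrete Form."""

    def common_op(self):
        # CommonOp is written once here rather than once per Form.
        return f"{self.title()}: common operation"

    @abstractmethod
    def title(self):
        """Hypothetical per-Form detail that each subclass must supply."""


class ConcreteFormA(AbstractForm):
    def title(self):
        return "Form A"


class ConcreteFormB(AbstractForm):
    def title(self):
        return "Form B"


def create_form(kind):
    # A deliberately tiny factory. Something still has to know the full
    # list of concrete types and map a request onto one of them.
    form_types = {"A": ConcreteFormA, "B": ConcreteFormB}
    return form_types[kind]()


form = create_form("A")
print(form.common_op())
```

Every new Form in this model means a new subclass plus a new entry in whatever code decides which type to construct.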
In comparison, the Composition model uses just one actual type of Form, called “Form”. An indicator about what state the Form needs to use (akin to which type of ConcreteForm needs to be created in the previous model) is passed to the Form on construction. This means that the inheritance relationship of the previous model is being abstracted out into data. The code that creates and uses the Form is therefore simpler to develop, maintain and use because the number of actual Form types it is dealing with is much reduced.
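A minimal sketch of the same idea in Python follows. The single Form class and the state indicator passed on construction are from the description above; the FormState interface, the StateA and StateB classes and the title method are assumptions made for illustration, since the original class diagram is not reproduced here.

```python
from typing import Protocol


class FormState(Protocol):
    """Hypothetical interface that every state object satisfies."""

    def title(self) -> str: ...


class StateA:
    def title(self) -> str:
        return "Form A"


class StateB:
    def title(self) -> str:
        return "Form B"


class Form:
    """The only Form type. It composes itself from a state object."""

    _STATES = {"A": StateA, "B": StateB}

    def __init__(self, state_indicator: str):
        # The "which Form am I?" decision is now data handed to the
        # constructor, not a choice between concrete Form classes.
        self._state: FormState = self._STATES[state_indicator]()

    def common_op(self) -> str:
        return f"{self._state.title()}: common operation"


# Calling code deals with one type only, whichever Form it needs.
for indicator in ("A", "B"):
    print(Form(indicator).common_op())
```

The differences between Forms have moved into the state objects, so the code that creates and uses a Form no longer needs a factory or any knowledge of a class hierarchy.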
In using composition rather than inheritance, a fourth option presents itself: Using configuration to abstract out the differences in the various states for the Form. In this way only one State class need be implemented, yielding benefits in simplicity of development, maintenance and use. The model is also far more extensible, since entirely new Form states can be implemented without writing a line of new code.
Implementation using Configured Composition
With the Configured Composition model we are back to dealing with just three types (two classes and an interface), as is the case with the Independent Forms model (three classes). However, the structure of the Configured Composition yields a better separation of concerns than with Independent Forms. Moreover, as the number of Forms to be implemented increases (Forms D, E, F and so on), the amount of code written does not increase with Configured Composition as it would need to with the Independent Forms model.
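Here is one way Configured Composition might look in Python, as a sketch under stated assumptions. The single State class and the idea of driving it from configuration come from the text; the shape of the configuration (a dictionary with "title" and "fields" entries), the from_config helper and the field_names method are hypothetical. In practice the configuration would more likely live in a file than in code, and the interface is omitted here for brevity.

```python
from dataclasses import dataclass

# Hypothetical configuration. Each entry describes one Form state.
FORM_CONFIG = {
    "A": {"title": "Form A", "fields": ["name"]},
    "B": {"title": "Form B", "fields": ["name", "email"]},
}


@dataclass(frozen=True)
class State:
    """The one and only State class. Its behaviour is driven by data."""
    title: str
    fields: tuple

    @classmethod
    def from_config(cls, config, key):
        entry = config[key]
        return cls(title=entry["title"], fields=tuple(entry["fields"]))


class Form:
    def __init__(self, state):
        self._state = state

    def common_op(self):
        return f"{self._state.title}: common operation"

    def field_names(self):
        return list(self._state.fields)


form_b = Form(State.from_config(FORM_CONFIG, "B"))
print(form_b.common_op(), form_b.field_names())

# Adding "Form D" is a configuration change only. No new class is written.
extended = dict(FORM_CONFIG, D={"title": "Form D", "fields": ["name", "phone"]})
print(Form(State.from_config(extended, "D")).common_op())
```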
In summary, developers need to be careful when they consider what is simple and what isn’t. Though the Independent Forms model may be simpler to understand at first glance, it isn't the simplest solution when compared to some alternatives. Developers certainly need to consider and implement what is simple, but they need to be aware that the simplest option is not necessarily the most obvious and that simplicity needs to be considered in the wider context of the system.
Process v Productivity?
4th JULY 2006
There's a great quote from David Brent (he of the BBC series "The Office") that reads "Process and Procedure are the last hiding place of people without the wit and wisdom to do their job properly". It got me thinking about process balanced against productivity. A degree of process helps improve productivity. Examples might be defining job roles well enough so that people can interact effectively, ensuring that outputs and deliverables are tested so that they deliver ok, ensuring things get done in the right order so as to minimise rework and so on. But, as the degree of process applied increases its demands on an individual's time, that individual achieves less absolutely and is therefore less productive. In the limit, when process takes 100% of an individual's time, process becomes such an overhead that it inhibits all productivity.
Here's a simple graphical representation of the relationship. As process increases, productivity increases initially, and then declines. The productivity curve is therefore concave to the process axis.

The productivity curve goes through 0 (productivity = 0) when process = 0 since we assume that no productivity can be achieved if no coherent process is followed (documented, formal or otherwise). No work could be done because no process is in place to achieve it. There also will come a point where process is so high as to inhibit productivity completely (productivity = 0). Note also that there is a level of process (Pr*) achieving an optimum level of productivity (Pd*). I'm not sure what this really means for life on planet earth, but I guess I can go away with a more formal appreciation of the popular generalisation that some process is required but too much is not good!
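For anyone who wants to play with the shape of the curve, here is a small Python sketch. The quadratic form is purely an assumption: all that is claimed above is that the curve is zero at no process, zero again at some maximum level of process, and concave in between. Process is expressed here as the fraction of an individual's time it consumes (0 to 1), and the peak value of 1.0 is an arbitrary unit.

```python
def productivity(process, peak=1.0):
    """Assumed quadratic productivity curve.

    `process` is the fraction of time spent on process, from 0.0 to 1.0.
    The curve is zero at both ends and reaches `peak` in between.
    """
    if not 0.0 <= process <= 1.0:
        raise ValueError("process must be between 0 and 1")
    return 4.0 * peak * process * (1.0 - process)


# Scan process levels in 1% steps to locate the optimum (Pr*, Pd*).
levels = [step / 100 for step in range(101)]
optimum_process = max(levels, key=productivity)
optimum_productivity = productivity(optimum_process)

print(f"Pr* = {optimum_process}, Pd* = {optimum_productivity}")
```

With this particular curve the optimum falls at the midpoint simply because a quadratic is symmetric; a real curve could peak anywhere between the two zero points.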
An interesting read on SOA models...
4th JULY 2006
I have just read an interesting article by Udi Dahan in issue 8 of the Architecture Journal regarding Autonomous Services and Enterprise Entity Aggregation. Udi refers to the high degree of coupling between services in many modern SOAs (i.e. that one service might functionally depend on another) and that, as far as service consumption goes, this can cause issues. If Service A depends on Service B, then when Service A is consumed a whole plethora of things can impact the performance of Service A which are outside the control of Service A (such as the communications network between Service A and B and how Service B performs in itself).
Udi suggests that using async communications between services on a publish-subscribe pattern can alleviate this problem. Imagine if Service B published changes to Service A. Service A would then take what it wanted from this data and would store it locally. When consumed, Service A would then not be (real time) dependent on Service B. Udi defines this relationship as autonomous.
I can see this working nicely. However:
I guess it's horses for courses - just like anything in the world of IT!
Thinking about writing an article on authoring efficient XML!
3rd JULY 2006
An idea sprang to mind today about authoring an article about how to write XML efficiently. A number of aspects could be considered such as writing efficient XML in terms of parsing and in terms of minimising the number of text characters required to express a data structure. Rules might include: