Saturday, October 27, 2007

Microsoft's Occasionally Connected Smart Clients

There are two broad strategies for designing smart client communications: service-oriented and data-centric. When you have determined which of these to use, you need to make some fundamental design decisions to allow your smart clients to work offline. In most cases, the clients should be designed to use asynchronous communication and simple network interactions. Clients will need to cache data for use when offline, and you will need a method to handle data and business rule conflicts when the clients go back online.
[From Chapter 4 - Occasionally Connected Smart Clients]
This is a good overview of what I'm going to handle. Microsoft wants its PDAs and other smart clients to be able to perform tasks while disconnected from the server. I want regular Web applications to do the same, so I'm making the client browser act just like Microsoft's Smart Client. Let's go over their main points:
  1. Approach: MS distinguishes a data-centric and a service-oriented approach to disconnected systems. The following graphic (taken from their article) illustrates that while data-centric approaches are not scalable, due to their local replication of the entire database, the service-oriented approach suffers from the lack of local business logic: the client ends up relying on its local cache without the server ever knowing it is not being contacted. We can envision server updates to this interface and/or business logic with painful consequences for the client... My approach is a mix between the two, for I want to design a client that is able to fetch from the server a module with both UI and business logic, plus the data it needs to work. However, because we want the application to update the workflow, all client actions will still be postponed until a server is reachable.
    [Image: Data-centric and Service-oriented approaches]
  2. Conflict types: I've previously noted that not all conflicts are the same. In fact, whenever someone edits previously existing data (e.g., a student changing his email address), there is a simple data conflict, as the information on the client differs from the server's version. But this is a different case from an edit to already deleted data (e.g., the same student tries to answer a question in a quiz that has already been removed by the teacher). The MS article refers to the first case as Data Conflicts, and to the second as Business Rules Conflicts (a minimal sketch of this classification follows the list). Note that data conflicts may not be trivial to reconcile - consider a situation where two different entities change the same information, one offline, the other online. Both want to change it, only one will succeed at first, and the offline client may need to merge its data after sorting things out with the other user.
  3. Locking mechanism: while a pessimistic approach would minimize all sorts of conflicts, it is obviously not scalable to a widely shared data-set. So with an optimistic approach comes the need to synchronize and reconcile conflicts.
  4. Reconciliation types: MS assumes the same three places for reconciliation that I've identified so far. If the conflict can be merged automatically, it will take place on the server. If not, it may take place on the client (with a simple user interaction dialog), or it may require launching another task that will contact other parties (e.g., two students posting the right answer to a question, needing the teacher's input to pick one of them).
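
To make the conflict distinction concrete, here is a minimal sketch in plain JavaScript of how a client could classify a queued offline edit once it reaches the server again. All names (serverState, baseVersion, the resolution labels) are made up for illustration; this is not Microsoft's API.

function classifyConflict(localChange, serverState) {
    // The record no longer exists on the server (e.g., the quiz question was removed
    // by the teacher): the operation itself is invalid -> business rules conflict.
    if (serverState === null) {
        return { type: "business-rule", resolution: "launch-task-or-ask-user" };
    }
    // Someone else changed the same record since we went offline (e.g., the student's
    // email was also edited online): both edits are valid data that needs merging -> data conflict.
    if (serverState.version !== localChange.baseVersion) {
        return { type: "data", resolution: "merge-or-ask-user" };
    }
    // No conflict: the offline change can simply be applied.
    return { type: "none", resolution: "apply" };
}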

The article further suggests CRUD operations as the ideal services to expose in these situations (complex services would make client-side data synchronization harder and harder), and proposes some strategies to handle task-based functionality dependencies.
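
Sticking to plain CRUD services also keeps the client's offline queue trivial. Here's a rough sketch, again in plain JavaScript, of the "postpone all client actions until a server is reachable" idea; isOnline, sendToServer and handleConflict are hypothetical placeholders, not real APIs.

var pendingOps = [];                 // CRUD operations performed while offline

function invoke(op) {                // op = { verb: "update", entity: "student", data: {...} }
    if (isOnline()) {
        return sendToServer(op);     // connected: the normal path
    }
    pendingOps.push(op);             // offline: remember the intent, keep working on cached data
    return { queued: true };
}

function replayQueue() {             // called once connectivity returns
    while (pendingOps.length > 0) {
        var op = pendingOps.shift();
        var result = sendToServer(op);            // replay in the original order
        if (result.conflict) {                    // classify as in the previous sketch
            handleConflict(classifyConflict(op, result.serverState));
        }
    }
}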

While this is the closest thing I've read so far to what I want to do for my PhD, it still lags behind, as I propose a methodology to adapt existing applications (e.g., pick up existing web services and build from there), and I propose a description of conflicts - and their resolution mechanisms - that integrates seamlessly with the rest of the application.

Thursday, September 27, 2007

QUATIC07 - the humanistic version of the story

In the last post I mentioned my findings from QUATIC07, specifically the input received at the software engineering doctoral consortium workshop. Now it's time to spell out what I learned from the whole experience.



Let me start by mentioning that this is my second event of this sort. I attended CAPSI 05 to present a paper entitled "Desduplicação sobre um conjunto de nomes próprios" (in English, "Duplicate removal in a set of first names"). The paper was presented by Helena Galhardas who, despite being the third author - she was my decision support systems teacher at the time - had both the recognition and the experience to make a proper presentation without the beginner's jitters... So, despite having a published paper, I hadn't presented any so far. Enter the QUATIC event. I was now faced with the opportunity to show my PhD expectations to other students and teachers in the same area. And to present a poster to the entire academic/industrial audience in the conference's hallways. And to have my thesis proposal published as a paper in IEEE proceedings! I must say I was pretty excited about the whole deal!



Day one. I arrived at the conference to present the thesis at the doctoral workshop, SEDES. The conference was in Portuguese, and the first interesting thing I noticed was the geographical spread of the participants: I was the only one who lived/studied/worked below the northern part of Portugal. The rest were all from Coimbra, Aveiro and Oporto, so Lisbon (!) was the strange city among them :) But everyone was really looking for a chance to get feedback on their work, so there wasn't much time nor motivation for picking on that! After receiving my reviews, I personally met one of the reviewers, and found out he is going to be part of my PhD jury. I had lunch with all of them, and proceeded to the remainder of the thesis discussions. And that's about it.



Days 2 and 3. The poster. I placed the poster on a wall in the hallway and got a nice desk to sit at. This way, I got to show some supporting images to enquiring passersby. I gave a 10-minute presentation about it in English, aimed at the industrial world, and got some positive feedback on the subject (the technical questions were a bit far from what I was expecting; academia is surely more prepared to criticize my work, as it is still a novelty to the others). On the last day I got to see a teaching perspective on software engineering. Miller's presentation about TSP and PSP was most enlightening and entertaining. The day ended with a dinner in a Brazilian restaurant with live music. I got to mingle a bit with people I had met but rarely had the chance to talk to. It was fun, but I have to admit I was pretty tired. I went to bed and slept for a good 7 hours before waking up to finish another presentation (7 hours seemed a lot, as I had been sleeping from 3 to 5 each night until then!)



Last thoughts. I had a great time, the preparation was quite helpful in organizing my thoughts, and the feedback was the cherry on top of the cake. But now I need to evolve my work so that I can go to a well-known IEEE conference - not that this wasn't a good one, but I need some heavyweight recognition when I'm facing a fulminating jury! Advice for PhD newcomers? Don't miss out on opportunities like this! It can be expensive, obviously, but there are organizations able to fund your research, your publications and your presentations!



Going back to Emacs (is there any other TeX editor worth reverting to, when you're an emacsen?)






Sunday, September 16, 2007

QUATIC07 - the academical side

I attended the latest edition of the QUATIC conference, a three-day event at Universidade Nova de Lisboa, in Portugal. My main motivation was to participate in SEDES, a satellite workshop for Software Engineering PhD students. The purpose was to discuss ongoing PhD theses and receive early feedback on them - while checking out possibly related ideas and suggestions from the other participants.


I presented my PhD proposal as it stood by the end of July of the present year. It was a good experience: I received lots of praise for my chosen theme, and also lots of critiques on the details of the thesis, my way of explaining things and my presentation. So let's start from the beginning, go straight to the end, then stop! :) (bored readers be warned, this is going to be a long post!)


My PhD thesis proposal is entitled "Offline execution in workflow-enabled Web applications". My goal started out a bit broader: at a time when AJAX was starting to make its entrance in the development show-biz, I wanted to make desktop-like Web applications. Soon enough I realized I was being foolish, because tens (hundreds?) of tools were immediately born to make Web Apps look pretty, with lots of drag and drop, image overlaps, you name it. When something like Yahoo! releases their widgets, I can confidently stop trying to do something like that - I'm one student, they are a fully employed - read "paid!" - set of experienced developers. So I refined my goal and picked a real need - to make a Web application run offline, without requiring external applications and without losing any work. To do this I identified three main areas, or sub-problems:



  1. Workflow decomposition into offline performable modules;


  2. Offline client generation, and its functionality


  3. Offline work reconciliation with the server


And this was what I had in mind at the time of the short paper submission. I was also invited to present a poster on how my work would affect the industry, with a short talk within the main QUATIC conference. That, of course, set the stage for more interaction, as my poster was on a mandatory corridor for all participants ;)


Later on, I collected a set of information (from both research and opinions at the conference) that improved my thesis. I'll try to present it here, as succinctly as I can:



  • Workflow decomposition is a relatively well-known problem, and people like van der Aalst have been working on Colored Petri Nets (CPN) to work it out. The problem is now reduced to the question "How do I tell whether I can perform this activity if I go offline now?". This is not as simple as it may seem, as I must ensure that there is a possible execution flow that won't require other entities' online interaction. I also must ensure that I can bring all required data offline, to my laptop - and issues like access control or the amount of information arise here.


  • Offline persistency in a browser is now the purpose of tools such as Google Gears. Client-side generation from a high-level language is also addressed by tools and frameworks like Morfik, XML11 or Google Web Toolkit. My problem is now to unify the specification of both the online and offline sides into one. Later on I might also have to worry about technical details like client application transfer, shrinking, and a client-side clone of the workflow engine, but not for a few months, for sure!


  • I was told the synchronization scheme I was looking for was already done by the nice guys from Redmond. The Occasionally Connected Systems framework is supposed to handle a local database (SQL Server Everywhere) and all the asynchronous messaging it takes to handle eventual internet failure, short disconnects, etc., and that would include a nice and efficient merge of all offline data. While this may be true, there are a couple of issues that allow me to stay on the innovative side of the research: 1) The framework is intended to run on desktop-based applications. I assume that eventually Silverlight-based applications could be supported (as they grow the run-time virtual machine and bloat it with even more .NET libraries). But I want Web applications to do the same, and that has not been addressed, as far as I can tell. 2) The OCS framework is able to merge database information. But the types of conflicts I'm targeting are the high-level ones. I'm not interested in a last-edit timestamp that differs. I'm looking for the product that cannot be bought because the store is now set to open from Monday to Thursday and I can only get there on Fridays. To make things clear, I'm not interested in a data merge algorithm, but in an elegant and simple way to integrate the specification of such kinds of conflicts - and their resolution strategies - within the application domain model and workflow definition (the sketch right after this list illustrates the idea).
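
Purely to illustrate the kind of specification I'm after - none of this is an existing framework API, and every name below is made up - a high-level conflict like the store-hours one might be declared next to the domain model, together with its resolution strategy, in plain JavaScript:

var conflictRules = [{
    name: "store-no-longer-open-on-pickup-day",
    appliesTo: "purchase",                          // the offline activity this rule guards
    // the offline action is in conflict when the business data it relied on changed meanwhile
    detect: function (offlineAction, currentStore) {
        return currentStore.openDays.indexOf(offlineAction.pickupDay) === -1;
    },
    // this one cannot be merged automatically, so it goes back to the user
    // with the store's new opening days attached
    resolve: "ask-user"
}];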


This was a nice set of input. I was pleased. But I had some other things to fix, not so technical but about my approach to the whole PhD:



  • My sales pitch - I need to fine-tune my goal statement into something that says "I want to describe a specific solution to a specific problem", and not "I want to solve all the world's problems in two years, yay!!".


  • I also need a supportive example. My example had fallen into the field of the non-problem - that was what I got by making it so simple I could explain it during a corridor walk... So the suggestion is to find a concrete requirement in a concrete (aka real) system, describe the requirement, describe the problems in the current approach and conclude by suggesting how we would want to do it.


  • I had to change a lot in what I was expecting to produce. Since I'm aiming for a development methodology (and the necessary framework or tools), I had to clearly state the inputs, outputs, all stages, the applicability/restrictions, the target audience, and so on and so on. It's a shame there are no standards for describing methodologies. I could certainly take advantage of one now!


I'll post later about my personal experience at the conference - something from my human side :) - and also publish a copy of my thesis proposal. Here's what the poster looked like:
[Image: QUATIC poster, Edgar Gonçalves]


By the way, I have to say I consider myself fortunate, as this short paper is now published in IEEE proceedings! So, do you have any more advice, or experiences you'd like to share?

Friday, April 20, 2007

RIA and Flash

Last Wednesday I went to a seminar on the application of Flash technologies to produce Rich Internet Applications. Adobe has certainly been busy for the past year, as we are now far from the days when Flash's main purpose was "banners and intros". Two main names pop up for discussion: Flex 2 and Apollo. Flex addresses the issue of producing web applications with Flash and ECMAScript (the ActionScript variant), in the same manner one would develop an HTML+JavaScript solution. The markup language is named MXML, and it lets you mingle the UI with the client-side scripting. The main advantages are Flash's remote communication capabilities (via web services, Flash Remoting, and other RPC alternatives) and Flash's multimedia embedding. It's quite easy to add a video playing in a given canvas! As far as coding artifacts are concerned, all we need is at least an .mxml file, like the following:
<?xml version="1.0" encoding="utf-8"?>

<mx:ApolloApplication xmlns:mx="http://www.adobe.com/2006/mxml" layout="vertical" cornerRadius="12" alpha="0.7" borderStyle="none">
    <mx:Script>
        <![CDATA[
            import mx.collections.ArrayCollection;
           
            [Bindable] public var people:ArrayCollection = new ArrayCollection(["one", "two"]);
           
            private function doAddClick(_name:String):void{
                people.addItem(_name);
            }
        ]]>
    </mx:Script>

    <mx:TextInput id="firstname" text="Type your name here"/>
    <mx:Button label="Add it!" click="doAddClick(firstname.text)"/>
    <mx:ComboBox dataProvider="{people}"></mx:ComboBox>
    <mx:FileSystemTree/>
   
</mx:ApolloApplication>
Yes, this is purely XML. One can also add pure ActionScript files; ActionScript, by the way, has some advantages over JavaScript (it's an OO language, for instance). The main drawback (yes, there is one) is that the IDE is not free (unlike the compiler and SDK - those are FOSS!). Also, there's no DTD or XSD available to use in XML development editors, so we will probably have to stick to an Eclipse plugin (the paid Flex Builder) until an open source alternative comes around. RIAs have forever been tied to the sandbox concept: like all web applications, one can only access files that have been uploaded by the user in the present session. Another restraint is that a browser is needed to run the application. Here comes Adobe's Apollo. Take a Flex application, add a short XML application descriptor, and with a couple of command-line instructions you have a compiled and packaged installer for a standalone Windows/Mac executable with the MXML contents. But they didn't stop there. Adobe also wants to win over current HTML/JavaScript/AJAX developers, so Apollo also makes identical standalone applications out of pure HTML+JavaScript sites. And that's not all: to our coding pleasure, Apollo lets us access all of Apollo's ActionScript facilities from JavaScript, and vice-versa! This means one can code a JavaScript handler to access debug tracing, common dialogs, or the client-side file system, like in the following example (taken from Mike Chambers' blog):
//called when button is pressed to select a file
function onFileClick()
{
 //this will trace out the string to the command line
 apollo.trace("hello");
 
 //get a reference to the desktop
 var f = apollo.flash.filesystem.File.desktopDirectory;
 
 //listen for the select event
 f.addEventListener(apollo.flash.events.Event.SELECT, onFileSelect);
 
 //open the browse dialog
 f.browse();

}
Again, the SDK (still in alpha) is FOSS, but the IDE consists of an extension to Flex Builder, and is thus paid. Still, the technology itself seems quite promising. See Adobe Digital Editions for a quite nice real-world example. In short, I'm impressed with Adobe Labs, and it seems there's more to come in the next couple of months!

Thursday, February 15, 2007

Free Lisp Executables in Windows

One of my latest projects has led me to think about making Windows executables for a typical database application. The main requirement was to use free software for this project, so here's my starting point:

  • Clisp, 2.41 (2006-10-13) (full, with readline support)

  • ASDF
  • IDE: Emacs and SLIME (latest, from CVS at the date of this post)
The first issue I want to get out of the way is making Windows executables. If I can't make those, I might as well move to Python - but my programming language preference makes me stubborn about this one. I found out the magical command, ext:saveinitmem. But this alone wouldn't work, and I came to the conclusion that SLIME was the problem - the binary image would try to output to SLIME and result in an error. The fix is simple enough - just don't load swank, and make the binary without SLIME. I made a convenience build.bat file, with the following:

c:\dev\clisp\full\lisp.exe -B c:\dev\clisp\full -M c:\dev\clisp\full\lispinit.mem -ansi -norc -q -i build.lisp

(don't forget to change clisp's path to match your configuration). One more thing I noted: the DLLs clisp uses have to be in the same directory as the produced executable (find them under your clisp directory - c:\dev\clisp\full in my case).



Then I put into build.lisp everything I want preloaded: ASDF, my project file loading, etc., and then the magical function call (assuming an already loaded "test-app" package with a "main" function):

#+:clisp (ext:saveinitmem "test-app"
                          :init-function #'(lambda () (test-app::main))
                          :norc t
                          :script t
                          :start-package :test-app
                          :executable t
                          :quiet t)

All fine so far. The executable is not small, but at about 5MB it is smaller than the ~30MB I would get with the latest SBCL (1.0.1) for Windows.



Next challenge: the database connection. This was a lot more troublesome than I initially thought. CLSQL was the chosen option, because I needed to access legacy data, initially from an Access .mdb file. I had to create an ODBC Data Source and use the clsql-odbc package to access it. I got this working in clisp's REPL, and then I made an image, ran it and BOOM, all hell broke loose. Here is a thread where I got some valuable hints, especially from Ken Tilton, whose Celtk package I used. What happened is that clsql uses CFFI, with cffi-uffi-compat, to interface with foreign functions defined in external DLLs (in the Windows case). Two issues ate more of my time than I would like:

  • Each defcfun (i.e., each foreign function interface definition) must be evaluated after having used/loaded (only once) the foreign library. OK, that makes sense. What puzzled me was that, unlike Allegro CL, Clisp (and SBCL, while we're at it) requires us to repeat this use/load-foreign-library each time an image is (re)started. This makes it a bit messier, as I had to choose between (1) loading clsql at image execution time or (2) fetching the needed libraries from the clsql project and loading them in my main (initial) function. I opted for the second approach, for image load-time's sake. The required lines are something like:

    (cffi-uffi-compat:load-foreign-library "clsql_uffi" :force-load t)

    (cffi-uffi-compat:load-foreign-library "odbc32" :force-load t).

  • CLSQL uses a variable to represent a null C pointer. This is OK, except that the pointer is created only once, so after clisp loads the saved image the pointer is invalid, yielding errors every time it is used. Solution? I simply replaced all occurrences of *null-ptr* with (make-null-pointer :byte) and *null-handle-ptr* with (make-null-pointer :void). This did the trick, although it incurs a small performance penalty - I'm not that desperate, yet :).
After these issues, I was able to read and write to my Access database from my (now 6.5MB) executable. On to the next issue, the GUI. Ahh, this made me think about lots of variables:

  • Language integration with the graphics (consider making foreign function calls vs. using simple Lisp abstractions);
  • Portability: I wouldn't like to build different GUIs for Mac and Windows, for instance;
  • Prettiness. OK, it may be the best toolkit ever, but I may like to have, say, the XP look and feel. Or some other...
  • GUI design. There are some nice GUI designers out there, and there are some nice Lisp abstractions that produce the same result. What to choose?



Given these requirements, my free options were wxCL, Graphic-Forms, Celtk, Ltk, and cells-gtk. (This list may not be comprehensive - feel free to comment and tell me about the amazing alternative I've missed!)

I tried wxCL first, as I'd already used wxPython and felt at ease with wxGlade. So I started using it, created my first hello-world widget, and bang, had a whole lot of trouble, CFFI-related and otherwise. I guess it could all be handled, but I would still be stuck with the wxWidgets theme, which isn't very handsome to my eyes. For the same reason I ditched cells-gtk... I moved along to Graphic-Forms. This is still in alpha stage, but is looking nice! However, it targets Windows only, which makes it pretty useless on a Mac/Linux box. On I went, to the Tcl/Tk world. At first I thought I'd be stuck with a specific theme/motif, but then I learned about Tile, and that did the trick for me. So, how do I choose between Ltk and Celtk? Well, Ken Tilton's Cells project, basically, prodded me towards Celtk :) It's just a programming decision: Cells can be used to bind behaviors to certain parts of the GUI in an oh-so-elegant way; I was hooked soon enough!



OK, so what's the downside, you may ask? Celtk is still a work in progress, and is kind of not ready for quick use/deployment. I also had CFFI-related problems, similar to those I had with clsql, and I worked around them with the following addition to my main function:

(cffi::use-foreign-library "/dev/Tcl/bin/tcl85.dll")

(cffi::use-foreign-library "/dev/Tcl/bin/tk85.dll")

(cffi::use-foreign-library "/dev/Tcl/lib/tile0.7.8/tile078.dll")



There are also lots of absolute paths to fix, scattered all over; the ASDF system definition is not up-to-date (Ken relies on ACL's *.lpr files); and there's no readme/howto/start-here, and basically no homepage with this information. I think Ken is trying to get the project a home on http://common-lisp.net/; meanwhile, the comp.lang.lisp newsgroup will do for discussing any issues we may have.



What I can say is that I've put his demo project, along with some sample MS Access data access, into an ~8MB executable. I'm sure I can optimize this, but it's fair enough for me, and it loads fast enough!



Hope some of you may learn something from this. Feel free to shoot any questions, or suggestions you may have!


