Monthly Archives: April 2012

Thinking in Git

I had the great pleasure of presenting at SEVDNUG with my son this evening.  He was a wonderful asset, and a great star.  When we were all done, I think his was the bigger applause.  I look forward to presenting with him again.

Our topic was Git, the new hotness for version control. 

The Abstract of the talk is:

Have you ever looked at Git because it was trendy, but stumbled away dazed? Git isn’t your father’s source control system, but most of your knowledge of TFS, SVN, or other source control systems transfers over just fine. We’ll take your existing knowledge of your Commit / Update VCS and we’ll layer in the methodologies, tools, and communities that Git uses. What’s the difference between pull and update? Isn’t branching and merging dangerous? Can I get colored icons in Windows or Visual Studio? How do I contribute to a GitHub project? We’ll graph Git’s actions in blocks and compare them to git command results. You’ll come away thinking in Git, ready to easily leverage the additional power.

Quite humbly, I think we rocked the house.

What is Git?

Git’s website has a pretty good (if a bit verbose) description of Git:

“Git is a free & open source, distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Every Git clone is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server. Branching and merging are fast and easy to do.”

Git’s Methodology

Git really has two main tricks, and the focus of our discussion is to explore these two things in depth:

1. Git has 4 storage locations: the working directory, the staging area, the local repository, and remote repositories.  Once you can visualize where your data is, Git is easy.

2. Git's magic is just moving labels between nodes.  We may call them branches, tags, HEAD, etc., but each is just an identifier for a node as far as Git is concerned.  If you understand how your labels move as you commit, everything else is gravy.

History of Git

Git was created by Linus Torvalds in 2005.  He’s famous for a few other projects … like Linux.  I heard it said once that Linus would take over the world — perhaps not with Linux, perhaps instead with Git.

Where is Git used?

Git is used by most major source control hosting firms, and has become a staple at most social coding avenues:

1. GitHub is the quintessential Git hang-out.  It offers both open-source and private repositories (cost is quite reasonable), and really focuses on the social end of coding.  (It’s hard to describe the atmosphere and wonderment that is GitHub.  You really need to try it to see it.)

2. BitBucket is the mecca for Mercurial open-source projects, and now provides Git hosting.

3. CodePlex is the open-source hangout for Microsofties, and includes hosting for TFS, SVN, and most recently, Git.

4. SourceForge is the old hub of open-source, originally for CVS, now known for SVN, and more recently, they’ve added support for Mercurial and Git.

Want more evidence that Git is ubiquitous?  Go to Microsoft’s uber-mecca for open-source — to their project for MVC, Web API, and Razor at http://aspnetwebstack.codeplex.com/.  Now click on Source Code, and click Connection Instructions.  Ya know what you see?  A Git url.  No “or SVN”.  No “or TFS”.  Nope.  All you get is Git.

Clearly Git is winning in the realm of version control systems.

Elements of Source Control Systems

There are 4 main elements to a source control system:

1. GUI Tools

2. Command-line Tools

3. Node Graph / Commit Log

4. Storage Map

Looking at these 4 elements in an SVN realm, we get something like the following:

1. TortoiseSVN provides an excellent front-end for the pointy-clicky people.

2. The svn command line is great for power users and for automating and scripting things.

3. Run svn log from either the command line or TortoiseSVN and you've got a great view of SVN's history.

4. SVN really has only two spots where code lives: the SVN server and your working directory.

Here’s a graph of the 4 spots in SVN:

[image: svn parts]

Using those same quadrants, we can enumerate the toolset for Git in Windows:

1. Git Extensions works great as a Git GUI.  TortoiseGit does OK, but it forces us to look at Git as if it were an SVN repository.  The two install seamlessly on the same box and play nicely together.

2. msysGit provides the Git engine and command-line tools; posh-git is an interesting PowerShell add-on.  Most every other tool either depends on or installs msysGit.

3. Git Extensions' commit log view (or gitk, which ships with msysGit) gives a great view of git's commit history and node graph, while kdiff3, which installs alongside Git Extensions, handles the diff and merge views.

4. Git’s storage system includes four main sections:

1. The working directory — the check-out folder in Windows
2. The staging area — the code you’re getting ready to commit
3. Your local repository — this is where the majority of the meat is
4. Remote repositories — places where you share the code — you need not have these, or you could have lots of connections to remote repositories

The graph we used for git’s parts looks like this:

[image: git parts]

Now let’s spend a bit more time with the 4 storage areas in Git:

The working directory we know from SVN, TFS, CVS, and other acronymed alphabets.

We're familiar with "the server", but in this case "the server" is on our local machine — the "local repository".  That's kinda weird, but kinda cool.  Now I can commit when I'm at a good stopping point, even if I'm not completely done with the task.  Ever been "code complete but untested" and wanted to save your progress without breaking the build?  That's what the local repository is for.

The staging area is also a little weird.  Why would I have a place between my working, checked-out files and my repository?  Well, it allows you to commit some things without needing to commit the others; in time, and with a bit of practice, you can even commit part of a file while the rest of it stays uncommitted.  It's a great place to build smaller, more purposeful commits.  Alternatively, you may find you don't need this, and you'll use the "just get it in already" flag, "-a", on the commit command.
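For example (jumping ahead a little to the commands we'll meet below), the staging area is what lets you pick and choose what goes into a commit:

git add -p <filename> // <-- interactively choose which chunks of the file to stage
git commit -m "just the part that's ready" // <-- commits only what was staged
git commit -am "everything, already" // <-- the shortcut: stage every tracked change and commit in one shot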

Remote repositories are the most curious.  We’ve got a distributed version control system where everyone has a copy of every commit.  I can push my changes to you when I’m ready — as a big clump.  Or maybe I’ll push them to the CI server.  Or maybe I’ll push them to a colleague for code review before I commit to “the build”.  Or maybe on a small project, I just keep my source tree alone on my machine and have no remotes.

Everything but the “Remote Repositories” is on my local machine.  I can do everything needed with these while on a plane — without network access.

We now have the 4 parts of a source control system in Git.  It’s not scary once you know how it works.

Git Commands

We did a great job of showing the git commands and mapping their results in a node graph made of felt and string.  It was truly a performance, and to do it justice, I'd need video.  We didn't tape our presentation, but http://blip.tv/open-source-developers-conference/git-for-ages-4-and-up-4460524 is a similar video, and the one that inspired the majority of this section of the demo.  My son and I took a slightly different route, but this video does it very well.  "A directed acyclic graph isn't scary when it looks like this."  He's holding up a tinker toy; I'm holding up felt and string.  The audience laughs, settles into their comfortable seats, and we're off to the races.

Here's a summary of the Git commands and the learning we can get from each one.  Go download msysGit, install it, then open up a command prompt or Git shell and follow along.  If I'm commenting on one, I'll add "// <--" to highlight something.  Don't include these as you're typing the commands.

Start it off:

git init // <-- get the repository started; we typically copy an existing repository, but this is simpler for now


Adding files:
Go create a file, add content to it, and substitute your file's name for "<filename>", leaving off the < and >.

git add <filename> // <-- this stages the file (and writes its content into git's object store)
git status // <-- it shows we're on the "master" branch (trunk in SVN speak), and that the file is staged
git commit -m "some message" // <-- the master label (and HEAD, meaning "your working directory is based here") now points at this new commit
git log // <-- each commit has a sha1 hash that uniquely identifies it

The sha1 of each commit is a hash of:
1. The content of the commit (the tree of files it points to, plus the author, timestamp, and message)
2. The hashes of the parent node(s)

Add another file or two, and git log starts looking pretty tasty … and pretty verbose.
git log --oneline --graph --decorate // <-- this shows the graph node lines, the first half-dozen characters of each hash, and the branch labels — very, very handy.  After about the third or fourth time typing this, you too will make a shell alias or a batch file to make it easier.
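For example, one such alias (pick whatever name you like):

git config --global alias.lg "log --oneline --graph --decorate" // <-- now "git lg" does the same thing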

The .git directory:

So now that we have some nice content in the local repository, let’s go pillaging through the .git folder.  This .git folder is the actual repository data.  There are quite a few tasty nuggets in here, and understanding how they work is very helpful.  Open a few in your favorite text editor.  But DON’T SAVE THEM!  Don’t brick your git!  :D

– open the HEAD file — it usually contains a pointer to the current branch (something like "ref: refs/heads/master"); check out a bare commit and it will hold that commit's sha1 instead.

– open a few files in the refs folder — each is a text file named for a label (a branch or tag), containing the sha1 of the commit that label points to.

– look through the objects folder, and note that the first 2 characters of the sha1 form the folder name and the remaining characters form the file name — this is where the actual content is stored, zlib compressed.  Open one in a text editor and you'll get some pretty zany Zapf dingbats.
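If you'd like to see inside one of those compressed objects without risking anything, git will happily decompress and pretty-print it for you:

git cat-file -t <sha1> // <-- tells you whether that object is a commit, a tree, or a blob
git cat-file -p <sha1> // <-- pretty-prints the decompressed content of the object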

Branching:

In Git, by convention, most things are done in branches.  Branches are very, very cheap in git: they're just labels.  Let's experiment with branching and merging, and see why they aren't nearly the pain you've experienced with non-distributed version control systems.

git branch somebranch // <-- This creates the branch but doesn't switch to it
git checkout somebranch // <-- This switches to it
git status // <-- check that you're on the branch you thought you were on
now create a file, add it, and commit it
git log --oneline --graph --decorate // <-- note that somebranch and HEAD moved, master didn't

It's worth re-emphasizing that git's branches are just labels on nodes.  We didn't copy all the files when we branched, and the repo is no bigger for having the branch.  A branch is merely another name for the node besides its sha1 hash.

Merging (fast-forward):

Merging is also no big deal in git.

git checkout master // <-- get back to master
git merge somebranch // <-- merge somebranch into master; since master never diverged, this is a fast-forward and needs no separate commit
git log --oneline --graph --decorate // <-- note that master moved up to match somebranch and HEAD

We just moved the label — very simple.  The paths didn't diverge, so we only needed to push the master label forward.  Gorgeous.

Delete the Branch:

In git, a branch is just a label.  If we don't need to refer to the commits by label anymore, we can remove the label.  It doesn't delete code, and we can always get back to the node by its sha1 hash.  It merely removes the label from the node.

git branch -d somebranch

git log --oneline --graph --decorate

Branch / Merge (not fast-forward):

git checkout -b branch1 // <-- checkout -b creates the branch and switches to it in one shot
create a new file, add it, commit it
git checkout master // <-- get back to before this new file was added
git checkout -b branch2 // <-- simulate a second developer
create another new file, add it, commit it

We now have an interesting scenario: we have two “ends of the trail” — two “tips”.  So let’s merge them together.

git checkout master // <-- get back to home base
git merge branch1 // <-- master simply fast-forwards to branch1, no separate commit needed

git merge branch2 // <-- this one can't fast-forward, so git creates a merge commit for us

git log --oneline --graph --decorate

See the nice diamond shape?  Pretty!  Try that with your silly little linear-only SVN.

Also notice where your labels are.  HEAD and master are at the top of the graph, but branch1 and branch2 are still where they were.  We could merge this new master into them (a fast-forward), but we probably don't need to.  More than likely, once this feature is done and integrated and enough time has passed, we can just delete those branch labels.

Working with Remote Repositories:

Now that we’ve got the hang of adding, committing, branching, and merging, and watched how it affects labels in our local repository, let’s take a wander through how we can share this code with others.

Create a new folder outside the previous folder.  Rather than git init in this new folder, let’s “initialize” this repository by linking to the previous repository.

git clone ../path/to/other/repo

The "path" can be a Windows file path, it can be an http url, it could be git://github.com/username/repo.git.  It doesn't matter where it came from; when we're done here, we have an exact duplicate of the other repository.  Prove this to yourself by running git log:

git log --oneline --graph --decorate

You've got all the history and all the labels, but something is interesting: you've not only got your own HEAD and master labels, you've also got some new ones: origin/master and origin/HEAD (and one for any other branch).  These remote-tracking labels show you where the remote's branches were the last time you talked to it.

git remote // <-- this one lists the remote links you've configured

git remote -v // <-- this one gives you the path to the other repository

Now from here, we need to talk a little logistics.  In TFS or SVN land, we did update and commit.  In Git, those concepts cover moving between our working directory and our local repository.  So what do we call it when we're moving between our local repository and a remote repository?  Pull and push.

git pull origin master // <-- get latest from them to my local repository
git push origin master // <-- send my latest changes to them

If nothing diverged while I was working, the pull simply fast-forwards my labels to the end of the line, and we're good to go.  If there were changes on both sides, I may need to resolve a conflict before the merge completes.  Git gives me the standard "theirs", "mine", "fix it" choices — both as conflict markers on the command line and as pretty compare views in the GUI tools.  When I'm done, I add the resolved files and commit, and the two lines of history are combined.
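A rough sketch of that conflict dance from the command line (the file name here is just a placeholder):

git pull origin master // <-- git stops and marks the conflicting sections in the affected file
(open the file, pick "theirs", "mine", or a blend of your own, and remove the <<<<<<< / ======= / >>>>>>> markers)
git add <filename> // <-- tell git the conflict in this file is resolved
git commit // <-- finish the merge; git offers a default merge message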

By convention, we'll probably want to designate one copy of the repository as "the master copy", and have everyone push to and pull from that one (though nothing stops team members from pushing and pulling intermediary work directly between each other).  Git technically doesn't need a "server to rule them all", but it's always nice to have a central place where we can back up the corporate assets and where the CI / build process can look for "the latest and greatest".

Advanced Topics

It wouldn’t be fair to leave you in the dark about these advanced keywords that come in very handy.  Alas, it also wouldn’t be fair for me to confuse you with these details while you’re first learning git.  These tools are immensely handy though, and when you’ve got time, Google each of these terms:

git reset
git rebase
git cherry-pick
git stash
git flow

Bringing it Together

We're now concluding where we began: Git really has two major things that differentiate it from the SVN / TFS methodology we're used to:

1. Git has 4 storage locations: the working directory, the staging area, the local repository, and zero or more remote repositories.  We saw how we could work with the staging area (git add; git status), and in time we'll see how this view of the commit-in-embryo can be a great tool for forming cohesive, descriptive commits.

2. Git’s magic is just moving labels between nodes.  We saw throughout the command demos that as we made changes, we just created nodes and moved / added / removed labels from them.  These “branches” weren’t expensive, they were just organizational tools.  If a “branch” node path doesn’t work out, just abandon it, go back to master, and try again.  It’s not a lock-step linear process where everyone inherits my mistakes.  It’s just a system of nodes.

Git can now be a very, very powerful tool.  Given the popularity and pervasiveness of git, I foresee it’s just a matter of time before git is the new svn, and svn becomes the next cvs … or worse, the next SourceSafe.  And if you learn nothing else from this presentation of “Thinking in Git”, learn this: friends don’t let friends use SourceSafe.  :D

Great Git Resources

http://blip.tv/open-source-developers-conference/git-for-ages-4-and-up-4460524
http://think-like-a-git.net/epic.html
http://nfarina.com/post/9868516270

Sql Source Control from SSMS to Working Directory

I use Sql Source Control in an unconventional way: I want to commit both sql scripts and code at once.  This means the CI server builds once with both sides.  This works very well if I can commit from Explorer using Tortoise.  It doesn’t work so well when I commit from inside Visual Studio or from inside Sql Management Studio via Sql Source Control.

Sql Source Control is basically Sql Compare between the database in Sql Server and something else, built right into Sql Management Studio — a very excellent workflow.  What's really awesome: the "something else" (typically a source control system) could be anything.  They built in the ability to define your own command-line arguments for any source control system you have, then created the standard built-in configurations for the usual suspects: TFS, SVN, Mercurial, and Git.  Well, what if my "source control command" is empty?  Then I'm scripting the database objects to a folder and "committing" them to nothing.  Perfect.

They recently added the ability to specify where the working directory is, so I place it in my sql folder right next to my src directory.  I "fake-commit" in Sql Source Control, then the real commit, database scripts together with the related code, happens from Tortoise.

It takes a bit of configuration, and I reached out to RedGate for clarity.  Here are the steps they directed me to, now immortalized, to make "database to working folder" work:

1) When linking a db to source control, choose "More… custom setup." on the left

2) Click on "Manage Config Files…"

3) Make a copy of the Template.xml file in that folder and rename it to “WorkingFolder.xml”

4) Edit this xml file so that the name corresponds to the filename (<Name>WorkingFolder</Name>), so it appears correctly in the drop down

5) Save the WorkingFolder.xml file

6) Close and reopen the “Link to Source Control” dialog in SSMS

7) Choose “WorkingFolder” from the drop down and specify a working folder by browsing to where you want to write the .sql files to

8) Choose your model

9) Click link

Design Patterns for data persistence: Unit-of-Work Pattern and Repository Pattern

Design Patterns

Design Patterns are names we give code constructs.  We're not usually inventing anything new, we're just giving a name to things we've always done.  You've probably been doing it this way forever, but you never knew what to call it; it was just "the way it's done".  Giving it a name is very powerful because it gives us a way to communicate about larger topics.  Sure, we can talk about a for-loop or an if-block, but what about system architecture topics?  In the world of auto repair, they have names for their larger pieces: radiator, engine block, alternator.  How do you convey to someone how your Data Access tier or your Presentation tier works?  This is the magic of Design Patterns — names for these common execution techniques.

Today we’ll discuss two common Design Patterns for working with data storage: the Unit-of-Work Pattern, and the Repository Pattern.

Unit-of-Work Pattern

Microsoft really likes the Unit-of-Work Pattern, so most every sample you see coming out of their shops will include this pattern.  The general theory is you have a reference to a stateful link to your data store — a Data Context — that will queue up your queries and then execute all the steps at once within a transaction.  They’ll either all succeed or they’ll all fail.

For example, say you're placing an order in an Order Entry system.  You may insert or update the Customer record, insert an Order header, insert one or more Order detail lines, and perhaps update the product's available count.  You want all of these writes to either succeed together or fail together.  You don't want the first 2 lines of the order saved, only to error while writing line #3 and never update the customer or product.  You want to be confident that all these steps will succeed or fail as one.

A typical method that uses the Unit of Work Pattern would look like so:

public int SaveOrder( int CustomerId, List<OrderDetail> OrderLines ) {

    MyDataContext db = new MyDataContext();

    Customer c = db.Customers.First( cust => cust.ID == CustomerId );

    Order o = new Order {
        OrderDate = DateTime.Now,
        CustomerId = CustomerId,
    };

    c.LastOrderDate = o.OrderDate;

    db.Orders.Insert( o );
    db.OrderDetails.InsertRange( OrderLines );

    db.SaveChanges();

    return o.ID;
}

You can always spot the Unit-of-Work pattern by its tell-tale "now go do it" method — in this case named "SaveChanges".

I could refactor this into 2 or 3 methods, one that creates the order, one that updates the customer, and all the while hang onto that stateful data connection until I finally complete the work unit.  These methods could call different stored procedures for each step if need be.  But good, bad, or indifferent, I must keep track of my handle to the work package until I'm done, and then I must specifically say "done".
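As a rough sketch (using the same hypothetical MyDataContext as above), that refactoring might look like the following: the smaller methods share the stateful context, and only the orchestrating method says "done".

public int SaveOrder( int CustomerId, List<OrderDetail> OrderLines ) {
    MyDataContext db = new MyDataContext();      // the shared handle to the work unit

    Order o = CreateOrder( db, CustomerId, OrderLines );
    UpdateCustomerStats( db, CustomerId, o.OrderDate );

    db.SaveChanges();                            // the single "now go do it" call
    return o.ID;
}

private Order CreateOrder( MyDataContext db, int CustomerId, List<OrderDetail> OrderLines ) {
    Order o = new Order { OrderDate = DateTime.Now, CustomerId = CustomerId };
    db.Orders.Insert( o );
    db.OrderDetails.InsertRange( OrderLines );
    return o;
}

private void UpdateCustomerStats( MyDataContext db, int CustomerId, DateTime OrderDate ) {
    Customer c = db.Customers.First( cust => cust.ID == CustomerId );
    c.LastOrderDate = OrderDate;
}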

Why use the Unit-of-Work pattern rather than just running each query separately?  Because I’m confident that the entire work unit will either succeed or fail, I’ll never get caught with an inconsistent data state.

Repository Pattern

The Repository Pattern's focus is to create an abstraction between the data store and the application, so the application doesn't need to think about how data is stored, only that this widget stores it.  The Repository is responsible for all the wackiness of connecting to the data store, opening the db connection, forming the query parameters, holding the open connection, etc.  A Repository class has methods that take in objects or parameters and return either an object or a list of objects.  A Repository method doesn't do business logic beyond simple validation.  It just shims data to and from the data store.

A typical class that uses the Repository Pattern would look like so:

public class CustomerRepository {

    public Customer GetById( int CustomerId ) {
        using ( MyDataContext db = new MyDataContext() ) {
            return (
                from c in db.Customers
                where c.ID == CustomerId
                select c
            ).FirstOrDefault();
        }
    }

    public void Save( Customer c ) {
        using ( MyDataContext db = new MyDataContext() ) {
            if ( c.ID < 1 ) {
                db.Add( c );
            } else {
                db.Update( c );
            }
            db.SaveChanges(); // Sadly, Microsoft's ORM is Unit-of-Work,
            // so we're just saying "my unit is this method".
        }
    }

}


Usage of this class is pretty straightforward:

public void ChangeCustomerName( int CustomerId, string FirstName, string LastName ) {

    CustomerRepository r = new CustomerRepository(); // Since the Repository is stateless,
    // this could be a property on this class rather than a new instance

    Customer c = r.GetById( CustomerId );
    c.FirstName = FirstName;
    c.LastName = LastName;
    r.Save( c );

}

Notice that the consuming class doesn't need to know whether GetById() called a stored procedure, read an XML file, or called an external service.  It just asked for something, passing in simple parameters, and got back a result.

Why use the Repository Pattern rather than just creating an instance of the DataContext everywhere and querying directly?  Here are a few reasons:

1. We can easily evaluate data usage, and optimize queries and indexes, because all queries are very carefully defined in a specific section of the code.

2. Data connections are opened and closed within the method, so no connections leak.  (Yes, .net's connection pooling ensures we're not actually hammering the db.)

3. If we ever need to swap data access strategies (LinqToSql to Entity Framework, for example), we likely replace the DataContext class and change the Repository methods to call the new context's methods, but nothing else in the app needs to change.  (OK, if your data access strategy requires your objects to derive from something, you'll need to adjust that too.  But the beauty of things like Entity Framework Code First, NHibernate, or Dapper is that your entity classes don't need to derive from anything funky — they're just plain old C# objects.)
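One way to underline that swappability (not required by the samples above, just an illustrative sketch) is to put an interface in front of the repository and have consumers depend only on it:

public interface ICustomerRepository {
    Customer GetById( int CustomerId );
    void Save( Customer c );
}

// The CustomerRepository above already has this shape; declare it as
// "public class CustomerRepository : ICustomerRepository" and any consumer
// written against the interface never notices when the data access strategy
// underneath it changes.
public class CustomerService {

    private readonly ICustomerRepository _customers;

    public CustomerService( ICustomerRepository customers ) {
        _customers = customers;
    }

    public void ChangeCustomerName( int CustomerId, string FirstName, string LastName ) {
        Customer c = _customers.GetById( CustomerId );
        c.FirstName = FirstName;
        c.LastName = LastName;
        _customers.Save( c );
    }

}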

A composite Unit-of-Work and Repository Pattern

Well, we've painted a nice picture of each pattern and its benefits, but what if I want the best of both worlds?  What if I want to know that everything across various tables and methods happens at once, but I also want the clean separation that the Repository gives me?  That turns out to be a pretty simple adjustment, though it's often overkill for the task at hand.  In fact, most of the "Repository Pattern" samples from Microsoft actually do it this way.

Here’s an example of the hybrid Unit-of-Work / Repository Pattern:

public class BaseRepository {

    public class UnitOfWorkHandle {
        internal MyDataContext db { get; set; }
    }

    public UnitOfWorkHandle StartUnitOfWork() {
        return new UnitOfWorkHandle {
            db = new MyDataContext()
        };
    }

    public void FinishUnitOfWork( UnitOfWorkHandle Handle ) {
        Handle.db.SaveChanges();
    }

}

public class CustomerRepository : BaseRepository {

    public Customer GetById( UnitOfWorkHandle Handle, int CustomerId ) {
        return (
            from c in Handle.db.Customers
            where c.ID == CustomerId
            select c
        ).FirstOrDefault();
    }

    public void Save( UnitOfWorkHandle Handle, Customer c ) {
        if ( c.ID < 1 ) {
            Handle.db.Add( c );
        } else {
            Handle.db.Update( c );
        }
    }

}


This is pretty cool:

1. The actual implementation details of my data access strategy are hidden (and thus easily replaceable)

2. All the queries are clearly defined and easily auditable

3. I’m confident that either everything will succeed or everything will fail

It has some drawbacks though:

1. Usage is pretty complex:

public void ChangeCustomerName( int CustomerId, string FirstName, string LastName ) {

    CustomerRepository r = new CustomerRepository();

    BaseRepository.UnitOfWorkHandle h = r.StartUnitOfWork();

    Customer c = r.GetById( h, CustomerId );
    c.FirstName = FirstName;
    c.LastName = LastName;
    r.Save( h, c );

    r.FinishUnitOfWork( h );

}

2. If I forget to call FinishUnitOfWork(), absolutely nothing happens: no error, no warning, no data saved.  It just goes poof, and sadly, that's a very easy mistake to make.

Which to Use?

Ultimately the choice of which Data Access Pattern you use (much like your choice of ORM / data transport technology) depends on the situation.  Do you want clean separation of concerns between your tiers?  Do you need to know that everything is executed together?  Would you rather have a simpler interface for getting data?  Some even argue that Microsoft's ORM tools are themselves the "Data Layer", and that any pattern that wraps them is wasteful.  Choose the strategy that gives you the best value for the needs of the project: speed of development, maintainable code, legible code, data access speed, etc.

Rob

Demystifying Lambdas in C# 3.0

I’ve had this conversation a few times, so it seems natural that it’d evolve onto my blog in time.  The conversation typically goes like this: “There’s this weird thing I don’t get.  It looks like this: x => x < 6 and like this: Func<int,bool> p.”  Why are these so magical?  How did they come to be?  That’s what we’ll look at here.

Let’s start by clearly defining Lambda.  It’s a function pointer (for the C/C++ among us), or it’s a short-hand way to define a method (for those of us without that much hair loss), and it grew out of C#’s delegates.  They just make it simpler to pass basic (and not so basic) logic to another procedure.

In .NET 1.0, we had delegates.  MSDN defines them as “a type that references a method” or put more simply “a function pointer”.  Somewhere in my class definition, I’d create the delegate definition:

public delegate bool LessThanSixDelegate( int InputVar );

Elsewhere in the class, I’d create the implementation method definition:

public bool MethodName( int InputVar ) {

    return InputVar < 6;

}

Then embedded in some method somewhere I would instantiate this delegate, referencing the MethodName() function:

LessThanSixDelegate methodPointerInstance = new LessThanSixDelegate( MethodName );

Then I’d use the instance like so:

methodPointerInstance( 12 );

That’s .NET 1.0 code, and it’s well worn, and quite verbose.

Fast-forward a few years, and we’re at .NET 2.0, and the latest hotness is “Anonymous Methods” — basically delegates without all the pomp and circumstance:

public delegate bool LessThanSixDelegate( int InputVar );

LessThanSixDelegate methodPointerInstance = delegate( int InputVar ) { return InputVar < 6; };

We’re making great progress.  I still need the delegate definition, but I don’t need to separate the method that does the work from the instance of the delegate.  I create an inline method — an anonymous method — because the method doesn’t have a name, only the instance has a name.

Fast-forward to .NET 3.0, and we now have Lambdas.  The main goal is to get more terse: avoid redundant characters and get rid of the separate delegate definition.  Now we can describe the delegate's shape and the method implementation all at once as we instantiate the instance.  (I'll use var on the left-hand side in these examples to keep the focus on the lambda itself; in real code the compiler needs a concrete delegate type there, such as the Func<> types we'll meet in a moment.)

var methodPointerInstance = ( int InputVar ) => { return InputVar < 6; };

They needed a character to denote the difference between input parameters and method body, and to denote it as a quick-built delegate — function pointer — lambda.  The character of choice: “=>“, pronounced “such that”.  My method is basically “take in an int InputVar, and calculate InputVar such that InputVar is less than 6.”

This is gorgeous, but there’s still some redundancy.

If the usage determines the input parameter’s type, we don’t need to specify it.  Let’s simplify by removing it:

var methodPointerInstance = ( InputVar ) => { return InputVar < 6; };

If we have only one line in our method, we don’t need the return or the curly braces.  Let’s simplify it to just this:

var methodPointerInstance = ( InputVar ) => InputVar < 6;

If we only have one input parameter, we can avoid the parens around it and we’re left with this:

var methodPointerInstance = InputVar => InputVar < 6;

If I want to name my input parameter “x” instead of “InputVar”, I could yield this:

var methodPointerInstance = x => x < 6;

Gorgeously tiny, very sweet.

Well, all these simplifications were completely optional.  These two lines are completely identical:

var methodPointerInstance = ( int InputVar ) => { return InputVar < 6; };
var methodPointerInstance = x => x < 6;

I can also choose which of these simplifications I prefer, and leverage that syntax.

Well, what if the assumptions we made while simplifying it don’t hold true?  What if we have zero parameters?  Or we have more than one parameter?  Well, we can’t leave off the parens then.  What if we have more than one line in our function?  Then we can’t leave off the curly braces or return line.  Here’s an example of each:

Zero parameters:

var methodPointerInstance = () => 5 < 6;

Two parameters:

var methodPointerInstance = ( Param1, Param2 ) => Param1 < Param2;

Two line long method body:

var methodPointerInstance = InputVar => {
    int comparer = 6;
    bool result = InputVar < comparer;
    return result;
};

I could also choose to add back in any of the short-cut things that I had removed.  Definitely season to taste.  All else being equal though, I like typing less.

So what of this construct: Func<int,bool>?  That's just a short-cut way of generically specifying a delegate type, most often as a parameter into a method.  Action<T,U,V> is the same idea.  Func<> always returns its last type parameter and takes in all the rest as input parameters.  Action<> takes in all of its type parameters and its return type is void.  Let's see each in action:

Func<int,bool> methodPointerInstance = InputVar => InputVar < 6;

This “anonymous method pointer” takes in a single int, and returns a bool.

Func<int,int,bool> methodPointerInstance = (Param1,Param2) => Param1 < Param2;

This func takes in two int parameters and returns a bool.

Action<int,bool> methodPointerInstance = (Param1,Param2) => {
    if ( Param2 ) {
        DoIt(Param1);
    }
};

This action takes in two parameters and returns nothing.

So, now when the method signature IntelliSense pops up with Func<int,bool> methodSignature, no need to run for the hills.  Just pass in x => x < 6 — a lambda that matches the delegate definition using this cool short-cut syntax.
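For example, LINQ's Where() (in System.Linq) is exactly such a method: when you filter a list of ints, its signature asks for a Func<int,bool>, and a lambda fits right in.

List<int> numbers = new List<int> { 2, 9, 4, 12 };
IEnumerable<int> smallOnes = numbers.Where( x => x < 6 );   // pass the lambda inline
Func<int,bool> isSmall = x => x < 6;
IEnumerable<int> sameThing = numbers.Where( isSmall );      // or hand it a named instance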

Lambdas are immensely powerful, and your existing skills in forming delegates to satisfy click event handlers already gives you the skills you need to leverage them.  At the end of the day, they’re just function pointers.  Awesome.

Rob

Configuring other IIS boxes in the web farm

Configuring the first IIS 7.x box is far easier than IIS 6 was, but configuring multiple web servers to behave identically can be a pain.  Web Farm Framework (available in Web Platform Installer) can synchronize things automatically, but these settings change so rarely, and changes need to propagate so immediately, that I'd rather do it manually.  Still, constructing the 3rd or 4th or nth machine gets old.  Can we make this easier?  Most definitely we can.

Back in the IIS 6 days, the "IIS Metabase" was a scary thing — like modifying the registry or getting surgery.  Granted, we did this every day, but it was always daunting.  In IIS 7, the entirety of the IIS configuration details is in xml in C:\Windows\System32\inetsrv\config\applicationHost.config (unless overridden in each site's web.config).  How do you set up machine 2 to behave exactly as machine 1?  You diff the files and copy nodes.  Yeah, no more surgery.  Awesome.

Ok, setting up IIS #2 isn’t quite as simple as diffing the files, but it’s pretty close.  Here’s a rough checklist of things I do to make machine 2 function identically to machine 1:

1. Install IIS on each machine
2. Install any plugins / extensions on each machine – typically this is merely a trip through Web Platform Installer
3. Configure machine 1 to be perfect
4. Back up C:\Windows\System32\inetsrv\config\applicationHost.config on both machines — it's easy to mess up, and running without a safety net is bad
5. Diff C:\Windows\System32\inetsrv\config\applicationHost.config between the two machines, and begin noting the differences
6. Copy changes from machine 1 to machine 2
7. Restart IIS or reboot (you probably haven't rebooted since installing Windows Updates) — probably not essential, but best not to get started on the wrong foot

As we’re diffing applicationHost.config we’ll see a few things that we can merge, and a few things that must stay different.  Let’s look through a few sections:

<configProtectedData> node has AesProvider and IISWASOnlyAesProvider nodes.  These include machine-specific details.  If you accidentally merge these details between the two machines, go to your backup and get the original details back.  I’ve never personally hosed a box by doing so, but I’ve also treaded very carefully here.

<system.applicationHost><applicationPools> node includes one node per app pool.  Do you always set them to 4.0, startMode="AlwaysRunning", or anything else interesting?  It isn't 3 or 5 clicks away; it's just a text file change now.  Be careful not to merge an identity password though — it's encrypted with a machine-specific key.  Otherwise, just merge all the app pools from machine 1 into place.

<system.applicationHost><sites> node includes one child <site> node per website.  You can configure everything here just by adding attributes and child nodes, or add a complete site by merging in another <site> node.  (Be careful to ensure the id="" values are unique and that each site references an applicationPool that exists.)  Just merge all the sites from machine 1 into place.

<system.webServer><globalModules> includes a list of all the httpModules installed into IIS.  Depending on what order you clicked the check-boxes while installing IIS, or what order Web Platform Installer installed plugins, these may be in different orders between the machines.  Provided you don't add or remove nodes, you can reorder them for "cleanliness".

<location path="…"> nodes at the bottom alter authentication protocols for each site.  You can do similar overrides in web.config, but if you configured it through IIS, the settings will be here.  (Alternatively, if you configured it in Visual Studio, the details may only be in that site's web.config.)

Do you have any other noteworthy nodes in your IIS applicationHost.config files?  Any other techniques for configuring IIS with ease?

Enjoy!

Rob

The “real-time” web in ASP.NET MVC

The "real-time" web is one of the holy grails of software development.  It's the notion that we can provide a native-app experience through a browser: users can click or touch buttons, get instant feedback, and get app status updates, all without the dreaded postback.

There are quite a few technologies that are starting to bring this to fruition, but like any new shiny tool, we need to use them with wisdom and purpose.  With this new hammer, everything may start to look like a nail.  Let’s discuss the landscape of these tools within the context of ASP.NET MVC.

Typically when we say "real-time", the go-to marketing answer is SignalR, as it provides a persistent connection between browser and server and thus allows the browser to receive events from the server as they happen, not on the next timer tick.  (SignalR also has great support for falling back through a stack of technologies that accomplish similar effects: server-sent events, long polling, comet, etc.)  This type of connection is perfect for things like chat apps, where the events from server to client are randomly timed and you have a few clients.

However, with a constant connection from client to server come problems with load balancing and scalability.  It's easy to overload your server with SignalR.  The premise of the web — and why it generally scales so well — is that requests are very short-lived and heavily cached.  For example, consider a server that can handle 1000 concurrent requests.  (Ok, maybe it's 100,000, maybe it's 10; the exact number isn't important.)  Now imagine that instead of quickly connecting, getting the answer, and disconnecting, clients stay connected.  When the 1001st user tries to log on, they get an HTTP 503 because the server is too busy.  Oops.

If either of the two conditions above doesn't hold — either you have evenly spaced events or you have very many clients — then SignalR is just asking for server pain, and you should build a short-poll app with plain AJAX and jQuery.  For example, if you're just updating a dashboard page's graphs, charts, and news feed with the server's current data each second, then fire a regular $.ajax(…) call every second from a recursive setTimeout() loop or with setInterval() (the latter is easier, the former is safer), and tell the server to cache the answer for 1/2 second with an [OutputCache] attribute on the Action method or Controller class.  The clients all get the answer quickly and reliably, the server need only compute it twice a second, and the server that could handle only 1000 concurrent requests can now handle far more than 1000 browsers.

Why output cache for 1/2 second instead of matching the client's 1 second?  Because if one client connects at time 0:00 and the next client connects at time 0:00.9, the second one gets a nearly second-old result.  If that's acceptable data freshness, definitely set the [OutputCache] to 1 second.  If 5 or 15 seconds is acceptable, set both client and server there.  If 15 minutes is acceptably stale, set it there.  (Since we're tuning this timeout, it's probably best to store it somewhere configurable and tweak it over time.)  But understand that an [OutputCache] of 1 minute means a client connecting at 0:59 sees data that's a minute old.
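Here's a minimal sketch of the server side of that polling setup; the controller name and the data-gathering helpers are hypothetical, and note that [OutputCache]'s Duration is specified in whole seconds:

public class DashboardController : Controller {

    [OutputCache( Duration = 1, VaryByParam = "none" )] // cache the rendered answer; Duration is whole seconds
    public ActionResult CurrentData() {
        var model = new {
            Visitors = GetCurrentVisitorCount(),   // hypothetical helpers that gather
            Headline = GetLatestHeadline()         // this second's dashboard numbers
        };
        return Json( model, JsonRequestBehavior.AllowGet ); // a concise JSON payload for the poller
    }

    // stand-ins for whatever really gathers the data
    private int GetCurrentVisitorCount() { return 42; }
    private string GetLatestHeadline() { return "All systems nominal"; }

}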

So what do you pass from server to client?  Do you pass markup — a partial view?  A JSON object?  A combination of the two?  How about an UpdatePanel, a PageMethod, and a timer control?  It completely depends on your comfort level with each technology, and how you prioritize speed of client processing vs. speed of network transfer vs. development time.  Almost without fail, an UpdatePanel and a timer will be the heaviest use of the network and the browser, but they require almost no JavaScript skills.  In all but the rarest scenarios, it's bringing a nuke to a gun fight.  I prefer very concise JSON objects because they use so little network bandwidth, which is probably the most limited resource, but they do require some decent JavaScript skills.  Your mileage may vary, and your skills and development timeframe may lead you to different preferences.

Ultimately, the theme these days in ASP.NET is the theme that's emerging here: there is no one right way; rather, there are many solutions that are all well supported.  The "one ASP.NET" may be better termed "one basket of tools", each of which is appropriate for a different task and skillset, and inappropriate for others.  The day of the hammer searching for a nail is over.  Long live the hammer.

Rob