05 December 2012

Subversion to git - the pain retold

I've spent a week reminding myself why svn sucks.

I've been using the freetts library for speech synth in the communication book program I've been working on, and have tripped over a bug in freetts running under OpenJDK. The freetts source code lives in a svn repository on sourceforge. The first step in troubleshooting is to build the library from source. In order to track any local experimentation / fixes I need to have some kind of local source control, and svn sucks too much to provide this. The obvious next step is to pull the sources down with git-svn (or svn2git as github recommends).

After a couple of aborted attempts I was reminded how the loosely defined structure of a svn repository and the over-generalization of tags & branches allow for a complete mess, which is then a pain to import cleanly.

"And they want to make snapshots of smaller subdirectories of the filesystem. After all, it's not so easy to remember that release 1.0 of a piece of software is a particular subdirectory of revision 4822."  --   Argh! Terrible "feature", if you're using this feature then you're doing source control wrong! (Quote taken from http://svnbook.red-bean.com/en/1.7/svn.branchmerge.tags.html )

I could just grab a tarball and start from there, however there is new code upstream since their last release (v1.2.2), and that means testing two branches, and possibly investigating diffs. In addition if I'm going to make the effort to import the history, I ought to do it well enough first time that others can build on it. It's much harder to correct a bad scm import once work is continued, especially in the distributed world of open source.

And so, for my sins, I set about importing the history, and hacking away at it with the excellent tools git provides to turn it into something that actually linked together correctly and didn't make me feel ill by including CVSROOT all over the place (yes, it's not the first migration this project's been through).

On the plus side, it is fantastic that the open source license gives a user of a library such as myself the right to go ahead and do something like this and to share the improvement with the world, regardless of whether it's something the original creators / maintainers would have done.

The layout of the FreeTTS svn repo is not consistent in directory structure, which means the svn import tools don't behave quite as one might expect. This is the inevitable downside to Subversion's poor choice of architecture around "everything's just a directory structure". (Bitter? Me? Never!)

Here's a taster of how inconsistent the layout is and what a challenge is ahead tidying it up:

tim@atom:~/repo/freetts.svn$ ls */*








acknowledgments.txt build.xml demo.xml index.html license.terms overview.html RELEASE_NOTES speech.properties tests
ANNOUNCE.txt demo docs lib mbrola README.txt rtp src tools





checkoutlist commitinfo config cvsignore cvswrappers editinfo loginfo modules notify rcsinfo syncmail taginfo verifymsg

acknowledgments.txt build.xml demo.xml index.html license.terms overview.html RELEASE_NOTES speech.properties tools
ANNOUNCE.txt demo docs lib mbrola README.txt rtp src unittests

It took all my git-fu powers to sort out this mess. Below is a time-shortened sequence of how it was done, just in case I have the misfortune to need to do it again. I ended up abandoning all the ancient tags as they were going to be more effort than I liked to fix, and they could be added retrospectively if anyone really cared. It took me many attempts to get to the below, which is what I've reconstructed from my fragmented history. Hopefully it will provide enough clues should you wish to do something similar.

FreeTTS project urls: Project front page http://freetts.sourceforge.net/ , project site http://sourceforge.net/projects/freetts/ , repo browser http://freetts.svn.sourceforge.net/viewvc/freetts/ , svn http access https://freetts.svn.sourceforge.net/svnroot/freetts/
At time of writing the current svn revision is 582.

The latest packaged version for ubuntu:
apt-cache show freetts
Package: freetts
Priority: optional
Section: universe/java
Installed-Size: 13532
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Bdale Garbee <bdale@gag.com>
Architecture: all
Version: 1.2.2-3
Depends: default-jre | java2-runtime
Filename: pool/universe/f/freetts/freetts_1.2.2-3_all.deb
Size: 9456904
MD5sum: 183bed09b1b8e2d8642f46b7538273f4
SHA1: 8df47df82124704b890f446a1bc958d33fd273d3
SHA256: 8920440eaa58c087cb268e8e2a64d44ac873fb44d49b34f180f587f9c69421a7
Description-en: speech synthesis system
FreeTTS is a speech synthesis system written entirely in the Java(TM)
programming language. It is based upon Flite, a small run-time speech
synthesis engine developed at Carnegie Mellon University. Flite in turn
is derived from the Festival Speech Synthesis System from the University
of Edinburgh and the FestVox project from Carnegie Mellon University.
Homepage: http://freetts.sourceforge.net
Description-md5: a346fe6dcc2c0164ec6b7c3891945e56
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu

So here's the import more or less as it happened:

mkdir freetts.svn.git; cd freetts.svn.git
svn2git --verbose https://freetts.svn.sourceforge.net/svnroot/freetts/
git gc

cat .git/config
[core]
 repositoryformatversion = 0
 filemode = true
 bare = false
 logallrefupdates = true
[svn-remote "svn"]
 noMetadata = 1
 url = https://freetts.svn.sourceforge.net/svnroot/freetts
 fetch = trunk:refs/remotes/svn/trunk
 branches = branches/*:refs/remotes/svn/*
 tags = tags/*:refs/remotes/svn/tags/*
[branch "release"]
 remote = .
 merge = refs/remotes/svn/release

# get a copy without the svn references (which stop us seeing whether the rewritten history is free of old cruft)
cd ..
git clone freetts.svn.git/ freetts.git
cd freetts.git/
gitk --all &
# The following is done while keeping an eye on and refreshing (ctrl+f5) gitk to see the effects:
# Filter out the cvs rubbish so that git can match up commits that do have it with commits that don't
git filter-branch --tree-filter 'rm -rf CVSROOT' --prune-empty -- --all
# Remove the unnecessary top level folder (which inconsistently existed)
# -f is needed because the previous filter-branch run left backup refs in refs/original/
git filter-branch -f --prune-empty --subdirectory-filter FreeTTS/ -- --all
# Remove the backup refs filter-branch creates
rm -rf .git/refs/original/
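
If you'd rather not rely on eyeballing gitk alone, a check along these lines can confirm that the rewritten history really is free of the old cruft. It is only a sketch: the helper name history_is_free_of is made up for illustration, and it assumes you run it from inside the rewritten clone after the backup refs have been removed (otherwise --all will still see the old commits through refs/original/).

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only if no commit reachable from any ref
# touches the given path. "history_is_free_of" is a hypothetical helper name.
history_is_free_of() {
    # git log prints one hash per commit that touched the path;
    # empty output means the path never appears in the history.
    [ -z "$(git log --all --format=%H -- "$1")" ]
}

# Usage, from inside the rewritten repository:
#   history_is_free_of CVSROOT && echo "no CVSROOT left in history"
```

If the check fails, running git log --all --oneline -- CVSROOT by hand shows which commits still carry the directory.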

# delete all the crappy svn "tags", just tag the latest
git tag -d `git tag`
Deleted tag 'freetts' (was 8d953b7)
Deleted tag 'pre-rel1-1' (was d1c597f)
Deleted tag 'rel1_1_0' (was 625abdd)
Deleted tag 'rel1_1_2' (was b51fb71)
Deleted tag 'rel1_2_0' (was 7a4fc18)
Deleted tag 'rel1_2_1' (was a126a4a)
Deleted tag 'rel1_2_2' (was b3a0dcf)
Deleted tag 'rel1_2_2@557' (was bf94dbe)
Deleted tag 'rel1_2beta2' (was c0d90e9)
Deleted tag 'rel_1_0_5' (was e95aff8)
Deleted tag 'rel_1_2_beta' (was 1723b2d)
Deleted tag 'start' (was c020efe)
Deleted tag 'sun' (was cfadbc8)

# correct commit found manually:
git tag v1.2.2 ae49425

and finally, push to github

git remote add origin .... (my repo details)
git push --mirror

You can find my repo at https://github.com/timabell/FreeTTS
and the intermediate copy here: https://github.com/timabell/FreeTTS-svn-mirror

All done


Here's the reason I didn't bother with tags in the end: I couldn't rewrite the tags as they had no author:

git filter-branch --tree-filter 'rm -rf CVSROOT' --prune-empty --tag-name-filter cat -- --tags
Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f
tim@atom:~/repo/freetts.git$ rm -rf .git/refs/original/
tim@atom:~/repo/freetts.git$ git filter-branch --tree-filter 'rm -rf CVSROOT' --prune-empty --tag-name-filter cat -- --tags
Rewrite 8611e271692fc33e6160a2a217b9b3060dfbcd1d (1044/1044)
Ref 'refs/tags/freetts' was rewritten
WARNING: Ref 'refs/tags/pre-rel1-1' is unchanged
WARNING: Ref 'refs/tags/rel1_1_0' is unchanged
Ref 'refs/tags/rel1_1_2' was rewritten
Ref 'refs/tags/rel1_2_0' was rewritten
Ref 'refs/tags/rel1_2_1' was rewritten
Ref 'refs/tags/rel1_2_2' was rewritten
Ref 'refs/tags/rel1_2_2@557' was rewritten
Ref 'refs/tags/rel1_2beta2' was rewritten
Ref 'refs/tags/rel_1_0_5' was rewritten
Ref 'refs/tags/rel_1_2_beta' was rewritten
Ref 'refs/tags/start' was rewritten
Ref 'refs/tags/sun' was rewritten
freetts -> freetts (b3a4bbf8768ade6275c91ce0e76d933e30b3ddbf -> 48e84e3560e765db3b33479e2e9a76fe2ccf3550)
error: char79: malformed tagger field
fatal: invalid tag signature file
Could not create new tag object for freetts

git show rel_1_2_beta | head
tag rel_1_2_beta
Tagger: (no author) <(no author)@4963320b-1a4a-0410-81c8-f0a525965860>
Date: Mon Dec 22 14:46:05 2003 +0000

This commit was manufactured by cvs2svn to create tag '\''rel_1_2_beta'\''.

commit 57ed00e981585aad590c9521d7c3a0bccf6284fa
Author: (no author) <(no author)@4963320b-1a4a-0410-81c8-f0a525965860>
Date: Mon Dec 22 14:46:05 2003 +0000


My advice if you are importing svn for a commercial project: Don't! Just export, and import into your new source control tool. Leave the svn repo read only for a while just in case you need that history, and after a year of never looking back, archive it off.

10 July 2012

AA Gold member benefits, the real cost

Breakdown cover maths

So, I'm getting rather fed up with the AA taking the michael every year with their renewals. Yes, I gather the RAC are just as bad but I think they need to realise their customers aren't stupid, know exactly what they are playing at, and can do the maths.

I want to highlight what I think is a particularly dirty trick, making something look like a free perk when it's anything but.

So here's some numbers:

This is for a single car policy covering Roadside, Home Start and Relay starting July 2012 paying annually for a year up front. (The monthly option is 10% more expensive, go figure). Numbers rounded to pounds.

  • Renewal through the post: £135
  • Matching RAC cover (checked online & by phone): £101
  • AA online price for new customers: £92  - (so much for 6 years loyalty, a £43 kick in the teeth)
  • AA phone price: £116
  • AA phone price without gold membership "benefits": £89

That means, the AA are pricing their gold benefits at £27 even though they look like they are free on the renewal letter! Some cheek.

I queried the details of this so called benefit and established the following:
  • "Accident Management" - means being towed by the AA after an accident (something you may be covered for under your car insurance policy)
  • European Breakdown Cover - only useful if you are going abroad (obviously), did you really want to be paying for it?
  • "Family Associates Cover for under 17s" - something about teenagers, I don't have any so not very useful to me
  • Key Insurance - this could be valuable, but £27/year sounds like very expensive insurance to me even though they are expensive items to replace.
  • Legal Advice - Included as standard! So not a gold benefit at all. Weasels.
  • Technical Advice - Included as standard! See above. Still weasels.

So it turns out that the supposed discount of £44.90 on the posted renewal was actually a £46 insult to my intelligence.
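
For anyone who wants to check the sums, here they are as a few lines of shell arithmetic. The prices are the rounded figures from the list above; the variable names are just labels chosen for this sketch.

```shell
#!/bin/sh
# Prices in whole pounds, taken from the list above.
renewal_by_post=135
aa_online_new_customer=92
aa_phone=116
aa_phone_without_gold=89

# Renewal price versus the online price for new customers
loyalty_penalty=$((renewal_by_post - aa_online_new_customer))
# Phone price with and without the "free" gold benefits
gold_cost=$((aa_phone - aa_phone_without_gold))
# Renewal price versus the cheapest AA quote
renewal_markup=$((renewal_by_post - aa_phone_without_gold))

echo "Loyalty penalty: $loyalty_penalty pounds"
echo "Gold benefits:   $gold_cost pounds"
echo "Renewal markup:  $renewal_markup pounds"
```

That gives 43, 27 and 46 respectively, matching the figures quoted in the text.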

I'm no money saving expert, but that's outrageous.

Configuration confusion in visual studio

Here's a gotcha that got me.

It is not immediately obvious, but Visual Studio stores in its sln file a set of project configuration selections for every combination of solution configuration and solution platform.

The gotcha is that by default Visual Studio (all versions 2008-2012 as far as I know) only shows one half of that combination in the standard toolbar, so you can get into a situation where one of your developers is building something completely different to everyone else because the platform has somehow been silently changed.

I recommend you add Platform to your toolbar so that you can always see what you are about to build.

And if possible, remove any unused platform configurations from your solution entirely.
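
To see which combinations a solution actually declares without opening Visual Studio, something like the following can be pasted into a git bash window. It is a rough sketch that assumes the usual sln layout, where the pairs are listed between "GlobalSection(SolutionConfigurationPlatforms)" and "EndGlobalSection"; the function name and MySolution.sln are placeholders.

```shell
#!/bin/sh
# Sketch: print every "Configuration|Platform" pair a solution declares.
# "list_solution_configs" is a hypothetical helper name.
list_solution_configs() {
    # Take only the solution-level section of the file...
    sed -n '/GlobalSection(SolutionConfigurationPlatforms)/,/EndGlobalSection/p' "$1" |
        # ...keep the "X|Y = X|Y" lines, dropping the section header...
        grep '=' | grep -v 'GlobalSection' |
        # ...and strip the indentation and everything from " = " onwards
        # (which also removes the Windows carriage return).
        sed 's/^[[:space:]]*//; s/[[:space:]]*=.*$//'
}

# Usage:
#   list_solution_configs MySolution.sln
```

Any platform in the output that nobody on the team recognises is a good candidate for removal.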

14 June 2012

automatic mysql backups

On a debian server with mediawiki installed and running with a local mysql.

root@myserver:~# apt-get install automysqlbackup

root@myserver:~# crontab -e

# m  h  dom mon dow   command
  5  4  *   *   *     automysqlbackup

root@myserver:~# automysqlbackup
root@myserver:~# cd /var/lib/automysqlbackup/
root@myserver:/var/lib/automysqlbackup# find .

Result! No longer need to write a custom cron script each time.
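
For a quick sanity check that last night's cron run produced something, a sketch like this lists the files written in the last 24 hours. It assumes the /var/lib/automysqlbackup location shown above; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: list backup files modified within the last 24 hours.
# "recent_backups" is a hypothetical helper; the default directory is the
# one used in the session above.
recent_backups() {
    find "${1:-/var/lib/automysqlbackup}" -type f -mtime -1
}

# Usage:
#   recent_backups                  # default location
#   recent_backups /some/other/dir  # anywhere else
```

No output means nothing was written in the last day, which is worth a look at the cron entry.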

Project homepage: http://sourceforge.net/projects/automysqlbackup/

11 June 2012

Connecting to smb shares on a domain in gnome

The domain name has to be UPPERCASE otherwise authentication fails.

Majorly confusing.

Time lost: 3 hours.


22 May 2012

Debugging stored procedures in VS2010 / SQL Express

Debugging stored procs in a local SQL Express install with Visual Studio 2010.

In Visual Studio, Server Explorer, Connect to your server as localhost instead of .\SQLEXPRESS so that you connect through TCP/IP and not shared memory (which doesn't allow debugging for some reason)

Find the project in your solution which actually executes the stored procedure, right-click > properties > debug > "Enable SQL Server debugging"

Run your project

You may need to hit "stop" and re-attach (debug > attach to process), explicitly selecting "T-SQL code" in the "attach to" box (and optionally managed as well). It *should* automatically select T-SQL but it seems to be hit and miss.

Set a breakpoint in your stored procedure:
  • Server explorer, 
  • the connection you added,
  • stored procs,
  • right-click the proc name > open
  • set a break point in the text of the stored proc
    • if it is not a solid red dot then something went wrong
Run the part of your program / website that will cause the proc to be called.

If the breakpoint isn't hit, check that the types in the attach to process list include T-SQL (doesn't seem to always work).

I only got the damn thing to work once. If it doesn't work you get no reason at all which is just crap. The main problem I have is that the attach just quietly drops T-SQL even if you explicitly request it. Shoddy coding from Microsoft in my opinion.

The next best thing is to right-click the stored proc, click "step into" and input the values manually. (Which also requires a tcp/ip connection to the local sql express and is fussy).

Another message encountered a couple of days later without changing anything at all when attaching to the already running web dev process: "User Could Not Execute Stored Procedure sp_enable_sql_debug"

Enabling TCP/IP in SQL Express 2008 R2

Programs > .. R2 > SQL Server Configuration Manager
Network config > Protocols > tcp/ip > enable & properties

Clear the dynamic port under IPAll
Set the TCP Port to 1433 (which is the standard sql server port).

You can then connect to "localhost" (with no instance specified) in management studio.


18 April 2012

Running IE Application Compatibility VPC under Virtual Box

This post is no longer necessary as Microsoft now provide official VirtualBox images, yay!

Microsoft provide Virtual PC images for testing your website with IE. You can download them from http://www.microsoft.com/download/en/details.aspx?id=11575

Note that the XP image has now expired so is no use; it will reboot immediately after login.

I wanted to make use of the Win 7 / IE 9 image, however Virtual PC is unavailable on Linux. Fortunately VirtualBox can mount Virtual PC's disk images.

For me the image would get half way through booting windows, and then blue-screen (BSOD). I discovered that it was possible to get past this by removing the SATA controller from the machine's settings, and instead adding the disk under the IDE controller. After that the machine booted successfully.

To capture the BSOD, press F8 after a lot of rebooting, and select "disable automatic restart on system failure" (ref: http://www.webtlk.com/2009/07/02/how-to-stop-windows-7-reboot-loop/)

And here's the bluescreen:

05 April 2012

Poll svn server for changes with git clone

Just for convenience, paste this in a git bash window:

while true; do date; echo "Polling svn server..."; git svn fetch;
echo "Sleeping."; sleep 300; done

Then just refresh your favourite git log viewer.

Get the gist: poll-svn.sh gist

That's all folks!

21 March 2012

Announcing the Communication Book project

I've been working on a piece of open source software to assist people who have aphasia (speech difficulties), and it is now sufficiently functional to be worth mentioning. It's still very rough around the edges, but if you are on a debian based system you should be able to easily get it up and running and see what you think. If you are on other platforms you'll currently need a bit (alright, a lot) of java knowledge to get this up and running.

If you have the time to help I'd be very grateful. You don't have to be a coder, just letting me know if it works for you would be great.

If you want to know more or would like to give it a try then please do head over to the project page at http://launchpad.net/communication.

14 February 2012

quote - scrum progress updates

Quote of the day

Mikko Wilkman said on 05 Jan 09 07:15:
... Even a one hour task might change to an eight hour task (or multiple tasks..) due to new information found out during that one hour that was the original estimate. The key point is not to focus on how many hours the team got done on the task, but how many hours really are remaining. The daily update should never be: "I worked on that for four hours, so you can take four hours out of the estimate", but rather a real estimate on how much work still needs to be done based on current knowledge.
from http://www.scrumalliance.org/articles/39-glossary-of-scrum-terms#1110

25 January 2012

The BBC and the bouncing emails

For the record, this is the staggering response I received from the BBC's iPlayer support when I helpfully let them know that the email address they use as sender when responding to feedback sent through their support web form is non-deliverable.

I think it speaks for itself. You would at least think they would set the sender address to "noreply-werenotlistening-lalala@bbc.co.uk" so you wouldn't waste time composing a response. And if you've been through the web form, you will know that filling it out once is okay, but to use it to reply? Give over!

-------- Original Message --------
Subject: BBC iPlayer - Case number CAS-1258137-Q271CZ
Date: 24 Jan 2012 10:19:09 +0000
From: bbc_iplayer_website@bbc.co.uk <bbc_iplayer_website@bbc.co.uk>
To: tim abell <tim@timwise.co.uk>

Dear Mr Abell

Reference CAS-1258137-Q271CZ

Thank you for contacting the BBC iPlayer support team.

I understand you’re unhappy that the reply email you had sent to our email bounced.

I’m afraid it is not possible to reply to our email because we deal with over a million audience contacts every year and we have to ensure they can be efficiently tracked using our handling system and therefore for every correspondence you need to fill the webform. In addition, our complaints, BBC iPlayer and general enquiries webforms ask for essential information such as channel, programme name and transmission date which means we don't have to write back to people unnecessarily. Using a webform also guarantees we can match a return contact up with the previous contact from that person without the need to cross-check thousands of unformatted emails which would then have to be manually transferred into the tracking system.

We try to restrict public email inbox addresses where possible because we receive millions of 'spam' e-mails and a return email address would attract and generate even more. Junk mail costs the BBC a considerable amount of money because every email has to be checked before we can delete them as it’s not always easy to distinguish them from a genuine email.

I appreciate this may be annoying, but we did not take this decision lightly. Our policy takes into account what is operationally efficient and avoids the need to employ additional staff to process incoming emails. I would therefore ask that you please follow the instructions in the reply you received and use our online form at www.bbc.co.uk/complaints. Your email will then be passed to a member of our team for further investigation and reply.

Once again thank you for contacting BBC iPlayer.

Kind Regards

Usha Devi Peri

BBC Audience Services


NB This is sent from an outgoing account only which is not monitored. You cannot reply to this email address but if necessary please contact us via our webform quoting any case number we provided.


And here is the bounce, so you can see why I thought they had made a mistake:

Subject: Undelivered Mail Returned to Sender
Date: Sat, 21 Jan 2012 07:42:07 -0500 (EST)
From: MAILER-DAEMON (Mail Delivery System)
To: tim@timwise.co.uk

This is the Postfix program at host mxout-07.mxes.net. I'm sorry to have to inform you that your message could not be delivered to one or more recipients. Here is the reason why the message could not be delivered. <bbc_iplayer_website@bbc.co.uk>: host cluster1.eu.messagelabs.com[] said: 550-Invalid recipient <bbc_iplayer_website@bbc.co.uk> 550 (#5.1.1) (in reply to RCPT TO command)

17 January 2012

The trouble with agile is it's a bit too good

Picture of a waterfall in Wales
So you've gone Agile! Woo! Well done! You've escaped the last millennium's software practices at last! And boy do you feel in control at last! The iterations are flying past, the story points are getting done at a rate you could only have dreamt of. No longer do you wonder what your development team are up to for months at a time, with that nagging feeling that you are pouring money in and you're not getting best "value".

Seems like some kind of productivity utopia doesn't it?

But there's still something not quite right isn't there? Are the technical team still moaning? (Pah, that's just technical people isn't it? ... or is it? They often have a point, just usually a difficult one.) There's this thing they are always going on about, maybe it changes day to day, maybe it's the same. Maybe it doesn't seem well enough defined to deserve a story. Maybe it's just some long term gripe that's never quite as important as all those other items in the backlog that have a priority of "O.M.G. if we don't do this by the 14th of this month we're all DEAD!!$$£##£!!!", so it keeps getting barely scheduled and certainly never done, done, done! (Okay, calm down excitable agile people, saying it once is fine.) But hey that's the process so it must be right, if it doesn't make it, it can't have been that important. Can it?

Well, it's time for some unscientific theorizing.

Atmospheric picture of clouds
I've been doing a bit of job hopping recently (nothing on the scale of contracting, but I've seen a few different things, and a few different approaches to project management). And I've noticed something in the two examples of agile (SCRUM specifically) I've been close to that bothers me.

“When it comes to the detail of the work, the manager is relying on the expertise of their staff.”
As a bit of background, remember that software development is a highly skilled job. Software development is one of those funny worlds of work where the employee inevitably knows more than the manager. The manager will likely have more broad context (I really hope so, for that matter), but when it comes to the detail of the work, and the right thing to do, the manager is relying on the expertise of their staff to make detailed decisions. This is as it should be given that the developers spend all their working hours, often more, immersed in the detail, keeping up with the current technology, and becoming ever more skilled at the job. Even if a manager is initially just as knowledgeable in the field as their staff, just by virtue of spending more time managing than doing (nothing wrong with that of course), they will inevitably become less knowledgeable than their developers over time (whether they admit it or not, and don't we all know someone who still thinks they know everything!)

Okay, I'll get to the point already (this had better be good).

If you have moved from one of the less well controlled project management methods (including the "general panic" approach), then you may or may not have realised that a fundamental shift in power has occurred. The ability to direct the way your development team spends their time has moved more into the hands of the manager, and away from the hands of the individual developers. On the whole I consider this a good thing, as individual developers deciding to do things by fiat doesn't always help a company with its immediate deadlines, and there is a much improved ability under the new regime to pick a goal and get there more or less on time (unless you are just paying lip service to SCRUM), with fewer nasty surprises along the way (such as the good old "where did that month go?" experience). In the past, management could give the developers direction one month to the next, but day to day was a bit of a mystery, and without SCRUM in place it was too much overhead to cope with. Now, every day is accounted for. Life is good, the company gets more of what it asks its developers for.

Every developer I have worked with has wanted to do a good job for the company they are working for. And they have all been generally competent at both coding, and interacting with management in getting the job done. So if they are good people, and we are so much more "productive" now, then why were they wasting all this time before? Well, to an extent you can explain the improvements by the elimination of some of the tail chasing exercises that happen under less disciplined approaches to project management. But that's not quite enough. There's something else, and it's a question of perceived priorities and the effect they have on what gets done.

When you have your head under the bonnet, you'll notice all the leaks and all the frayed cables. And as anyone who's taken their car to a garage will know, the mechanic can always find something to suck their teeth about and charge you an extra £150. But when you are just driving the car it all seems just peachy till service time. But why do you give in and pay up for that thing you've never heard of? Surely if the car was fine when you were driving, then it must be okay? Well, I don't know about you, but for me it's the fear of ending up as a pile of tomato ketchup on the inside of my windscreen when I finally find out why that thing I can't even name was actually important. So how is this a good analogy? No-one ever died from bad software, right? Well the point is, the mechanic is skilled (like the developer), and I am not (like the manager). Like the manager, I have to decide which things to spend my money on and which things to pretend I know about and leave till the MOT fails.

Software developers, being skilled tradespeople, always have an eye on the long term, and will always be balancing the current panic from the sales department against what is good for the company in the long run. In the past, when you used to lose months at a time, it was often partly because the developers were taking some time to look after the company's long term interests. In hindsight it is easy to justify the long term work that was done with some glib comment, as it's no longer really up for discussion; you can't get the time back after all. But imagine if all these long term things had to be justified before they were done, even if the manager doesn't know what on earth the developers are talking about. Well you know what, a lot of it wouldn't get done, and the developers who really care about your long term future (i.e. those you haven't yet ground down into despondency by ignoring them for years on end), would get narked. If it's hard to "put a business case for" then a lot of good developers I know will just not bother, after all they don't have to save management from themselves, that wasn't in the job specification, and people don't like being saved from themselves anyway. Unfortunately, this is exactly the change that moving to SCRUM introduces. Developers can no longer "just do" something that takes more than a day, no matter how much it needs doing in the long run, as it will be blindingly obvious at the daily stand-ups that they are not sticking to tasks, and are going to make an iteration miss its target. After all each iteration is likely already chock full of "must haves", and even if a developer puts the effort in to get a long term piece of work into an iteration, it will always end up lower priority than user stories for customer visible deadlines, and therefore likely still not get done (unless you are getting your velocity right, which of course you should be).

Interlude. You may now hum to yourself for a bit before I attempt to tell you how to fix it.
Atmospheric picture of clouds
So what to do? In a way maybe it's no different to the car analogy. Make sure you (management) get enough long term stuff into the iterations, and give them just as much priority as anything else. Make sure they get done! Just as if they were a short term deadline. Then in the long run, the wheels won't fall off your software, at least not while you're driving. (This blog post contains no warranty, road conditions may vary, any number of factors may cause the wheels to fall off your software. Especially if driven over rough specifications.) Make sure your team of developers know that you are committed to this so that they do actually come forward with the things they know need to be done sooner or later. (If you don't know about it you certainly can't get it fixed.) Perhaps you could create a separate long term plan with the help of the team that provides for long term needs, giving it real deadlines that are as immovable as whatever conference you are showing at next. If you have to justify it to others you can say "because the long term is just as important to our business as the now"! Have the courage of your convictions. Back the long term as well as the short term. Always have an eye on the build up of outstanding long term items (c.f. technical debt). If the long term plan doesn't look like it fits with what you have to deliver day to day, then maybe you need to step back and look at your software architecture as a whole, or the resourcing in your team. I would suggest a practical plan: set a percentage of time that will be spent on the long term items, say 10%, which is ring fenced for use by the technical experts (the developers), for making sure the long term needs of your software are looked after.

Sooner or later, if things run their new agile course, the chickens will come home to roost, and you'll start to wonder why it's taking longer and longer to get those features out, or more time will be lost to bugs, or things will just start to outright fail. So I urge you to think about the long term and not forget that the manager is not the expert in the detail - that's what your developers are there for. So listen to your techies and the advice they have on the balance of priorities, and take that into account when creating and prioritising your backlog. You will have a happier team and happier software as a result.

So in summary:
  • Moving to agile is excellent, but prevents your technical experts from quietly fixing things for you to the same extent.
  • Don't forget the long term in the excitement of getting features done, done, done!

11 January 2012

Git, Windows and Line endings

I have come to the unfortunate conclusion that git is not the perfect tool for teams developing exclusively on Windows. And by that I mean, I cannot recommend it unconditionally as I would like to be able to do.

The main competition I would be considering is Microsoft's TFS.

I have had plenty of experience working with git under windows (as well as on linux), and what follows are the three reasons I can't wholeheartedly recommend git to a pure windows team. There are of course many reasons to avoid the alternatives, but that is outside the scope of what I wanted to say here.

Just for the record, in spite of these flaws, I still think git is the best thing since sliced bread.

File renaming

This is an outright bug that unfortunately the msysgit developers have chosen not to address (as is their prerogative), and I don't have the resources needed to provide a patch of sufficient quality or run my own variant of msysgit.

The simple test is to change the case of a file's name, which fails. However, the most obvious workaround (rename to another file name in one commit, and back again in another) actually makes the problem worse. This is because the bug also affects checkouts: when git on another team member's machine attempts to update the working copy directly from its previous state to the requested revision (usually the latest), the "checkout" fails halfway through, leaving the team member flummoxed.

This is a particularly insidious bug for a team. You will generally have some people who are stronger with git (or pick it up quickly), and some who are not interested or struggle with the new system. Unfortunately if your team trips over this bug, *every* team member will have to work out how to get past it, and it is not immediately obvious from the symptoms what the problem might be or how to solve it. It also leaves the victim's source directory in an inconsistent state, so if they try to ignore the problem and carry on they will get into more of a pickle.

Having to notify every member of your team that you have changed the case of a file and point them to a workaround is hardly going to endear them to their newfangled source control "git".

A real world example of why this might happen:

Take a file in your source tree that has been around since before you had any naming conventions: "VATRate.cs", containing a VATRate class (VAT being Value Added Tax). You now enforce a naming convention where acronyms are in Pascal case, i.e. VatRate. In order to rename the class you must also rename the file, so VATRate.cs is renamed to VatRate.cs, triggering the above bug for your entire team whenever they happen to fetch (and, worse, every time they switch between branches that do / don't have the patch).
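Since the trouble comes from names that differ only in case, one way to get some warning is to compare the file lists of two revisions before you push. What follows is only a rough sketch in Python, not anything git or msysgit provides: the function name case_only_renames is made up for illustration, and it assumes you have already obtained the two lists of paths yourself (for example from the output of "git ls-tree -r --name-only" for each commit).

```python
def case_only_renames(old_paths, new_paths):
    """Return (old, new) pairs where a path changed only in letter case.

    Hypothetical helper for illustration; both arguments are plain lists
    of path strings that the caller has collected from two revisions.
    """
    # Index the old paths by their case-folded form.
    old_by_folded = {p.casefold(): p for p in old_paths}
    renames = []
    for new in new_paths:
        old = old_by_folded.get(new.casefold())
        # Same name when case is ignored, but not an identical string.
        if old is not None and old != new:
            renames.append((old, new))
    return renames


if __name__ == "__main__":
    before = ["src/VATRate.cs", "src/Order.cs"]
    after = ["src/VatRate.cs", "src/Order.cs"]
    for old, new in case_only_renames(before, after):
        print(f"case-only rename: {old} -> {new}")
```

A check like this does not fix anything, but it does let you warn the team before they fetch rather than after.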

Line Endings

As you know from the depths of history, our beloved operating systems have chosen different line ending systems:
  • Classic Mac OS (before OS X): CR
  • Windows: CRLF
  • Linux/Unix: LF
Git has an ingenious way of handling this, and gives you three choices for handling cross platform differences (see git config / core.autocrlf):
  1. Leave them the hell alone (false)
  2. Store them in git as LF and convert them on checkin/checkout (true)
  3. Convert them to LF when you check in a file but not on checkout (input)
Which in theory is fine and dandy, and either of the first two should be fine for a pure Windows team... if it weren't for the patch tools. It would seem that as soon as you start applying patches and using some of the more advanced tools that come with git, they introduce inconsistent line endings into checked-in files. You also have an issue with the configuration being client side, so it is likely one of your team members will get the setting wrong one day and make a mess.
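To make "inconsistent line endings" concrete, here is a minimal sketch in Python of how you might spot an affected file. It is not part of git; the function names count_line_endings and has_mixed_line_endings are invented for this example, and it works on raw bytes so that nothing gets converted behind your back while reading.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LFs and lone CRs in raw file content."""
    crlf = data.count(b"\r\n")
    # Lone LF / CR are whatever is left once the CRLF pairs are removed.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def has_mixed_line_endings(data: bytes) -> bool:
    """True if more than one style of line ending appears in the data."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1


if __name__ == "__main__":
    sample = b"first line\r\nsecond line\nthird line\r\n"
    print(count_line_endings(sample))
    print(has_mixed_line_endings(sample))
```

Reading each file with open(path, "rb") and passing the bytes in would be enough to list the files that need tidying up.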

In my experience, neither of the first two settings is painless under Windows, leaving you with a constant overhead of meaningless / noisy diffs, time spent troubleshooting, and running tools to tidy up files that have had their line endings corrupted.

It's not a show-stopper, but it does make it harder to recommend that a team avoid TFS (for example) and use the "better" solution with all its benefits.

Unicode file handling

I may not have my facts completely straight on this one as I'm no expert in this area, so please forgive me and provide any corrections / references you can in the comments.

Visual Studio has a tendency to add a byte order mark (BOM) to source files, which as far as I know is fine. Unfortunately git is then inclined to interpret the file as binary and refuse to show diffs.

(I'm a little uncertain on this one, but I have seen the symptoms first hand, and it happens more than is comfortable)
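If you want to check whether a particular file is affected, a small Python sketch like the one below can tell you which byte order mark (if any) it starts with. The helper names detect_bom and contains_nul are hypothetical. The second helper rests on an assumption I have not verified against git's source: that git treats a file as binary when it finds a NUL byte near the start, which would catch files saved as UTF-16 rather than files that merely carry a UTF-8 BOM. The 8000 byte limit is likewise an assumption.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def contains_nul(data: bytes, limit: int = 8000) -> bool:
    """True if a NUL byte occurs in the first `limit` bytes.

    The limit is an assumed value, used here only for illustration.
    """
    return b"\x00" in data[:limit]


if __name__ == "__main__":
    utf8_file = codecs.BOM_UTF8 + "class VatRate {}".encode("utf-8")
    utf16_file = codecs.BOM_UTF16_LE + "class VatRate {}".encode("utf-16-le")
    print(detect_bom(utf8_file), contains_nul(utf8_file))
    print(detect_bom(utf16_file), contains_nul(utf16_file))
```

If the files that refuse to diff all turn out to be UTF-16, re-saving them as UTF-8 may be worth trying, though I have not confirmed that this cures the symptom.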

Footnote: Speed

Git is held up as an example of fast source control, and it seems faster than anything else I've used. However, it's also worth mentioning that rewriting commit histories (rebase), refreshing the status and tab-completion are (last time I checked) all significantly slower on msysgit (Windows) than on git under Linux.