Thursday, March 19, 2009

Hard Problems

I ran into a hard problem today at work. Here it is:

Given two strings, check that the first does not contain a sequence of 3 consecutive letters that are also contained in the second.

Look easy? It should. I don't know of very many easier problems. After implementing it though, I can't truthfully say that it was easy for me. But, I don't think the lessons I learned today are easy either. As far as I can tell, I made three mistakes:

Lesson #1: Be careful when picking your test cases

It's easy to pick a test case that's either too hard, or too easy to implement. If you pick a test case that's too easy, you and your pair risk getting bored. Oddly enough, this is almost never a problem unless you're actively trying to avoid picking a test case that's too hard.

So, if you're bored, then your tests are too easy... how do you tell when your tests are too hard? Well, that's another hard problem. I don't think I've nailed this down completely yet, but here are some clues I've learned to spot:
  • It takes more than one try to get your newest test to pass
  • When you do get your test to pass, another test fails
  • Your pair doesn't understand what you just did
  • You find yourself wanting to refactor against a red bar
My advice when you run into any of these indicators is to stop. Stop writing code. Revert to a green bar. Now you have two options: Refactor, to make things easier; or pick a different test.

Sound easy? Try it. There's two reason why it's not.

First it's hard to admit when you've picked a test case that's too hard to implement. We're programmers (craftsmen if you will), it's our job to solve hard problems. We take pride in our ability to do so. We want to make that next test case work. Taking a step backward and reverting the code you just wrote (that doesn't work) is an admission that you've made a mistake. Admitting this mistake is especially hard to do when you're working on an "easy" problem.

Second, both of these options require you to THINK. It's tempting to think that if you tweak a conditional here or extract a method there you'll see a green bar soon. This rarely turns out to be the case. Even the easiest problem is going to require you to stop and think about the solution; probably more than you expected to.

If you run into this, Stop. Think about the problem. Discuss it with your pair. Then, take another crack at it.

Lesson #2: Commit Often

Today, my pair and I ran into all of the clues listed above, and failed to stop. It was at this point that we got burned by lesson #2. We realized what was going on, and wanted to revert to our last green bar. We hadn't committed in 90 minutes. We were stuck with broken code. Oops.

I don't have a strong opinion on when you should commit. Ideally, I would say you should commit on every green bar, or after every refactor step; whichever is easiest to remember for you. Of course, some teams are cursed with long running unit tests. Since you don't want to commit without running your tests; this makes it difficult to commit so frequently. Do what you can to keep your tests running quickly, but in the meantime, find some balance between committing often and not letting your tests slow you down.

Lesson #3: Focus on your work -- No Distractions

Today, during the "embarrassing pairing session", I had an interesting email thread zipping through my phone, which dutifully beeped at me every few minutes. Every once in a while I'd try to keep up with it, meanwhile completely losing my focus on the problem and relying on my pair to get me back up to speed when I was done.

Please, be wary of checking your email or other distractions when pairing on a problem. If you're not at the keybaord, your job is (among other things) to help figure out the next test, watch for warning signs like the ones I listed above, and maintain code quality. You probabaly can't (I know I can't) do any of these things while checking your email, or carrying on a casual conversation with a friend. Also, respect your pair. If they're working, you should be too. They're going to resent you if you don't pull your weight.

4 comments:

Tony Arkles said...

Problems like this are the ones where I see TDD breaking down somewhat (with the test case, code to get it working, iterate loop).

This feels like a problem from an Algorithms class, and I'd probably approach it the same way: figure out the entire algorithm on paper first, figure out the big-O order of the algorithm, and iterate there until I had something I was happy with.

After having the completed algorithm worked out as pseudo-code on paper, then I'd type it up, and hit it with a bunch of test cases (probably cases that I wrote down and thought about as I was working it out on paper), making minor adjustments as I went.

It's early, so excuse me if this seems a bit off base :).

It seems to me that this is primarily a "Computer Science" problem, not a "Programming" problem, and as such, might be better served with a different approach?

iosparkletree said...

This problem reminds me of what bioinformatics folks do when matching gene sequences. Markov models come to mind, however I don't know nearly enough about the field to be able to point you towards a good algorithm.

I think that the guidelines you came up with for recognizing when you've got a "Hard Problem" are really good!

Tony Arkles said...

Off the top of my head, I thought of two different algorithms for doing this -- one is slower O(mn) in time, O(1) space (where m is the number of characters in the first string, n is the number in the second); the second is faster O(m + n), but requires more space O( min(m, n) ).

Any thoughts about how this would be captured in a unit test suite and TDD?

(In theory, a set of tests build to test the functionality of this code should pass on either algorithm -- which is desirable, because it means you could swap the implementations without changing the behaviour)

Kevin Baribeau said...

Hey Tony,

I think the answer you're looking for is a bit long and involved. The gist of it though is that testing this is simply about the inputs and outputs. Out of all the possible inputs, figure out where the important ones are (think edge cases), and test those. Also test the most important business cases, and anywhere else that is likely to fail.

We ended up a set of tests that looked something like this:

"", ""
"", "abc"
"abc", ""
"aabc", "abc"
"abc", "aabc"
"abcc", "abc"
"abc", "abcc"

To actually TDD the solution requires a bit more thought. I'll leave that as an exercise for now, but look for some more clues in later blog posts.