Category Archives: insights

a perfect hash function?

As I was walking to get the turkey pot pie I had cooking in our break room microwave today, I looked at the parking lot below and realized that parking lots are approximately perfect hash functions.

Think about it: cars come in in some semi-random order; spaces are available in semi-random fashion; cars park; and the owner comes back to the same spot to retrieve the item later. Admittedly, it isn’t necessarily replicable every day – but it’s an approximation.

Perhaps a better example would be a professor who tells his students on the first day of class to remember where they are sitting, because that’s their seat for the rest of the semester. The spaces were filled in random fashion once, then always in the same way in the future: if Sarah isn’t in class, her slot is empty – it doesn’t get filled by anyone else because they’re in their slots.

The real trick will be to figure out how to replicate this behavior functionally.
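A minimal sketch of the classroom version, assuming the simplest possible approach: hand out slots first-come, first-served, then freeze the assignment. The names `SeatingChart` and `slot` are made up for illustration. Strictly speaking this is a lookup table rather than a hash function; a true perfect hash function maps a fixed set of keys to distinct slots with no collisions, and this only imitates that behavior by remembering who sat where.

```python
class SeatingChart:
    """First-come slot assignment that is then frozen, like the classroom example."""

    def __init__(self, n_slots):
        self.free = list(range(n_slots))  # slots nobody has claimed yet
        self.slot_of = {}                 # key -> slot, fixed once assigned

    def slot(self, key):
        # First visit: claim the next free slot. Every later visit: same slot.
        if key not in self.slot_of:
            if not self.free:
                raise ValueError("no free slots left")
            self.slot_of[key] = self.free.pop(0)
        return self.slot_of[key]


chart = SeatingChart(3)
first_day = [chart.slot(name) for name in ("Sarah", "Tom", "Ana")]
# Sarah is absent later in the semester; nobody else lands in her slot.
later = [chart.slot(name) for name in ("Ana", "Tom")]
```

The cost of doing it this way is that the table has to be stored somewhere, which is exactly what a computed hash function would let you avoid.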

the inanity of ‘special’ lanes

Carpool lanes do not alleviate traffic. They encourage folks to either a) ignore the ‘carpool-only’ signs, or b) get pissed off at other drivers ignoring the signs.

I’ve been in California for a few days on a working vacation, and the carpool-only lanes are stupid. Because I’ve been driving by myself to work, I do not have 2 or more people in my car, and therefore am not supposed to be in said lanes, unless it’s *not* between certain hours, which are, of course, exactly when I’m on the road.

There is a similar phenomenon in Virginia, where they have dedicated HOV (high-occupancy vehicle) lanes open northbound for part of the day, southbound for another part, and closed the rest of the time. To use those, you must have at least 3 people in your vehicle (unless you’re on a motorcycle or in a vehicle that can only hold two people, in which case 1 or 2, respectively, is OK). So, for those of us who don’t typically travel with more than ourselves or maybe one other person, those spare lanes are useless.

That’s right: even when traffic gets slowed down, those lanes are [mostly] barely used. So, instead of actually alleviating traffic, they end up making the drivers stuck ‘where they belong’ pissed off at those lucky jerks who can use those spare lanes.

If Virginia were smart, they’d open up those spare lanes to everybody, with the caveat being that there are fewer exits from those extra lanes, so if you are a ‘local’ driver, you should stay out, but if you’re a ‘through’ driver, go ahead and use them.

And out here between San Francisco and Sunnyvale on the 101: drop the signs. Having one lane utilized at <15% while the others are stopped or barely moving is stupid.

All that extra lane has done is make traffic worse.

i know why search is broken

Search is broken. Google, Yahoo, Ask, AltaVista, and on and on the list goes.

Hundreds of companies, thousands of individuals. I know why search is broken, and I know what needs to be fixed. Now to figure out the how of fixing.

When you’re looking for information, you search on keywords. Google’s been nice enough to rank results by ‘popularity’ (yeah, it’s called PageRank, and it’s proprietary, but it’s a popularity/relevance ranking). The problem is that you have to know what keywords were used. Some places are nice enough to suggest spelling fixes (it’s not ‘brittany spears’, it’s ‘britney spears’).

But that’s not the issue. The issue is that you don’t know what word, term, or phrase to look for. You have the concept you need to find, like ‘module’. Except you don’t think of that word, you think of ‘chunk’. Bam! You’re out of luck: no author would use the word ‘chunk’ when they mean ‘module’, right?

To fix search, we need to search on not just the keyword, but the concept. In English, you’d use a thesaurus.

So, you’re thinking: “This is easy! I’ll just build a comparator that looks at the keyword and then goes through an index of a thesaurus and finds stuff. And we’ll all be rich!”

Hold it, buster. You missed something. This is a perfectly valid English sentence, and you can figure out what I’m saying, too: “Bring me the cooler cooler cooler from the cooler’s cooler.” Cooler is used five times, with the following meanings (at least): hip, less warm, box to keep things cool, jail cell, big refrigerator.
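Here is a rough sketch of that “easy” thesaurus comparator, so you can see both where it helps and where it falls over. Everything in it is an assumption for illustration: the `THESAURUS` entries, the three toy documents, and the `expand` and `search` helpers are all invented, not anybody’s real index.

```python
# Hypothetical, hand-made thesaurus: every entry here is invented for illustration.
THESAURUS = {
    "chunk": {"module", "piece", "block"},
    "cooler": {"hip", "colder", "icebox", "jail", "refrigerator"},
}

# Hypothetical documents standing in for a real search index.
DOCS = {
    "doc1": "split the program into one module per feature",
    "doc2": "the county jail was built in 1950",
    "doc3": "keep drinks in the icebox",
}


def expand(keyword):
    """Return the keyword plus every synonym the toy thesaurus lists for it."""
    return {keyword} | THESAURUS.get(keyword, set())


def search(keyword):
    """Return ids of documents containing the keyword or any listed synonym."""
    terms = expand(keyword)
    return sorted(doc_id for doc_id, text in DOCS.items()
                  if terms & set(text.split()))


concept_hits = search("chunk")     # finds the 'module' document
ambiguous_hits = search("cooler")  # drags in both the jail and the icebox
```

Searching for ‘chunk’ now finds the document that says ‘module’, which is the win. Searching for ‘cooler’ returns a document about a jail and one about an icebox, and the expansion alone has no way to tell which sense you meant.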

That’s the problem with trying to fix search. Words can mean far too many things in English. But here’s your big chance to figure out a solution: I’ve told you the problem, and I’ve given you the target.

Now go make it work.

is plagiarism really so bad?

There has been a lot of talk recently about the huge issue of plagiarism among students. Ars Technica had an article about it on 20 October. I have also heard the issue discussed on radio talk shows, and been lectured on the consequences of being caught plagiarizing by almost every professor I’ve ever had.

The problem of plagiarism, though, is not new – it’s just easy now. With millions of articles, essays, and papers on thousands of topics just available for the snagging online, it’s not really a surprise that more and more students are engaging in this form of cheating. It’s also not a surprise that teachers are catching these acts of defiance more and more readily. Back in the good ol’ days, when to plagiarize you needed to copy by hand from a printed text without citing it, it was at least a time-consuming process. But no more. Now, it’s as easy as selecting the chunk of the paper you want, and copy-pasting it into your own document. Maybe you’re even nice and do a little bit of paraphrasing so it’s harder to distinguish from your own real writing, but it’s still cheating.

I’m going to wax a little preachy here, but the benefits of plagiarism are only very short-lived. Sure, if you don’t get caught, you get a decent grade. But graduating on lies won’t help you in the real world. Unless you’re planning to do something that requires no honesty, like being a drug dealer, or already have more money than you’ll ever need (there’d be a nice problem), you’re going to get caught. You might make it all the way through school and the early days of your job without anyone noticing, but eventually someone’s gonna realize you can’t do what your grades led them to believe you could.

I once had a fellow student plagiarize my work in a programming class in NY. My professor came up to me after he handed the assignments back and told me what happened – someone had copied what I had done and submitted it as their own work. What got them caught was that they forgot to change the ‘written by’ comment I had in the program (none too clever on their part), and my professor gave them a 0 on the assignment. His typical policy was to divide the grade by the number of identical submissions and give the result to each submitter. This gave both the cheater and the cheatee (or sometimes the cheaters) an incentive not to cheat, because all of their grades would be affected.
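The arithmetic of that typical policy is simple enough to write down. This is only a sketch of the rule as I described it; the function name `shared_grade` and the score of 90 are made up for the example.

```python
def shared_grade(grade, identical_submissions):
    """Grade each submitter receives when `identical_submissions` copies turn up."""
    if identical_submissions < 1:
        raise ValueError("need at least one submission")
    return grade / identical_submissions


alone = shared_grade(90, 1)   # a lone, original submission keeps its full grade
pair = shared_grade(90, 2)    # two identical copies: each submitter gets half
trio = shared_grade(90, 3)    # three identical copies: each gets a third
```

Note that the person whose work was copied takes the same hit as the copier, which is the whole point of the incentive.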

Thankfully, I’ve never had a legitimate temptation to cheat on a test, paper, or project. Most of the time it was because I knew the material better than the other students, so cheating wouldn’t help. Other times it was because there were too few people in the class. But mostly it’s because there’s no substitute for real work.

ask the right question

If you’ve never read Programming Pearls by Jon Bentley, and especially chapter 1, you should. Even before finishing this post. Even if you never write a program or touch a computer.

Now that that’s out of the way, I can continue.

The biggest issue in answering any question is not the answer – it is determining what the asker actually meant when they asked you the question. I, like many other people I know, always start by answering the question I was asked. However, as often as not, that was not the question they actually wanted answered. They didn’t know it wasn’t the question they wanted answered, but it wasn’t.

In Bentley’s book, he describes a programmer who needs to sort a list of approximately 10,000,000 items several times per hour on a very limited machine (it was originally written in the early 80s). After spending several minutes helping his programmer friend noodle out a solution that might take about 2 days to write and about a minute to run each time, he twigs onto what he says he should have asked before answering his friend’s question: “what are you sorting, and why?” Turns out his friend needed to sort a list of 7-digit numbers, with no duplicates allowed. Why? Well, that was an easy answer, too – he was working with a list of all of the assigned toll-free 800 numbers and needed to be able to ensure that any new ones being requested and handed out weren’t already taken.

Knowing now what the end goal of the programmer’s question was, Bentley suggests a far simpler method that doesn’t even entail sorting – since the list to be sorted was known to be just 7-digit numbers, he could think about the problem as marking down in a tiny structure whether or not a given number was in use, and if it was, it wasn’t available.
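The “tiny structure” can be sketched as one bit per possible number. This is my own minimal illustration of the marking idea as described above, not Bentley’s code; the names `bits`, `mark`, and `in_use` are invented for the example, and the two phone numbers are arbitrary.

```python
N = 10_000_000               # every 7-digit number, 0000000 through 9999999
bits = bytearray(N // 8)     # one bit per number: 1,250,000 bytes in total


def mark(number):
    """Record that `number` is in use."""
    bits[number >> 3] |= 1 << (number & 7)


def in_use(number):
    """Return True if `number` has already been marked."""
    return bool(bits[number >> 3] & (1 << (number & 7)))


mark(5551212)
mark(2345678)
taken = in_use(5551212)      # this number was marked above
free = not in_use(5551213)   # its neighbor was never marked
```

Checking whether a number is available is a single bit lookup, and as a side effect, walking the bits from lowest to highest would list the numbers in sorted order.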

Without going into the exciting computer science applications Bentley brought out (because, of course, you just read the first chapter :)), I want to emphasize how important it is to ask the correct question.

Far more times than you could ever realize, you will be asked a question that wasn’t at all what the asker intended. A common example: “do you know what time it is?” “Well, actually, I do.” I do this to people quite frequently, and not just because it’s fun to mess with their heads, but because I figure the question you ask is the one you want answered. When this turns out not to be correct, a follow-on question is asked that more accurately describes what they’re wanting to know: “would you tell me the time, please?” Ahh, there’s the difference – a question that might actually yield a useful response.

That’s a humorous, and perhaps trite, example, but let me give another. A few days ago, a friend of mine taking an operating systems class in graduate school called me up for help with a program he had to write in C. Unfortunately for him, most of his undergraduate programming classes dealt with Java, and C is simply different.

His task was to write a program that would accept a sequence of typed characters, break that sequence up into the separate elements it contained based on whitespace, and return the resulting list. What he asked me was to help him fix his program, but he didn’t tell me what the program was supposed to do; he only asked if I could help him get around the errors he was getting when he tried running it.

Ah hah! After spending about 20 minutes helping him try to fix the routine he had written, I finally remembered to ask him what his assignment was. As soon as he told me, I suggested he use a standard library call that ships with essentially every C programming environment – strtok. strtok just happens to do exactly what he was describing (if you follow the instructions on how to use it) – it will break up a string of characters based on a set of delimiter characters, and return the little chunks as ‘tokens’.
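To show how small the task really is, here is a sketch of the same splitting behavior. It is written in Python rather than my friend’s C, and the `tokenize` helper is a stand-in I made up; in C you would call `strtok(line, " \t\n")` once, then `strtok(NULL, " \t\n")` repeatedly until it returns `NULL`, keeping in mind that `strtok` modifies the string it is splitting.

```python
def tokenize(line, delims=" \t\n"):
    """Split `line` into tokens, treating any character in `delims` as a separator."""
    tokens = []
    current = []
    for ch in line:
        if ch in delims:
            if current:                 # a run of delimiters ends at most one token
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:                         # the last token has no trailing delimiter
        tokens.append("".join(current))
    return tokens


tokens = tokenize("ls  -l\t/tmp\n")
```

Like `strtok`, this treats a run of consecutive delimiters as a single break, so it never hands back empty tokens.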

Another recent example was that I wanted to get a half gallon, or so, of fiberglass resin. Not fiberglass, and not the hardener that turns it into epoxy, just the resin. In popping out to my local Lowe’s, I thought about what I would need to ask Customer Service to find out what I wanted to know. Knowing that fiberglass resin is typically sold in conjunction with fiberglass cloth, I decided to ask if they sold fiberglass cloth. It wasn’t what I really wanted to know, but I was pretty sure it would tell me what I actually wanted to know.

That took some effort on my part, like knowing which products complement the one I wanted, but it was worth the effort because it got me what I wanted to know: yes, in fact, Lowe’s sells fiberglass resin.

Asking the correct question is always worth your time and effort. Instead of spending 20 minutes debugging my friend’s program, I could have just spent 2 pointing him at the right library. Admittedly, asking the right question is not always easy, and it may only be possible to ask the right question after asking several not-so-right questions. But getting to the actual nugget that you need to know to help someone, or that will give you back the result that you need, is worth it. Every time.