
Nevermind that doesn't work because we have all sinned greatly

My previous post is wrong. If you try to authenticate one scope from another, here's what happens:

You have scope A, which hasn't been authenticated but can authenticate scope B. You're looking for B.

The scope-B-from-A strategy is provisionally assigned as the winning scope.

Then you do the A scope: scope A is provisionally assigned as the winning scope. It's successful.

You can then try to do scope B. You do, it works. You return your successful strategy's result, which is still A, because that was the last to succeed. You have the wrong scope.
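Here is a minimal sketch of that sequence in Python. Everything in it (the `Authenticator` class, the `winning_scope` slot, the scope names) is hypothetical and is not the API of any real authentication library; the only assumption is that the framework keeps a single shared "last strategy to succeed" slot.

```python
# Hypothetical model of the bug: one shared "winning" slot that every
# strategy run overwrites. Not taken from any real auth library.

class Authenticator:
    def __init__(self):
        self.winning_scope = None      # single slot shared by all scopes
        self.authenticated = set()

    def authenticate(self, scope):
        # Provisionally mark this scope's strategy as the winner.
        self.winning_scope = scope
        if scope == "B" and "A" not in self.authenticated:
            # The B-from-A strategy needs A first, so it runs A's strategy.
            self.authenticate("A")
        self.authenticated.add(scope)
        # Bug: the shared slot was overwritten by the nested call,
        # so the result belongs to whichever strategy succeeded last.
        return self.winning_scope


auth = Authenticator()
result = auth.authenticate("B")
print(result)  # prints A, although the caller asked for B
```

Both scopes end up authenticated; only the returned result is wrong, which is part of why it is such an awkward edge case to test.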

I have a PR which hasn't gone anywhere because I can't figure out how to write tests for it, and it's a crazy edge case. Existence is suffering.

FoxDB stores many things, but HedgehogDB stores one big thing

Let's say you want to store what you believe to be a ludicrous amount of data. It may or may not actually be a ludicrous amount of data. It could really be a lot, or it could just seem that way because of your current setup (I just want to serve 5 TB).

There's really one question: are you storing many things, or are you storing one big thing?

If you are building a whitebox SaaS platform where one customer's data doesn't interact with another customer's data, you are storing many things. It might be petabytes overall, but as long as each individual customer stays under a certain size, it's not that much of a problem. You're running five thousand different stores, which admittedly has its own problems, but each of those stores is sane. You're doing the equivalent of adding more directories.

The inspiration for this post was a Microsoft Azure ad about how the cloud helped store police dashcam video. That's storing many things. The files are only so large, and each one is a clear, discrete unit. Even if you're going through all of it to get any kind of aggregate information, it's easily batchable.

The web is a big thing. The web links to itself in a non-predictable way: everything is talking to everything else. Any analysis is going to be on a page in relation to all the other pages it's related to, and those pages can be anywhere. You're not going to store five levels deep of depth-first link search because that's an insane amount of storage and at some point, you'll need six levels deep. Random seeks are the enemy, but there's no way around it.

The Facebook Graph is a big thing. Everybody knows everybody, or at least has a non-zero chance of knowing them. It used to be many things — the school networks.
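To make the difference concrete, here is a rough Python sketch that counts how many stores a query has to touch in each case. The shard count, the modulo placement, and the toy link graph are all assumptions invented for illustration.

```python
# Hypothetical sketch: how many stores does one query touch?

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_of(key: int) -> int:
    # Assumed placement rule: simple modulo on the key.
    return key % NUM_SHARDS


def shards_for_tenant(tenant_id: int) -> set:
    # "Many things": everything a tenant needs lives in its own store.
    return {shard_of(tenant_id)}


def shards_for_neighborhood(graph: dict, start: int, depth: int) -> set:
    # "One big thing": following links drags in whatever shard each
    # linked item happens to live on.
    seen, frontier = {start}, [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return {shard_of(node) for node in seen}


# Toy link graph: page id -> ids of the pages it links to.
graph = {0: [5, 10], 5: [7], 10: [3], 7: [], 3: []}

print(shards_for_tenant(42))                  # one store
print(shards_for_neighborhood(graph, 0, 2))   # every store, after two hops
```

In this toy setup the tenant query stays on a single store, while two hops of link-following from one page already reaches all four.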

Ten years ago, increasing storage of unrelated items was a serious problem. Now, it's merely annoying. What's the next step to make storing and analyzing huge, complex, interconnected items easier?


Math and Shakespease, one at a time please

The answer is the 15th.

If you do a deep, head-scratching analysis of Ides of March, it's Ides + of + March = Half of March, which will give you fifteen days on either side of the midpoint, because March has 31 days.

This quiz was 30 minutes long. I spent 25 on this question, because I remembered 15 and got 16 by hand.

So that's why I got an A- in Sophomore English, admissions council. We studied Julius Caesar and I used my brain.

My unfeasible dream of a data processing platform

I build a lot of charts and dashboards. Sometimes the numbers are wrong. This is the worst thing in the entire world.

Why is it wrong? Well, let's just look through the thirty or so different data sources we have, surely one of those will have an obvious error! No? Let's look at the data sources that populate those data sources! Surely we will have access to all of them, and they will be in a reasonable format, and there will be no bizarre interactions between different ways of string processing and date processing done over a decade or so by different people!

If you're looking at this kind of disaster you've done at least one thing right. You probably have a pretty robust data warehouse platform because you're fucking up at scale. If you don't, everything fell to pieces a long time ago when you had to manage your own database servers and disk handling and...

Back to the disaster.

Imagine you could trace everything. Imagine we have made-up tables like this:

SELECT * FROM enormous_table;

id | a      | b  | c    | foreign_id
1  | 122.13 | -1 | 0.32 | 1

SELECT * FROM another_table;

id | a        | e | f    | foreign_id
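1  | 944.1311 | 2 | true | 1

INSERT INTO combo_table (a, foreign_id)
SELECT SUM(enormous_table.a + another_table.a), enormous_table.foreign_id
FROM enormous_table INNER JOIN another_table
ON enormous_table.foreign_id = another_table.foreign_id
GROUP BY enormous_table.foreign_id;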

And imagine we store all the history and origin of every row. When the time comes to read from the combo table, we have all the history.

SELECT ORIGIN FROM combo_table WHERE foreign_id = 1;

id | a
1  | 1066.2611
==== SUM
      || == enormous_table
             1 | 122.13 | -1 | 0.32 | 1
             || == INSERT INTO enormous_table 
                   1 | 122.13 | -1 | 0.32 | 1
      || == another_table
             1 | 944.1311 | 2 | true | 1
             || == INSERT INTO another_table
                   1 | 944.1311 | 2 | true | 1

And then you have that for every. Single. Row. Problem solved. You can look up where everything went wrong.

It's impossible to do, I think. No matter how I go about it, I wind up with a Schlemiel the Painter problem: doing one more thing involves doing everything before it, and then the one new thing that lived in the house that Jack built. How many steps were involved?

There's a record for each. That record either has to have all the records before that, or a pointer to its parents. Storing all the records gets insane quickly. Pointers mean exploding disk seeks.
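A minimal sketch of the pointer variant, in Python, just to show the shape of the thing. The `Record` type and the `origin` and `hops` helpers are hypothetical names made up for this illustration, and the trace is flattened to one level per operation.

```python
# Hypothetical sketch of per-row lineage using parent pointers.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Record:
    op: str                  # operation that produced this row
    values: Tuple            # the row itself
    parents: List["Record"] = field(default_factory=list)


def origin(record: Record, depth: int = 0) -> List[str]:
    # Walk the pointers back to the original inserts. In a real store,
    # every hop here is a potential disk seek.
    lines = ["  " * depth + f"{record.op}: {record.values}"]
    for parent in record.parents:
        lines.extend(origin(parent, depth + 1))
    return lines


def hops(record: Record) -> int:
    # Number of pointer dereferences needed to explain one row.
    return sum(1 + hops(parent) for parent in record.parents)


e = Record("INSERT INTO enormous_table", (1, 122.13, -1, 0.32, 1))
a = Record("INSERT INTO another_table", (1, 944.1311, 2, True, 1))
combo = Record("SUM", (1, round(122.13 + 944.1311, 4)), [e, a])

print("\n".join(origin(combo)))
print(hops(combo))  # 2 hops for a single row with two parents
```

Two hops for one row is nothing. The trouble is that the hop count is paid per row and grows with every transformation stacked on top.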

It would be great, though.

Reverse engineering Facebook's growing pains

TRIGGER WARNING: Ivy League humblebragging

Facebook Graph API Explorer gave me a little bit of insight into what their process must have been like when they were first getting big. Right now, if you make a new account your id number will be very long and not have much correlation with anything. If your friend made an account at the same time your numbers would be very different.

My id # is around 120,000. If you were at Columbia and got your Facebook account at the same time I did, your account id would be in that range. It's true for my friends. Generally, the older you are the lower you are in this range, the younger the higher.

I worked with someone who went to Cornell; his id number was around 450,000. He had his account for a year longer than I did. His friends had the same cluster, just around that number.

Clearly, at some point Facebook tried to carve up the id space and shard based on that -- surely, no one would have friends from other schools! (It is worth remembering you originally needed a college email, and that the school networks used to be a lot tighter than they are now.)
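Here is a guess at what that carving might have looked like, as a Python sketch. This is speculation dressed up as code: the block boundaries, the school order, and the function names are all invented, and only the two approximate ids (120,000 and 450,000) come from the observations above.

```python
# Hypothetical sketch: carve the id space into per-school blocks.
# Boundaries and school order are assumptions, not Facebook's real layout.
import bisect

BLOCK_STARTS = [0, 100_000, 400_000]                      # assumed boundaries
BLOCK_SCHOOLS = ["(earlier schools)", "Columbia", "Cornell"]


def school_shard(user_id: int) -> str:
    # Find the last block whose start is <= user_id.
    index = bisect.bisect_right(BLOCK_STARTS, user_id) - 1
    return BLOCK_SCHOOLS[index]


def is_cross_shard(a: int, b: int) -> bool:
    # A friendship is cheap only if both ids land in the same block.
    return school_shard(a) != school_shard(b)


print(school_shard(120_000))             # Columbia
print(school_shard(450_000))             # Cornell
print(is_cross_shard(120_000, 450_000))  # True
```

Under a scheme like this, a friendship inside one school stays on one shard, and every friendship across schools becomes the expensive case.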


Preach, Eevee, preach

That’s all great, except (a) I’m writing a 2D tile game in Python and really don’t care if my video card is only at 99% efficiency, (b) I don’t control the original code and am not really positioned to learn the entirety of OpenGL so I can port someone else’s entire library to a thing that half their target audience doesn’t support, (c) fuck you.

We have always been at war with UI

How the Sausageable is made

It is a fact universally acknowledged that a program accessing a resource must be in need of an abstraction. In a lot of situations once you add one abstraction you need to add another just to deal with that. What's the point of having a Sausageable if you have to buy and butcher the Butcherable inherits Buyable yourself? Thus the SausageableFactory.

That's a pretty easy problem to spot, since the product and the result are so close to each other. It feels like work to kill the animal all by yourself. The real problem starts when you get more abstract than reality. Let's take this interface here:

Name ::= <string>
DataObject ::= <boolean> |
               <number> |
               <string> |
               <Name> |
               <Array<DataObject>> |
               <Map<Name,DataObject>> |
               <Array<byte>>

Let's say most of the work in a program happens on big lists of DataObjects, which could be any of a half dozen different types. This is fine. You can do a lot with this, you can nest them and it makes a good wire format. Then you start to see a lot of functions like this:

  print(DataObject[] obs);
  syncWithAdobeMiddleware(DataObject[] obs);
  infectYourComputerOnLoad(DataObject[] obs);

Yup, this is the PDF format. The API is more abstract than we are; we only care about PDFs. This spec can do a lot more than will ever be done with it. It's a close cousin of the millions of Java interfaces that only have one friend in the whole world. These interfaces aren't decoupled, they're just pretending to be.
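For a feel of what living with that grammar is like, here is a rough Python rendering of it. The `Name` class, the `DataObject` alias, and the `describe` helper are hypothetical names for this sketch and are not part of any PDF library. The point is that every consumer of a list of DataObjects ends up re-discovering what it was actually handed.

```python
# Hypothetical Python rendering of the DataObject grammar above.
from typing import Dict, List, Union


class Name(str):
    """A Name is a string with its own type, as in the grammar."""


DataObject = Union[bool, int, float, str, Name, bytes,
                   List["DataObject"], Dict[Name, "DataObject"]]


def describe(obj) -> str:
    # Type-sniff one DataObject and report which alternative it is.
    if isinstance(obj, bool):        # bool first: bool is a subclass of int
        return "boolean"
    if isinstance(obj, (int, float)):
        return "number"
    if isinstance(obj, Name):        # Name first: Name is a subclass of str
        return "name"
    if isinstance(obj, str):
        return "string"
    if isinstance(obj, bytes):
        return "bytes"
    if isinstance(obj, list):
        return "array[" + ", ".join(describe(o) for o in obj) + "]"
    if isinstance(obj, dict):
        inner = ", ".join(f"{k}: {describe(v)}" for k, v in obj.items())
        return "map{" + inner + "}"
    raise TypeError(f"not a DataObject: {obj!r}")


print(describe([True, 1.5, Name("Type"), {Name("Length"): 3}]))
```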

Playing the wrong game

I have this weird habit in Civ IV. I play along for about two hours, don't bother to build a military, and then quit the game whenever somebody declares war on me.

There's no way it's not coming. It always happens. My Civilization builds some sweet cities that I fail to guard, and then someone else comes along and declares war. I know it's hopeless, so I quit.

Why do I not learn? And why do I always feel offended when it happens?