Nevermind that doesn't work because we have all sinned greatly

My previous post is wrong. If you try to authenticate one scope from another, here's what happens:

You have scope A, which hasn't been authenticated yet, and it can authenticate scope B. You're looking for B. Here's what happens:

The B-from-A strategy is provisionally assigned as the winning strategy.

Then you do the A scope: A's strategy is provisionally assigned as the winning strategy. It's successful.

You can then try to do scope B. You do, and it works. You return the winning strategy's result, which is still A's, because that was the last one assigned. You have the wrong scope.
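Here's a toy model of that sequence. It is not Warden's code: `ToyProxy`, `Strategy`, `authenticate_locally` and the account strings are all made up for illustration, and the only thing it assumes is a single "winning" slot shared across scopes, as described above.

```python
class Strategy:
    """A named strategy; run(proxy) returns a user or None."""

    def __init__(self, name, run):
        self.name = name
        self.run = run
        self.user = None


class ToyProxy:
    """Toy auth proxy with ONE shared 'winning' slot for every scope."""

    def __init__(self, strategies_by_scope):
        self.strategies_by_scope = strategies_by_scope
        self.winning = None

    def authenticate(self, scope):
        for strategy in self.strategies_by_scope[scope]:
            self.winning = strategy             # provisional winner
            strategy.user = strategy.run(self)  # may authenticate another scope
            if strategy.user is not None:
                break
        # Reads the shared slot, which a nested call may have overwritten.
        return self.winning.user if self.winning else None

    def authenticate_locally(self, scope):
        # Same loop, but the winner is tracked per call instead of shared.
        for strategy in self.strategies_by_scope[scope]:
            strategy.user = strategy.run(self)
            if strategy.user is not None:
                return strategy.user
        return None


def b_from_a(proxy):
    # Scope B's strategy: authenticate scope A first, then derive B from it.
    account_a = proxy.authenticate("A")
    return f"B-for-{account_a}" if account_a else None


def make_proxy():
    return ToyProxy({
        "A": [Strategy("a_password", lambda proxy: "account-A")],
        "B": [Strategy("b_from_a", b_from_a)],
    })


wrong = make_proxy().authenticate("B")
right = make_proxy().authenticate_locally("B")
print(wrong)  # account-A
print(right)  # B-for-account-A
```

The nested call for A overwrites the shared slot, so the outer call for B hands back A's user. Keeping the winner local to each call is one way out in the toy; whether that maps cleanly onto the real library is a separate question.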

I have a PR (https://github.com/hassox/warden/pull/144/files) which hasn't gone anywhere because I can't figure out how to write tests for it and it's a crazy edge case. Existence is suffering.

How to authenticate one scope from another with Devise

My app has multiple kinds of users, some of which belong to others. It's a result of having many disparate login systems, one per community, and offering the ability to merge them together. This is how I auth the One True account from the local account.

I banged my head on this for a while: what I wanted to do was to just take the

user.one_true_master_account

model and go from there, but the current_#{scope} helpers aren't available in a devise strategy. So I did some digging and figured out how to do it with warden without dragging in more than I had to. (this is in config/devise.rb)

module FromUserAuthentication
  class FromUserStrategy < Devise::Strategies::Authenticatable
    def valid?
      # I'd have to authenticate to figure out if it's valid anyway, but I'd
      # rather run it in the devise auth chain 
      true
    end

    def mapping
      Devise.mappings[:one_true_master_account]
    end

    def authenticate!
      # env is provided by the superclass; it's that enormous hash that
      # gets passed to middlewares. Authenticate the local account's scope
      # (assumed here to be :user), not the scope this strategy is for.
      user = env['warden'].authenticate(scope: :user)
      if user
        one_true_master_account = user.one_true_master_account
        success!(one_true_master_account)
      end
    end
  end
end

Then I registered it with the various subsystems that needed to know about it:

Warden::Strategies.add :from_user_authentication,
                       FromUserAuthentication::FromUserStrategy
Devise.add_module :from_user_authentication, strategy: true

Then I added it as a default strategy for that scope.

Devise.setup do |config|
  config.warden do |manager|
    manager.default_strategies(:some_other_auth_strategy, :another_auth_strategy,
                               :from_user_authentication,
                               scope: :one_true_master_account)
  end
end

This last one is necessary only because of a quirk in our system. Warden can look at the mapping function to figure out which strategies are appropriate in most other situations, but we wanted to have our strategies in different orders for different models.

The Internet Waffle House Index

To crib from Wikipedia:

The Waffle House Index is an informal metric used by the Federal Emergency Management Agency (FEMA) to determine the effect of a storm and the likely scale of assistance required for disaster recovery. The measure is based on the reputation of the Waffle House restaurant chain for staying open during extreme weather and for reopening quickly, albeit sometimes with a limited menu, after very severe weather events such as tornadoes or hurricanes.

The Index has three levels, based on the extent of operations and service at the restaurant following a storm:

  • Green: the restaurant is serving a full menu, indicating the restaurant has power and damage is limited.
  • Yellow: the restaurant is serving a limited menu, indicating there may be no power or only power from a generator or food supplies may be low.
  • Red: the restaurant is closed, indicating severe damage.

I think the current DDoS against Dyn DNS counts as a sort of "Internet Storm": a dramatic disturbance in the general Internet atmosphere. I propose a general Internet Waffle House Index:

  • Green: Sites have intermittent failures. Some sites may be receiving targeted attacks and be taken down completely for an extended period. Services have outages, it happens.
  • Yellow: Multiple major tech company sites are completely down, with many more having intermittent service. You go to check one service to see if anyone else is having problems with another and it's down as well. Major disruptions to just about every workflow, but some basics still work.
  • Red: Google, Facebook, Amazon all completely down for most people. There is no Internet today, come back tomorrow.

If there are any other bedrocks that belong in the red category, I'd love to hear them, but those are the ones that would raise an eyebrow from me.

The Dyn DNS DDoS clearly ranks Yellow (if that link doesn't work, that's the issue we're talking about). I don't know what else qualifies: maybe the 2008 cable cut.

FoxDB stores many things, but HedgehogDB stores one big thing

Let's say you want to store what you believe to be a ludicrous amount of data. It may or may not actually be a ludicrous amount of data. It could really be a lot, or it could just seem that way because of your current setup (I just want to serve 5 TB).

There's really one question: are you storing many things, or are you storing one big thing?

If you are building a whitebox SaaS platform where one customer's data doesn't interact with another customer's data, you are storing many things. It might be petabytes overall, but as long as each individual customer stays under a certain size, it's not that much of a problem. You're running five thousand different stores (which admittedly has its own problems), but each of those stores is sane. You're doing the equivalent of adding more directories.

The inspiration for this post was a Microsoft Azure ad about how the cloud helped store police dashcam video. That's storing many things. The files are only so large, and it's a clear, discrete unit. Even if you're going through all of it to get any kind of aggregate information, it's easily batchable.

The web is a big thing. The web links to itself in a non-predictable way: everything is talking to everything else. Any analysis is going to be on a page in relation to all the other pages it's related to, and those pages can be anywhere. You're not going to store five levels deep of depth-first link search because that's an insane amount of storage and at some point, you'll need six levels deep. Random seeks are the enemy, but there's no way around it.

The Facebook Graph is a big thing. Everybody knows everybody, or at least has a non-zero chance of knowing them. It used to be many things — the school networks.

Ten years ago, increasing storage of unrelated items was a real problem. Now, it's merely annoying. What's the step to make storing and analyzing huge, complex, interconnected items easier?

Math and Shakespeare, one at a time please

The answer is the 15th.

If you do a deep, head-scratching analysis of Ides of March, it's Ides + of + March = Half of March, which will give you fifteen days on either side of March, because March has 31 days.

This quiz was 30 minutes long. I spent 25 on this question, because I remembered 15 and got 16 by hand.

So that's why I got an A- in Sophomore English, admissions council. We studied Julius Caesar and I used my brain.

My unfeasible dream of a data processing platform

I build a lot of charts and dashboards. Sometimes the numbers are wrong. This is the worst thing in the entire world.

Why is it wrong? Well, let's just look through the thirty or so different data sources we have, surely one of those will have an obvious error! No? Let's look at the data sources that populate those data sources! Surely we will have access to all of them, and they will be in a reasonable format, and there will be no bizarre interactions between the different kinds of string processing and date processing done over a decade or so by different people!

If you're looking at this kind of disaster you've done at least one thing right. You probably have a pretty robust data warehouse platform because you're fucking up at scale. If you don't, everything fell to pieces a long time ago when you had to manage your own database servers and disk handling and...

Back to the disaster.

Imagine you could trace everything. Imagine we have made-up tables like this:

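SELECT * FROM enormous_table;

id | a      | b  | c    | foreign_id
1  | 122.13 | -1 | 0.32 | 1

SELECT * FROM another_table;

id | a        | e | f    | foreign_id
1  | 944.1311 | 2 | true | 1

-- both tables have a column named a, so it has to be qualified
INSERT INTO combo_table (foreign_id, a)
SELECT enormous_table.foreign_id, SUM(enormous_table.a + another_table.a)
FROM enormous_table INNER JOIN another_table
ON enormous_table.foreign_id = another_table.foreign_id
GROUP BY enormous_table.foreign_id;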

And imagine we store all the history and origin on every row. When the time comes to read from the combo table, we have all the history.

SELECT ORIGIN FROM combo_table WHERE foreign_id = 1;

id | a
1  | 1066.2611
||
||
==== SUM
      || == enormous_table
             1 | 122.13 | -1 | 0.32 | 1
             || == INSERT INTO enormous_table 
                   1 | 122.13 | -1 | 0.32 | 1
      || == another_table
             1 | 944.1311 | 2 | true | 1
             || == INSERT INTO another_table
                   1 | 944.1311 | 2 | true | 1

And then you have that for every. Single. Row. Problem solved. You can look up where everything went wrong.

It's impossible to do, I think. No matter how I go about it, I wind up with a Schlemiel the painter problem - doing one more thing involves doing everything before it, and then the one new one who lived in the house that Jack built. How many steps were involved?

There's a record for each. That record either has to have all the records before that, or a pointer to its parents. Storing all the records gets insane quickly. Pointers mean exploding disk seeks.
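A minimal sketch of the pointer version, using the made-up rows from above. The `Row` class, `trace` and `count_hops` are hypothetical names for illustration, not part of any real warehouse product. Each hop in `count_hops` stands in for one lookup, which is where the disk seeks would pile up.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    table: str
    values: tuple
    op: str                                       # operation that produced this row
    parents: list = field(default_factory=list)   # pointers to the source rows


def trace(row, depth=0):
    """Return the origin tree as indented lines, one line per record."""
    lines = ["  " * depth + f"{row.op} -> {row.table} {row.values}"]
    for parent in row.parents:
        lines.extend(trace(parent, depth + 1))
    return lines


def count_hops(row):
    """Count the records touched when tracing; each one is a potential seek."""
    return 1 + sum(count_hops(parent) for parent in row.parents)


enormous = Row("enormous_table", (1, 122.13, -1, 0.32, 1), "INSERT")
another = Row("another_table", (1, 944.1311, 2, True, 1), "INSERT")
combo = Row("combo_table", (1, "a = enormous.a + another.a"), "SUM",
            parents=[enormous, another])

for line in trace(combo):
    print(line)
print(count_hops(combo))
```

Three records for one tiny sum. Every further derived table adds another layer to walk, for every row.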

It would be great, though.

Reverse engineering Facebook's growing pains

TRIGGER WARNING: Ivy League humblebragging

Facebook Graph API Explorer gave me a little bit of insight into what their process must have been like when they were first getting big. Right now, if you make a new account your id number will be very long and not have much correlation with anything. If your friend made an account at the same time your numbers would be very different.

My id # is around 120,000. If you were at Columbia and got your Facebook account at the same time I did, your account id would be in that range. It's true for my friends. Generally, the older you are the lower you are in this range, the younger the higher.

I worked with someone who went to Cornell; his id number was around 450,000. He had his account for a year longer than I did. His friends had the same cluster, just around that number.

Clearly, at some point Facebook tried to carve up the id space and shard based on that -- surely, no one would have friends from other schools! (It is worth remembering you originally needed a college email, and that the school networks used to be a lot tighter than they are now.)
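To make the guess concrete, here's a sketch of what carving up the id space by school could look like. Everything in it is an assumption: the block size, the `SchoolIdAllocator` name and the placeholder school names are invented, and nothing here is known about how Facebook actually did it.

```python
class SchoolIdAllocator:
    """Hypothetical scheme: each school owns one contiguous block of ids."""

    def __init__(self, block_size=100_000):
        self.block_size = block_size   # made-up number, not Facebook's
        self.block_of = {}             # school -> block index
        self.next_offset = {}          # school -> next unused offset in its block

    def new_id(self, school):
        if school not in self.block_of:
            # First signup from this school claims the next free block.
            self.block_of[school] = len(self.block_of)
            self.next_offset[school] = 0
        offset = self.next_offset[school]
        if offset >= self.block_size:
            raise OverflowError(f"{school} has outgrown its block")
        self.next_offset[school] = offset + 1
        return self.block_of[school] * self.block_size + offset

    def shard_for(self, user_id):
        # Everyone in one block (one school) lands on the same shard.
        return user_id // self.block_size


alloc = SchoolIdAllocator()
a1 = alloc.new_id("school_a")
b1 = alloc.new_id("school_b")
a2 = alloc.new_id("school_a")
print(a1, a2, b1)
print(alloc.shard_for(a1) == alloc.shard_for(a2))
```

Classmates get neighboring ids and share a shard, which fits the clusters above. It also shows why the scheme stops paying off once friendships cross schools: every cross-school friend is a lookup on another shard.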

How am I supposed to do my cargo cult encapsulation in AngularJS?

A computer science degree and years of engineering best practices have given me a very specific anxiety disorder. Whenever I see a public member variable, I feel intense distress. Something like:

public class Coordinate {
    public double x;
    public double y;
}

makes me physically ill. You know that study where the MIT Professor asked people to do mean things to Furbies?

...in a small experiment conducted for the radio show Radiolab in 2011, Freedom Baird of MIT asked children to hold upside down a Barbie doll, a hamster and a Furby robot for as long as they felt comfortable. While the children held the doll upside down until their arms got tired, they soon stopped torturing the wriggling hamster, and after a little while, the Furby too. They were old enough to know the Furby was a toy, but couldn’t stand the way it was programmed to cry and say “Me scared”.

-- Would you murder a robot?

That's how I feel about making things public that can be made private. Stop doing that to that poor class! Stop it!

Most of my work now is in AngularJS. If you're not familiar, the 101 of it is that if you want to display something, or have a form, you use a special syntax in a template and all the changes will be magically wired together and appear.

<div ng-controller="MadeUpCtrl as ctrl">
 <form>
   <input type="text" ng-model="ctrl.text">
 </form>
 <div>
   {{ctrl.text}}
 </div>
</div>

When you type things in the form, they display in the div below without any real effort. You do have to have a variable named text in the controller, even if it has no value. (You can do {{text}} but there are a lot of reasons not to that I find too boring to explain now.)

But wait, you conveniently ask, isn't that mucking around with some other object's internal variables? Yes, yes it is. "text" is a public member variable and we are doing terrible, terrible things to it.

The problem is that there's no real way around this. The usual getText() setText() monstrosity won't work -- it has to be a value, not a function that returns a value. How does Angular know getText() will return something different? Well, you'd have to tell it.

That creates a disaster like this. Template:

<div ng-controller="MadeUpCtrl2 as ctrl">
 <form>
   <input type="text" ng-model="ctrl.exportText">
 </form>
 <div>
   {{ctrl.exportText}}
 </div>
</div>

Controller:

var MadeUpCtrl2 = function($scope) {
  this.text_ = "";
  this.exportText = "";
  var self = this;
  // $watch needs a function (or expression string) to evaluate, not the
  // current value; this fires when exportText changes
  $scope.$watch(function() {
    return self.exportText;
  }, function(exportText) {
    self.setText(exportText);
  });
};

MadeUpCtrl2.prototype.setText = function(text) {
  this.text_ = text;
};

MadeUpCtrl2.prototype.getText = function() {
  return this.text_;
};

The problem is that this is awful. If you want to do anything interesting in setText, let's say escape some HTML, you're going to have to push it back to exportText to make it visible, and then that's going to trigger another $watch...

Looks like I just have to live with it.

Three years to build a ship, three hundred to build a tradition

I

As a shut-in who hates sunlight, I am a big user of Amazon Prime. Most things I order from Amazon are things I should not order from Amazon. They are things I should get for myself, but do not, because I really am too goddam lazy to just walk five goddam minutes to the store.

Unfortunately for my laziness, Amazon has started its own shipping service. It is not very good at delivering packages to my door. It is, in fact, quite bad. Google "AMZL_US", it's a world darker than "Untied Airlines."

So far I am 0 for 2 on packages delivered via them actually showing up — Joseph K. logged into his Amazon account to see that two nothings had been delivered to his doorstep. Phone service was very helpful, as soon as they realized the orders were sent via Amazon's own shipping service they offered to send replacement orders for free. "No need to bother asking if he's looked for it," the nice woman in the North Carolina phone bank thinks to herself. "We know there's nothing there. We've never successfully delivered a package."

II

I live in an apartment complex. My address looks something like this:

XXX Streetname
Apt YYY

If you live in an apartment complex, I would like you to conduct a little experiment. Go to the unit where XXX and YYY are the same, something like

123 Streetname
Apt 123

If your complex is anything like mine, they will have posted signs everywhere. The first will seem reasonable: "Please check Apt number on package." Then you'll see another, more aggressive, its strokes rushed: "Do not deliver unless Apt # is 123, this is only for Apt 123." And then another, gashed into its canvas in a frenzied hand, its ink a dark red that makes you wonder where it came from: "ONLY APT 123 DELIVERIES YOU HAVE BEEN WARNED TURN BACK"

If you're feeling particularly adventurous, tape up a box, put on your best UPS Browns, and knock on their door. Listen to the tears of the children, "Daddy! Make it stop!" Look at the mother comforting the children, "Daddy will make the bad men stop!" Look into the mad eyes of the man holding the double barreled shotgun. Do what the man tells you. Read off the Apt number on the box, nice and slow.

I probably should have mentioned this earlier — write "Apt 123" on the box.

III

You know how your friends are stupid morons who take forever to find your apartment even though you gave them the address and directions? You know how you spend fifteen minutes trying to get to a restaurant, because the streets don't make any sense here, dammit, but you know you're real close?

My parents' house is on a corner. It has two driveways that open to two different streets. One is fairly simple. If I am in the car, I will insist you use that one. If you come from the main street without me, you will see the other approach, a nice big driveway to turn into, made for cars. "Hey, I'm driving a car!" you think to yourself. "Let's turn into that driveway," you say in your best teen-in-a-horror-movie voice, "what's the worst that could happen?"

It is a trap. Do not go in there. It is a twenty foot curved stretch of asphalt with steep ten foot drops to either side. Better drivers than you have tried to back out and had to call a tow truck to pull them out of the ditch. Most had to call another, more expensive tow truck that can haul semis, because the regular tow truck is unable to help a car that falls down there. That's how fucked up this driveway is.

Listen to me. I was born on this driveway, molded by it. I didn't back up a straight driveway on flat ground until I was twenty-two. I am the baddest fucking sherpa on this shitty little Everest. Hand me your keys.

IV

Your mail always winds up where it's supposed to be. The USPS mail carrier on your route figured out all the little stuff the first few weeks on the job. The family at Apt 123 caught them as they were delivering one day, had a short conversation, and now they get only their mail and wave every time it's delivered. Nice bunch of people.

Every new person and new organization has to start from scratch. The Amazon couriers haven't done this before. If their annual turnover is only in the high double digits I would be surprised. Guess what? Every new guy has to figure that out, and they mess up a lot.

I think this is why the older people I know are so wistful for the good old days where the person at the store really knew who you were and really knew what they were doing and all that Leave it to Beaver stuff. That experience is unfathomable to me. Of course the clerk has no idea what they're doing. Of course you have to Google around and do everything yourself. Of course sometimes you get the wrong thing and no one will help you and you will be out of luck. That's how it works.

V

There are two basic facts at play here.

  1. The employer wants employees to be as fungible as possible to keep costs down.
  2. Doing the job well requires extensive domain knowledge only applicable to this one situation. There's no way to scale it.

They don't really play well together. Some ways to resolve this:

Compile all collective wisdom in a Great Wiki

Imagine if you wrote down everything you did at your job as you did it. Almost all of the ways things actually get done aren't written down, you just sort of found them out from somebody else who learned from someone else. Well, now they're written down. Now they can look it up in the Great Wiki.

To equal what someone would know after years on the job delivering packages in a neighborhood, the Great Wiki would have an entry on every house in the region. The first few days on the job would consist of the new hire studying all these facts. Some of them will be out of date when they get there, which will cause some mixups. Mis-delivered packages aren't great, but there are much worse disasters an organization can create with this level of effort. Imagine:

"Yes, ma'am, we need your signature on every package. Why? Uh, well you know, so they don't get stolen," the delivery person says. "The system says code 131, 'probable family member with an addiction.'"

"Ellen's been clean three years. How do you know — no. What!? No."

"Let me just make a note of that in the system —"

"Why are you writing this down!?"

"You don't want me to make a note of that?"

"Why do you have this!? Stop writing! Get rid of it!!"

"The record is inaccurate. Do you want me to update it or delete it?"

"Delete it! Delete everything!"

"Okay."

"How do you know all about this!? How do you know all of this!? Where's Julio!?"

"Julio retired, that's why I'm here. I've read all about you, Mrs. Wallace, nice to finally meet you."

Contract out everything to the same local guys

In this situation, UPS and AMZL_US and FedEx all subcontract out to local people who stay around longer and can build up the knowledge to actually do things well. Maybe UPS doesn't like its current subcontractor; that subcontractor can probably make do with its business from FedEx. If it really fails and goes out of business, the other subcontractors will probably need to hire people who have been working the area.

I don't know how much UPS and FedEx contract out vs do in house, but I think there's a mix. The issue is that someone always wants to be The Emperor of All Things, and that requires having your own little army rather than borrowing someone else's.

Do a bad job, but cheaply

Pretty much what happens. Hope my packages get delivered the second time.