I build a lot of charts and dashboards. Sometimes the numbers are wrong. This is the worst thing in the entire world.
Why is it wrong? Well, let's just look through the thirty or so different data sources we have, surely one of those will have an obvious error! No? Let's look at the data sources that populate those data sources! Surely we will have access to all of them, and they will be in a reasonable format, and the bizarre interactions between different ways of string processing and date processing done over a decade or so by different people!
If you're looking at this kind of disaster you've done at least one thing right. You probably have a pretty robust data warehouse platform because you're fucking up at scale. If you don't, everything fell to pieces a long time ago when you had to manage your own database servers and disk handling and...
Back to the disaster.
Imagine you could trace everything. Imagine we have made up tables like this:
SELECT * FROM enormous_table; id | a | b | c | foreign_id 1 | 122.13 | -1 | 0.32 | 1
SELECT * FROM another_table; id | a | e | f | foreign_id 1 | 944.1311 | 2 | true | 1
INSERT INTO combo_table SELECT SUM(a) FROM enormous_table INNER JOIN another_table ON enormous_table.foreign_id = another_table.foreign_id;
And imagine westore all the history and origin on When the time comes to read from the combo table, we have all the history.
SELECT ORIGIN FROM combo_table WHERE foreign_id = 1; id | a 1 | 1066.2611 || || ==== SUM || == enormous_table 1 | 122.13 | -1 | 0.32 | 1 || == INSERT INTO enormous_table 1 | 122.13 | -1 | 0.32 | 1 || == another_table 1 | 944.1311 | 2 | true | 1 || == INSERT INTO another_table 1 | 944.1311 | 2 | true | 1
And then you have that for every. Single. Row. Problem solved. You can look up where everything went wrong.
It's impossible to do, I think. No matter how I go about it, I wind up with a Schlemiel the painter problem - doing one more thing involves doing everything before it, and then the one new one who lived in the house that Jack built. How many steps were involved?
There's a record for each. That record either has to have all the records before that, or a pointer to its parents. Storing all the records gets insane quickly. Pointers mean exploding disk seeks.
It would be great, though.