pbardea's comments | Hacker News

I love these types of practical approaches to networking. At least for me, I think it's the clearest way to learn about these things (rather than just reading about them). It would certainly have made my university networking course much clearer!

That's what made Crafting Interpreters[0] so compelling to me. Does anyone know any similar resources for networking?

[0] https://craftinginterpreters.com/


Are there existing tools that help debug compile-time properties (e.g. reference counting) easily? I know of many interactive debugging techniques for investigating runtime errors, but when I'm faced with a compiler error, it seems like the best interface I have is an error message and then meticulously reading the code. I'd find something like that really useful for type-related errors too.

Even something as simple as breakpoints and printlns would let me inspect the intermediate state of these systems.


Since reference counting is automatic for the most part, the language tries to express the semantics of references itself - for instance, you can capture self weakly for a callback and then have to check that you still have a valid capture before use.

After that, your issues then are typically reference cycles, which can really only be found by tooling at runtime or by certain linting tools (e.g. warn that the type definitions make reference cycles possible, even if your code isn't making cycles). There are tools such as Instruments included with Xcode to help detect cycles at runtime.


I always love projects that start from nothing and go step-by-step to something more complex. Reminds me of one of my favorite posts of a similar style, building out "Metaballs" with different algorithms: http://jamie-wong.com/2014/08/19/metaballs-and-marching-squa....

Nice work!


> assuming the deletes are done appropriately

This is one gripe I have with soft-deletion. Since I can no longer rely on ON DELETE CASCADE relationships, I need to re-define these relationships between objects at the application layer. This gets more and more difficult as the number of relationships between objects grows.

If the goal is to keep a history of all records for compliance reasons or "just in case", I tend to prefer a CDC stream into a separate historical system of record.


> Since I can no longer rely on ON DELETE CASCADE relationships

Cascaded deletes scare me anyway. It only takes one idiot to implement UPSERT as DELETE+INSERT because it seems easier, and child data is lost. You could always use triggers to cascade your soft-delete flags as an alternative method, though that would be less efficient (and more likely to be buggy) than built-in cascaded deletes.
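For illustration, a minimal PostgreSQL sketch of that trigger approach, assuming hypothetical orders/order_lines tables that each carry a deleted_at column:

    -- Propagate a soft-delete from an order to its order lines.
    CREATE OR REPLACE FUNCTION cascade_soft_delete() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        UPDATE order_lines
           SET deleted_at = NEW.deleted_at
         WHERE order_id = NEW.id
           AND deleted_at IS NULL;
        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER orders_soft_delete
    AFTER UPDATE OF deleted_at ON orders
    FOR EACH ROW
    WHEN (OLD.deleted_at IS NULL AND NEW.deleted_at IS NOT NULL)
    EXECUTE FUNCTION cascade_soft_delete();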

If you look at how system-versioned (or “temporal”) tables are implemented in some DBMSs, that is a good compromise. The history table is your audit, containing all old versions of rows, even deleted ones, and the base table can be really deleted from, so you don't need views or other abstractions away from the base data to avoid accidentally resurrecting data. You can also apply different storage options to the archive data (compression or not, different indexes, ... depending on expected use cases) without manually setting up partitioning based on the deleted/not flag. It can make some queries less efficient (you need to union two tables to get the latest version of things including deleted ones, etc.), but it makes other things easier (especially with the syntactic sugar like FOR SYSTEM_TIME AS OF <when> and so forth), and yet more things are rendered possible (if inefficient) where they were not before.
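For concreteness, a minimal MS SQL Server sketch (table and column names are made up):

    CREATE TABLE dbo.Customer (
        Id INT NOT NULL PRIMARY KEY,
        Name NVARCHAR(200) NOT NULL,
        ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

    -- A real DELETE; the old row version lands in dbo.CustomerHistory
    -- automatically.
    DELETE FROM dbo.Customer WHERE Id = 42;

    -- Read the table as it was at a point in time (the SQL:2011 sugar).
    SELECT * FROM dbo.Customer
    FOR SYSTEM_TIME AS OF '2023-01-01T00:00:00';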

> I tend to prefer a CDC stream into a separate historical system of record.

This is similar, though with system versioned tables you are pretty much always keeping the history/audit in the same DB.

---

FWIW: we create systems for highly regulated finance companies where really deleting things is often verboten, until it isn't and then you have the other extreme and need to absolutely purge information, so these things are often on my mind.


> It only takes one idiot to implement UPSERT as DELETE+INSERT because it seems easier, and child data is lost.

Seems unfortunate to miss out on all the referential integrity benefits of a serious database when hiring standards, training and code reviews should all be preventing idiotic changes.

If I’m making a shopping cart system, I want to know every order line belongs to an order, every order belongs to a user and so on. Anyone who can’t be trusted to write an update statement certainly can’t be trusted to avoid creating a bunch of orphan records IMHO.


> Seems unfortunate to miss out on all the referential integrity benefits of a serious database when hiring standards, training and code reviews should all be preventing idiotic changes.

If you don't use ON DELETE CASCADE, the actual foreign key constraint gives you a meaningful error: that you need to delete some stuff to preserve referential integrity.

With ON DELETE CASCADE, you're telling it "eh, if you need to delete some stuff to avoid an error, go ahead, don't bother me, do what I mean".
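A quick sketch of the difference, with hypothetical orders/order_lines tables:

    CREATE TABLE orders (
        id BIGINT PRIMARY KEY
    );
    CREATE TABLE order_lines (
        id BIGINT PRIMARY KEY,
        order_id BIGINT NOT NULL REFERENCES orders (id)  -- default: NO ACTION
    );

    -- With the default, this errors out if any order_lines rows still
    -- reference order 1, forcing you to deal with them first:
    DELETE FROM orders WHERE id = 1;

    -- Declared as REFERENCES orders (id) ON DELETE CASCADE instead, the
    -- same DELETE would silently remove the matching order_lines rows too.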


> Seems unfortunate to miss out on

You don't lose any referential integrity without cascades. Foreign keys are still enforced, just with an error when an action would break integrity, rather than automatic deletes to satisfy the constraint.

> when hiring standards, training and code reviews should all be preventing idiotic changes

I was burned by this sort of thing early on, in companies where I had few such luxuries. Even though things are done better now, I'm still paranoid that one day someone will skip procedure and somehow let a problem through all the QA loops.

> If I’m making a shopping cart system, I want to know every order line belongs to an order, every order belongs to a user and so on.

I take it from the other side: I want to know that if something is referred to elsewhere it can't be deleted until that is resolved.

If a top-level manager leaves, I want an error if the hierarchy hasn't been updated before deleting his record[1], rather than his underlings, their underlings, their underlings' underlings, … , being deleted when that one person is!

----

[1] Obviously this would normally be a soft-delete; there may be a lot of records referring to such an individual, not just other person records. If you actually need to delete them (right to be forgotten etc.) then you need to purge the PII but keep the record so other things still hang together.


Often you don't have to rely on ON DELETE CASCADE relationships. Because you are never deleting anything, you will never have any orphaned records. If you don't want to see, say, Invoices for a deleted Customer, then that's just another filter feature.

Mostly I use soft-delete because, for auditing requirements, we pretty much can't remove anything, but also because nothing ever truly goes away. If we have an Invoice or Order then, from our perspective, we must have those forever, even if the corresponding client is deleted and can never place another one.


> Often you don't have to rely on ON DELETE CASCADE relationships. Because you are never deleting anything, you will never have any orphaned records

Exactly. Unless you're doing something silly like adding deleted_at to bridge tables ... which you probably don't need even in 1:many.


You may end up doing this anyway if you have any application code that needs access to delete hooks, or if access control varies across objects. At that point, you are probably using an ORM instead of direct queries, and placing logic at the app layer that could instead live in the db.


Being unable to effectively use foreign key relationships is definitely a downside of using soft deletes. But it's also worth asking whether these types of behaviors, which would also include a feature like triggers, really belong in a database, or whether they're better handled at the application level (or at least at a layer above the data layer). I'd argue that you probably don't want these things at the DB level because you get into a situation where you're sharing business logic between two (or more) places.


My perspective is that DB-level triggers are the very best place to put cascading update/delete logic, so it only ever needs to be written once and stays consistent regardless of any future frontend clients that might be written in a different language and/or framework than the original codebase.

Right now in $dayjob I am converting an old non-DRY codebase from a NoSQL data layer to a proper relational SQL backend.

This old front-end verbosely coded up a relational cascading update/delete system on top of the NoSQL backend, redundantly in numerous places and with subtle differences and inconsistencies, making the code brittle.

My current estimate is some front end functions will be reduced in LOC size by 95% once we use the power of SQL in the backend.

And the backend SQL triggers and stored procedures required to replace these long NoSQL frontend functions doing cascading updates/deletes are only around 10% the size of the replaced front-end code.

And now future frontends can reuse this without exploding the size of their codebase where complex data operations are required, with no need to reinvent the same data-handling algorithm all over again (and no risk of subtle variation creeping in from different front-end implementations of data algorithms).


I'm less likely to use triggers, but I'll say I pretty much always want proper foreign key relationships set up in the database. Unique and other constraints too. In principle I might agree with you that it's an application-level concern, but being able to set up these kinds of invariants and trust that they will be enforced even in the face of bugs in my code (and there will be bugs in my code) is just too powerful to let go of in the name of purity. I'd much rather let a bit of business logic creep between multiple layers than discover that I have a whole bunch of orphaned records and no sensible way to reconcile them.


The DB's responsible for maintaining the integrity of data in it, unless there's some very good reason you can't let it do that. It's faster and better at it than your application, 99% of the time, and it can keep doing that even when/if a second application starts using the same database, or when being used from a SQL prompt, or whatever.


Presumably you have a schema defining the tables, the columns, and the types at the least, along with things like unique indexes. So you already have data constraints in your database design. And that's where they belong, to ensure the data integrity, since the database's concern is the data.

If you're doing everything as one big table of entity-attribute-value with generic blobs for the values, then yes you'll have to re-implement all the normal constraints (that the database would handle) in your application and do all your data integrity handling there. And you'll also have to duplicate that logic across every application that accesses that database now and in the future.

Data usually lives longer and has more uses than just one program. So I think it's generally better to put integrity constraints in the database, rather than having to re-implement and duplicate that logic several places.


If we're assuming you're using a view-based approach that elides the soft-deleted rows automatically, then you'll get a lot of these dependent objects correctly updated for free, assuming you're pulling them out of the DB with JOINs. SELECT FROM foo JOIN bar (assuming bar is a view into bar_with_deleted) will automatically filter out the invalid rows from foo. And if you're using this information to populate a CRUD interface, it's likely you'll be JOINing bar already to get some metadata for display (like bar.name instead of the surrogate bar.id key you use for joining).
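A minimal sketch of that setup (names assumed; bar_with_deleted is the real table):

    CREATE TABLE bar_with_deleted (
        id BIGINT PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_at TIMESTAMPTZ  -- NULL means the row is live
    );

    CREATE VIEW bar AS
        SELECT * FROM bar_with_deleted WHERE deleted_at IS NULL;

    -- The inner join against the view drops foo rows whose bar was
    -- soft-deleted, with no extra WHERE clause in the query itself:
    SELECT foo.*, bar.name
    FROM foo
    JOIN bar ON bar.id = foo.bar_id;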


Yes, but other queries (any aggregate queries that don't join the soft deleted table, any joins to other tables) will now return rows that would have been deleted under hard deletion with cascade.


This is definitely something to watch out for, but in practice (as someone who migrated a system without soft deletes to one that had them) I found that it doesn't tend to come up nearly as much as you might think. Usually the table being transitioned to support soft deletes is a relatively core table (ancillary tables usually aren't worth the complexity to transition), so a lot of your reporting queries will already be pulling in that table. You definitely need to check to make sure you're not missing anything - and sometimes CRUD interfaces will need to be completely revamped to include the table in question - but it's usually not that hard.


You could use a trigger to cascade soft-delete flag toggles, provided all the relevant tables have such a column. Still have to factor that into your other queries, but at least you wouldn't have to make potentially-impossible-to-exhaustively-check joins to figure out if a given row was "deleted".


Isn't there any attempt to improve soft deletion at the engine/SQL level? I can see it as a possible feature request.


If you're using PostgreSQL, you can implement cascading soft-deletes yourself.

The information_schema views hold all foreign key relationships, so one can write a generic procedure that cascades through the fkey graph to soft-delete rows in any related tables.
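A rough PL/pgSQL sketch of the idea (not production code): it assumes every table has an id primary key, a single-column foreign key, and a deleted_at column, ignores schemas, and doesn't guard against cycles in the fkey graph:

    CREATE OR REPLACE FUNCTION soft_delete_cascade(p_table text, p_id bigint)
    RETURNS void LANGUAGE plpgsql AS $$
    DECLARE
        ref record;
        child_id bigint;
    BEGIN
        -- Soft-delete the row itself.
        EXECUTE format(
            'UPDATE %I SET deleted_at = now()
              WHERE id = $1 AND deleted_at IS NULL', p_table)
        USING p_id;

        -- Find foreign keys pointing at this table via information_schema.
        FOR ref IN
            SELECT kcu.table_name  AS child_table,
                   kcu.column_name AS fk_column
            FROM information_schema.table_constraints tc
            JOIN information_schema.key_column_usage kcu
              ON kcu.constraint_name = tc.constraint_name
            JOIN information_schema.constraint_column_usage ccu
              ON ccu.constraint_name = tc.constraint_name
            WHERE tc.constraint_type = 'FOREIGN KEY'
              AND ccu.table_name = p_table
        LOOP
            -- Recurse into each referencing row.
            FOR child_id IN EXECUTE format(
                'SELECT id FROM %I WHERE %I = $1',
                ref.child_table, ref.fk_column)
            USING p_id
            LOOP
                PERFORM soft_delete_cascade(ref.child_table, child_id);
            END LOOP;
        END LOOP;
    END;
    $$;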


Could someone take a stab at an example of what this would look like? Sounds really interesting.


Here's an interesting approach using rules: https://evilmartians.com/chronicles/soft-deletion-with-postg...



There's the idea of temporal tables, https://pgxn.org/dist/temporal_tables/

It's not a standard (I think) but it'd let you do a cascading delete and then be able to go and look at the old objects as they were at time of deletion too.

You'd need to do things very differently to show a list of deleted objects though.


> temporal tables … It's not a standard

They were introduced in ANSI SQL 2011.

How closely implementations follow the standard I don't know, but something close exists in several DBMSs: I use them regularly in MS SQL Server, there are plug-ins for Postgres, MariaDB has them, and so forth.


It appears that there's been an attempt at standardizing temporal features in SQL in the SQL:2011 standard: https://en.wikipedia.org/wiki/SQL:2011


Neat, I didn't know that had happened. I can't say I follow SQL standards all that thoroughly.


You lose traditional FK constraints with temporal tables since there are multiple copies of a row. One workaround is to partition current rows separately from historical rows and only enforce FK constraints on the current partition.


One interesting feature that some DBs implement is something like SELECT AS OF SYSTEM TIME (https://www.cockroachlabs.com/docs/stable/as-of-system-time....) which _kinda_ does this.

However, in practice this usually dramatically slows down reads if you have to constantly skip over the historic rows, so you probably don't want to keep garbage around longer than absolutely necessary. The concept of a historic table mentioned below could be interesting though - especially if it could be offloaded to cold storage.
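For illustration, the CockroachDB form looks roughly like this (table name assumed):

    -- Read the table as it was 10 seconds ago; older MVCC versions stick
    -- around until the configured GC window expires.
    SELECT * FROM orders AS OF SYSTEM TIME '-10s';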


Doubt it. It seems like something obvious, yet I've waited so long for it. It seems like you have to rely on third-party plugins.


> This is one gripe I have with soft-deletion. Since I can no longer rely on ON DELETE CASCADE relationships

If you use soft deletes on all tables, you can also cascade them, as long as you either cascade updates to the real keys as well or prevent such updates: give each table a deleted flag column, include it in a unique constraint with the actual key column(s), and include it in the foreign key.
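A minimal PostgreSQL sketch of that layout (names assumed); note that in this simple form a child can't be soft-deleted independently of its parent:

    CREATE TABLE parent (
        id BIGINT PRIMARY KEY,
        deleted BOOLEAN NOT NULL DEFAULT false,
        UNIQUE (id, deleted)  -- target for the composite foreign key
    );

    CREATE TABLE child (
        id BIGINT PRIMARY KEY,
        parent_id BIGINT NOT NULL,
        deleted BOOLEAN NOT NULL DEFAULT false,
        FOREIGN KEY (parent_id, deleted)
            REFERENCES parent (id, deleted)
            ON UPDATE CASCADE
    );

    -- Soft-deleting the parent flips the flag on its children too:
    UPDATE parent SET deleted = true WHERE id = 1;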


Funny seeing this pop up on my #hacker-news Slack channel that periodically polls for the top stories :)

I actually prefer the ability to push things like the top stories of the day to me through either Slack or email, rather than having the temptation to constantly refresh an RSS feed app. An added bonus is that I frequently add some custom logic to curate the feed to my liking (based on the feed).


I think it was a reference to his first tweet.

(https://twitter.com/jack/status/20)


It works (kinda) with Karabiner Elements: http://www.jeffgeerling.com/blog/2016/remapping-caps-lock-ke... -- Edit: but it's very nice to see this supported by default :)

