I just feel this is a great example of someone falling into the common trap of treating an LLM like a human.
They are vastly less intelligent than a human, and logical leaps that make sense to you make no sense to Claude. It has no concept of aesthetics or, of course, any vision.
All that said, it got pretty close even with those impediments! (It got worse because the writer tried to force it to act more like a human would.)
I think a better approach would be to write a tool to compare screenshots, identify misplaced items and output that as a text finding/failure state. Claude will work much better because you're dodging the bits that are too interpretive (that humans rock at and LLMs don't).
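To make the screenshot-comparison idea concrete, here is a minimal sketch of what such a tool could look like. It assumes the two screenshots have already been decoded into same-sized grids of RGB tuples (a real tool would load them with an imaging library), and the names `diff_bounding_box` and `describe_diff` are made up for illustration. It only reports where pixels differ as a line of text, which is the kind of finding an LLM can act on.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (assumed already decoded)


def diff_bounding_box(expected: Image, actual: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the differing region, or None if identical."""
    if len(expected) != len(actual) or any(
        len(row_e) != len(row_a) for row_e, row_a in zip(expected, actual)
    ):
        raise ValueError("screenshots must have the same dimensions")

    xs: List[int] = []
    ys: List[int] = []
    for y, (row_e, row_a) in enumerate(zip(expected, actual)):
        for x, (pixel_e, pixel_a) in enumerate(zip(row_e, row_a)):
            if pixel_e != pixel_a:
                xs.append(x)
                ys.append(y)

    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def describe_diff(expected: Image, actual: Image) -> str:
    """Turn the pixel diff into a plain-text pass/fail finding."""
    box = diff_bounding_box(expected, actual)
    if box is None:
        return "PASS: screenshots match"
    left, top, right, bottom = box
    return f"FAIL: pixels differ in region x={left}..{right}, y={top}..{bottom}"


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    expected = [[white] * 4 for _ in range(4)]
    actual = [row[:] for row in expected]
    actual[2][1] = black  # simulate a misplaced element
    print(describe_diff(expected, actual))
```

The point of the design is that the interpretive step (deciding whether two images "look the same") is replaced by a deterministic check, and the model only ever sees the text result.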
I meant that frame very deliberately. Use of the word AI is misleading people that LLMs are intelligent.
They model what looks like intelligence but with very hard limits. The two advantages they have over human brains are perfect recall and data storage. They are also faster.
But the brain is vastly more intelligent:
- It can learn concepts (e.g. language) with an order of magnitude less information
- It responds in parallel to multiple formats of stimuli (e.g. sight/sound)
- It can generalise, which LLMs cannot
- The brain interprets and understands what it experienced
That's just the tip of the iceberg. Don't get me wrong: I use AI, it is by far some of the most impressive tech we have built so far, and it has potential to advance society significantly.
But it is definitely, vastly, less intelligent than us.
The blog frequently refers to the LLM as "him" instead of "it" which somehow feels disturbing to me.
I love to anthropomorphize things like rocks or plants, but something about doing it to an AI that responds in human-like language enters an uncanny valley or otherwise upsets me.
From the CDC report [1], it's pretty clear that rabies was not considered for the donor until after the donee died and rabies was confirmed. Possibly because the donor had been scratched by a skunk and not bitten. The report says the scratch had been noted on the donor risk assessment interview (DRAI), but that skunks are not considered a reservoir for rabies in his area.
If a manager is handling (almost) all disputes of all sorts, then they will fundamentally lack authority to enforce an outcome on a real dispute. They simply are too involved because resolution requires you to take some sort of side.
If my children won't speak to each other I will refuse to be the go-between because I become a proxy for one to the other. If one then punches the other they won't respect my perspective that this was wrong because I've set myself up as the proxy for the other's feelings.
If you need a manager to resolve the above example, the org is broken and the engineers are poor engineers.
> If a manager is handling (almost) all disputes of all sorts, then they will fundamentally lack authority to enforce an outcome on a real dispute. They simply are too involved because resolution requires you to take some sort of side.
Bullshit. Being a routine mediator makes you a better mediator when big things come up, not a worse one. It means you are in tune with the particular needs and idiosyncrasies of the people involved, and assuming you are any good at it, it means you have the trust of all parties to mediate fairly.
> If my children won't speak to each other I will refuse to be the go-between because I become a proxy for one to the other.
First of all, managing adults and parenting children are two radically different things. Second, being a go-between is not handling a dispute; if anything it facilitates the dispute. Kids can't agree on whose turn it is to play with a toy? Toy gets taken away with the understanding they'll get it back when they agree to a system - that's conflict resolution.
> If one then punches the other they won't respect my perspective that this was wrong because I've set myself up as the proxy for the other's feelings.
What?
> If you need a manager to resolve the above example, the org is broken and the engineers are poor engineers.
The fact there is this conflict to resolve is evidence that the org is broken and the engineers are poor engineers, but given that there is a conflict, the manager should be the one resolving it, because, again, that is their job.
You may not mean it but I do think sometimes framing it this way implies leading and managing is something that requires less ability (it's a skill in its own right).
What I think is true is people cap out their technical competency, and look to shift their skillset and, globally, we are bad at a) training them to be good managers (because there is a wrong assumption it's an innate skill) and b) weeding out the many who also lack the ability to be a manager.
Agree, it’s a skill, it can be learned and improved, and of course some people have some natural ability.
But for every skill there’s a floor and a ceiling. The floor for managers is imo far lower than it is for tech ICs. Incompetent managers have many options to hide their misdeeds. That doesn’t say anything about the average or the ceiling.
I suspect this is written by someone who stepped into managing a team and no further.
My overall reflection is that he's probably heard of servant leadership but not understood it. It's not about sweeping away problems but more a mindset that your role is to empower. I feel strongly that all new managers should embrace and get good at this because it instills the mindset that the best leaders ultimately only succeed through their team.
A servant leader who becomes overworked is either not doing their job well (delegation isn't contrary to the mindset!) or, more likely, has a poor leader themselves.
I actually love the concept of transparent leadership but sadly I can't see it come through in his points. They are all things a good leader, a good servant leader, should also do.
For me transparent leadership becomes more critical as you move up the stack. Once you get to multiple teams or teams of teams, leaders must pivot strongly to strategy setting, and here your servant leadership comes in through painting a clear destination for everyone to get to.
At this point I believe the best leaders are genuinely transparent and the worst keep secrets. One of my most respected mentors framed it as deliberately over-sharing. Which I love, even if I get into trouble for it constantly!
(I do like the writer's anarchic streak; the best leaders are radicals)
Yeh I think you are right and I am also finding larger apps built using SDD steadily get harder to extend.
> For large existing codebases, SDD is mostly unusable.
I don't really agree with the overall blog post (my view is all of these approaches have value, and we are still too early on to find the One True Way) but that point is very true.
I did this first too. The trick is realising that the "spec" isn't a full system spec, per se, but a detailed description of what you want to do.
System specs are non trivial for current AI agents. Hand prompting every step is time consuming.
I think (and I am still learning!) SDD sits as a fix for that. I can give it two fairly simple prompts & get a reasonably complex result. It's not a full system but it's more than I could get with two prompts previously.
The verbose "spec" stuff is just feeding the LLM's love of context, and more importantly, as I think we all know, you have to tell an agent over and over how to get the right answer or it will deviate.
Early on with speckit I found I was clarifying a lot but I've discovered that was just me being not so good at writing specs!
Example prompts for speckit:
(Specify) I want to build a simple admin interface. First I want to be able to access the interface, and I want to be able to log in with my Google Workspaces account (and you should restrict logins to my workspaces domain). I will be the global superadmin, but I also want a simple RBAC where I can apply a set of roles to any user account. For simplicity let's make a record of user accounts when they first log in. The first roles I want are Admin, Editor and Viewer.
(Plan) I want to implement this as a NextJS app using the latest version of Next. Please also use Mantine for styling instead of Tailwind. I want to use DynamoDB as my database for this project, so you'll also need to use Auth.js over Better Auth. It's critical that when we implement you write tests first before writing code; forget UI tests, focus on unit and integration tests. All API endpoints should have a documented contract which is tested. I also need to be able to run the dev environment locally so make sure to localise things like the database.
The plan step is overly focused on the accidental complexity of the project. While the `Specify` part is doing a good job of defining the scope, the `Plan` part is just complicating it. Why? The choice of technology is usually the first step in introducing accidental complexity in a project. Which is why it's often recommended to go with boring technology (so the cost of this technical debt is known). Otherwise go with something that is already used by the company (if it's a side project, do whatever). If you choose to go that route, there's a good chance you already have good knowledge of those tools and have code samples (and libraries) lying around.
The whole point of code is to be reliable and to help do something that we'd rather not do. Not to exist on its own. Every decision (even little) needs to be connected to a specific need that is tied to the project and the team. It should not be just a receptacle for wishes.
I wouldn't call that accidental complexity? It's just a set of preferences.
Your last point feels a bit idealistic. The point of code is to achieve a goal; there are ways to achieve it with optimal efficiency in construction, but a lot of people call that gold plating.
The setup these prompts leave you with is boring, standard, and something I can surely do in a couple of hours. You might even skeleton it, right? The thing is, the AI can do it faster in elapsed time and also reduces my time to writing two prompts (<2 minutes) and some review (10-15 minutes, perhaps?).
Also remember this was a simple example; once we get to real business logic efficiencies grow.
It may be a set of preferences for now, but it always grows into a monstrosity when future preferences don't align with current preferences. That's what accidental complexity means. Instead of working on the essential needs (having an admin interface that works well), you will get bogged down with the whims of the platform and technology (breaking changes, bugs,...). It may not be relevant to you if you're planning on abandoning it (switching jobs, side project you no longer care about,...).
Something boring and standard is something that keeps going with minimal intervention while getting better each time.
I'm going to go out on a limb here and say NextJs with Auth.js is pretty boring technology.
I'm struggling to see what you'd choose to do differently here?
Edit: actually I'll go further and say I'm guiding against accidental complexity. For example Auth.js is really boring technology, but I am annoyed they've deprecated it in favour of Better Auth - it's not better and it is definitely not boring technology!
Your card doesn't know the balance, it doesn't work like that.
Offline transactions mostly died off when the limit in the UK for contactless was raised to £100. At £20/30 (the original limits) issuers/merchants risk accept some payments not being valid (and the total limit before you had to chip and pin was fairly low too).
And worth saying, the merchant has some control on the terminal but mostly the decision of offline/online is down to the issuer and configured on the card.
Some debit cards don't allow offline transactions, usually when the cardholder isn't allowed to be in debt.
In the olden days, you'd get a Visa Electron or Solo debit card in the UK if you were under 18 or had a poor credit history.
Visa Electron and Solo were online authorisation-only card brands (also known as "immediate authorisation").
If you didn't have enough money in your account, the transaction would be declined. Visa Electron cards didn't have embossed numbers on the front, so couldn't be used with the old-fashioned card imprinters.
Visa Electron and Solo have been discontinued now, so people with poor credit can get a Visa Debit or MasterCard debit card, but with offline authorisation disabled.
That does mean those cards can't work in some places (e.g. on aeroplanes or trains).
Credit cards generally support offline authorisation.
They use largely the same rails/network (for example Mastercard). The only meaningful difference is on how and when funds are reconciled.
Some payment providers ask up front to simplify the flows as it's not totally trivial to determine what sort of card it is, and also because different fees apply - historically some merchants added specific fees to basket etc. (less so nowadays but the UI convention sticks)
> Some payment providers ask up front to simplify the flows as it's not totally trivial to determine what sort of card it is
And because the same card can be both. At least here in Brazil, most bank cards have multiple uses (credit, debit, ATM) in the same card. AFAIK, they're separate applications within the same chip, and the terminal has to select which one to use before starting.
Interesting! Did not know that offhand but just looked it up in the technical docs and this is part of the standard. Interesting to hear how other countries have adopted different approaches.
From memory, online and offline transactions are usually split out by BIN number (first six digits)
The BIN will tell you which bank was the issuer and which class of card you have, like standard or premium, though most readers probably don't take that into account beyond the card scheme and card type associated with the range that the individual BIN is in. Many banks will have multiple BINs for the same card type if they are large.
Credit / online debit / offline debit usually get different ranges. The reader gets a list of the ranges when it updates and they don't change super often. Offline readers can be configured to reject cards with a number in an online only range.
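A rough sketch of the range lookup described here, in Python. The BIN ranges, the `BinRange` type and the function names are all made up for illustration; real range tables come from the card schemes and are far larger. Rejecting unknown BINs when offline is also an assumption, not something stated above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BinRange:
    low: int        # first BIN in the range (inclusive)
    high: int       # last BIN in the range (inclusive)
    card_type: str  # "credit", "online_debit" or "offline_debit"


# Hypothetical ranges, invented for this example only.
BIN_TABLE: List[BinRange] = [
    BinRange(400000, 400999, "credit"),
    BinRange(401000, 401499, "online_debit"),
    BinRange(401500, 401999, "offline_debit"),
]


def lookup_card_type(pan: str, table: List[BinRange] = BIN_TABLE) -> Optional[str]:
    """Classify a card number by its BIN (first six digits), or None if unknown."""
    if len(pan) < 6 or not pan[:6].isdigit():
        raise ValueError("card number must start with at least six digits")
    bin_value = int(pan[:6])
    for bin_range in table:
        if bin_range.low <= bin_value <= bin_range.high:
            return bin_range.card_type
    return None


def accept_offline(pan: str, table: List[BinRange] = BIN_TABLE) -> bool:
    """Reject online-only ranges when offline; unknown BINs are rejected too (assumption)."""
    return lookup_card_type(pan, table) in ("credit", "offline_debit")
```

With the made-up table above, a card starting 401200 falls in the online-only range and would be rejected by an offline reader configured this way.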
It's usually based on the chip settings. Rules aren't as simple as "always online" or "never offline"; an issuer can e.g. convey that they'd prefer online transactions for certain types of payments, while offline is ok for others, via relatively complex configurations of the code of the chip application.
Before that, there was the service code on the magnetic stripe, which also can convey things like "online only" or "domestic use only".
The BIN is only involved in risk management on the terminal's side: Many of these in-flight terminals accept deferred online transactions, which means that, even though they're completely offline, they take the risk of accepting an online-only card. (For truly offline capable cards, the risk is often with the issuing bank.)
That type of risk management can benefit from knowing what type of card it is, and prepaid cards are often seen as riskier (because customers might intentionally drain them before a flight). Of course, debit and credit cards can also be empty/marked as stolen, but these are marginally harder to get and replace.
Yep you are completely correct; people don't realise how complex the chip is - it has what you'd legitimately recognise as an operating system! It can also be reprogrammed over the wire, if your chip and pin is taking a bit toooo long that might be what's happening.
You're correct on the risk spread. I wasn't confident last night (I'm not totally versed on the terminals) but looked it up. As I understand it, if you choose to accept offline-only payments then you accept the risk of the transaction failing. If it's the issuer's choice, they own the risk.
> The only meaningful difference is on how and when funds are reconciled.
Nope, even this is identical. These days the difference between a debit/credit card is pretty much aesthetic; from a transaction processing perspective there generally aren't any actual differences. Differences that people see today are mostly artificial, for the purpose of justifying extra fees or higher interchange based on an entirely arbitrary factor that has zero correlation to any risks that appear in the transaction processing and clearing mechanisms.
Basically the only reason anyone really bothers keeping the difference between credit/debit cards around, is as a technical excuse for discrimination and abusive fees. Notably in the EU nobody cares if a debit or credit card is used, because the EU outlawed all the crazy fees and other bullshit, so now there’s no commercial reason to differentiate between the two 99% of the time.
There are a few differences for sure. All entirely technical in how the money moves or clears. The most obvious point here is that a debit card moves your money from your account, while credit moves the issuer's money from their account.
But to your wider point; from a transaction fee point of view you are dead right. Of course a credit card has other attractions; for example it's credit :D but also things like section 75 protection.
> There are a few differences for sure. All entirely technical in how the money moves or clears. The most obvious point here is that a debit card moves your money from your account, while credit moves the issuer's money from their account.
From the perspective of the card network and the merchant, there is no difference here. The card network has a contract with the issuer, so all transactions, in all scenarios, are always first paid by the issuer. It’s then the issuer's problem to figure out where they get the money from.
It’s entirely possible to perform transactions on a debit card that will place the account attached to it in a negative balance, and for the person owning that account to vanish. The card issuer is still on the hook for the money; neither the card network nor the merchant cares if the issuer recovers the funds or not, they always get paid.
But there is a lot more complexity than, I think, you are glossing over. For example, you also likely have at least one technical services partner in the flows, probably two.
Additionally, money often doesn't move in real time, especially when credit cards are involved. The process is, intentionally, split.
Your point on that is fair, but remember, many credit providers are also not banks, and the money is in a bank account owned by a third party. So, as a trivial example, I can't just assume money coming to me from Bank A is related to transactions from Bank A's cards.
A lot of people don't realise that the main way all of this works is through very large batch files, containing lists of transactions, moving back and forth between various parties behind the scenes.
(We are on semantic points, though, but I just wanted to clarify the complexity behind the scenes that most people don't see or understand)
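To give a feel for the batch-file side of this, here is a toy sketch of reconciling a batch of transactions against your own ledger. The CSV layout (`transaction_id`, `amount_minor`) and the function names are invented for the example; real clearing and settlement files use scheme-specific formats and carry far more fields.

```python
import csv
import io
from typing import Dict, List


def parse_batch(text: str) -> Dict[str, int]:
    """Parse a hypothetical CSV batch into {transaction_id: amount in minor units}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["transaction_id"]: int(row["amount_minor"]) for row in reader}


def reconcile(ledger: Dict[str, int], batch: Dict[str, int]) -> List[str]:
    """Compare our own records with a received batch and list every discrepancy."""
    findings: List[str] = []
    for tx_id, amount in sorted(ledger.items()):
        if tx_id not in batch:
            findings.append(f"{tx_id}: missing from batch")
        elif batch[tx_id] != amount:
            findings.append(
                f"{tx_id}: amount mismatch (ledger {amount}, batch {batch[tx_id]})"
            )
    for tx_id in sorted(set(batch) - set(ledger)):
        findings.append(f"{tx_id}: in batch but not in ledger")
    return findings


if __name__ == "__main__":
    batch_text = "\n".join([
        "transaction_id,amount_minor",
        "tx1,1000",
        "tx2,2500",
        "tx4,999",
    ])
    ledger = {"tx1": 1000, "tx2": 2000, "tx3": 500}
    for line in reconcile(ledger, parse_batch(batch_text)):
        print(line)
```

Because the batch arrives later and from a different party than the original authorisation, every mismatch has to be detected and chased after the fact, which is a large part of the hidden complexity.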