Posting this because it's a genuinely fascinating look at industrial-scale account fraud infrastructure. The €5M in direct losses is notable, but what really caught my attention is how polished and accessible they made the whole operation.
The FaaS angle is what gets me. They weren't just running a SIMbox farm in a basement; they had public websites, API documentation, and were essentially selling "bypass SMS verification as a service" to other criminals globally. That's a business model. That's engineering.
The scale is where it gets sobering. 1,200 SIMbox devices, 40,000 simultaneous SIM cards, 50 million fake accounts. That's not amateur hour. That's logistics, infrastructure, customer support. They'd solved the hard problems: How do you manage hardware at scale? How do you keep 40,000 SIMs operational? How do you make it easy enough that non-technical actors can integrate it into their fraud workflows?
And yeah, the systemic stuff is the real problem. We're all operating under the assumption that "phone verified" means something. It doesn't anymore, apparently. All those metrics everyone relies on (user growth, engagement, review scores), there's just... noise in there. A lot of noise.
Makes you wonder: for every verification layer we add, is there already a service like this being built to defeat it?
This is exactly what we've been seeing across the board. The data gap isn't just a tracking problem; it's a cascading failure in the systems we depend on to grow.
A 30% data loss doesn't translate to just a 30% revenue hit, because ad platforms can't optimize blind. Your conversion API gets starved of signals. Meta's algorithm can't exit the learning phase. Google stops allocating budget effectively. Your retargeting pools shrink. Your LTV calculations become fiction. Bad decisions compound from there.
What gets me is how invisible it all is. The dashboards look fine. The numbers are there. But they're incomplete in exactly the ways that matter: the conversions that train your optimization models are missing.
I've talked to agency owners and in-house teams about this. Everyone feels it: the growing disconnect between what they're doing and what the data shows. It's become a quiet frustration across the industry.
The browser wars and privacy laws were inevitable. But the cost of this transition is being borne entirely by businesses operating in the dark. The question now is whether server-side solutions buy us enough time before the blocking arms race catches up.
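For anyone unfamiliar with what "server-side" means here: instead of a browser pixel that blockers can kill, the shop's backend forwards conversion events to the ad platform directly. Below is a minimal sketch under assumed names; the endpoint URL, payload fields, and hashing scheme are illustrative only, since real conversion APIs (Meta CAPI, Google Enhanced Conversions) each define their own schemas.

```python
# Minimal sketch of server-side event forwarding, NOT any vendor's official SDK.
# AD_PLATFORM_ENDPOINT and the payload fields below are hypothetical placeholders.
import hashlib
import time

import requests

AD_PLATFORM_ENDPOINT = "https://ads.example.com/v1/conversions"  # hypothetical endpoint


def forward_purchase_event(email: str, order_id: str, value: float, currency: str = "EUR") -> None:
    """Send a purchase event from the backend, so blockers in the browser never see it."""
    payload = {
        "event_name": "purchase",
        "event_time": int(time.time()),
        "order_id": order_id,
        "value": value,
        "currency": currency,
        # Hash PII server-side before it leaves your infrastructure.
        "hashed_email": hashlib.sha256(email.strip().lower().encode()).hexdigest(),
    }
    requests.post(AD_PLATFORM_ENDPOINT, json=payload, timeout=5).raise_for_status()
```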
After posting about 73% bot traffic statistics, I kept getting the same question: "What does this actually look like?"
So here's the answer. Actual click farm footage. Hundreds of phones running 24/7 automation scripts.
Each device simulates 10-20 "real users" with unique IPs (residential proxies), different device fingerprints, and varied behavior patterns. Your analytics can't tell the difference. Neither can Google or Facebook's fraud detection.
The scary part? This is probably a SMALL operation. Some farms run 50,000+ devices.
At DataCops we're researching network-level detection because traditional methods are failing. When you see 500 "different" residential IPs with identical TCP patterns and synchronized timing signatures, that's not 500 people it's orchestrated fraud routing through compromised residential connections.
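To make that concrete, here's a rough sketch of the clustering idea; the field names and thresholds are assumptions about whatever capture format you have, not our production internals:

```python
# Group requests by a coarse TCP/TLS fingerprint, then flag fingerprints where many
# "different" residential IPs fire on a near-identical cadence. Thresholds are illustrative.
from collections import defaultdict
from statistics import pstdev


def flag_orchestrated_clusters(request_log, min_ips=50, max_timing_jitter=0.5):
    """request_log: iterable of dicts with 'ip', 'tcp_fingerprint', and 'timestamp' (seconds)."""
    by_fingerprint = defaultdict(list)
    for r in request_log:
        by_fingerprint[r["tcp_fingerprint"]].append(r)

    suspicious = []
    for fp, reqs in by_fingerprint.items():
        ips = {r["ip"] for r in reqs}
        if len(ips) < min_ips:
            continue
        times = sorted(r["timestamp"] for r in reqs)
        gaps = [b - a for a, b in zip(times, times[1:])]
        # Hundreds of distinct IPs sharing one fingerprint and one rhythm is the tell.
        if gaps and pstdev(gaps) < max_timing_jitter:
            suspicious.append({"fingerprint": fp, "distinct_ips": len(ips), "requests": len(reqs)})
    return suspicious
```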
The real damage isn't just wasted ad spend. It's business owners making terrible decisions based on corrupted data. They see traffic but no conversions, so they conclude their product sucks, their pricing is wrong, their website needs redesigning. They change everything trying to "fix" their conversion rate when 70% of their traffic was never human to begin with.
How many businesses have failed because they were making strategic decisions based on fundamentally fake data?
Hi HN,
I posted yesterday thinking nobody would care and was ready to just leave it alone. Waking up today to see all these comments absolutely shocked me.
Here are answers to the main questions I'm seeing:
Regarding the writing style: look, I used AI only to edit the article for clarity and professionalism, because it's going on the blog. This is a business website, not a personal blog. If you want the raw, unedited version, check this Reddit link:
I find it ironic that some people are more focused on criticizing the writing style than addressing the actual problem I'm documenting.
About sharing the script: Sophisticated bot detection can't rely on Microsoft Clarity or traditional analytics alone. It requires multiple data points:
Modern bots run scripted behavioral patterns at scale with customized browser configurations (not just basic headless browsers). You need to understand frameworks like Puppeteer, Playwright, and Selenium very well to detect them.
You need access to live global datacenter IP databases.
Even when click farms use real phones with real browsers, IP intelligence helps distinguish human from automated behavior: each device needs a unique IP when it runs its target activity in order to avoid fraud detection.
Detection involves analyzing dozens of factors simultaneously.
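To give a flavour of what combining those factors looks like, without sharing the actual script, here's a toy scoring function; every signal, weight, and threshold below is made up for illustration:

```python
# Toy multi-signal bot score, for illustration only; not production detection logic.
def bot_score(visit: dict) -> float:
    score = 0.0
    if visit.get("webdriver_flag"):               # automation framework exposing navigator.webdriver
        score += 0.4
    if visit.get("ip_is_datacenter"):             # from an IP intelligence lookup
        score += 0.3
    if visit.get("mouse_events", 0) == 0:         # no pointer activity at all
        score += 0.2
    if visit.get("headless_hints"):               # missing fonts/plugins, odd viewport, etc.
        score += 0.2
    if visit.get("requests_per_minute", 0) > 60:  # inhuman request rate
        score += 0.3
    return min(score, 1.0)


# Example: a visit from a datacenter IP with zero mouse movement scores 0.5.
print(bot_score({"ip_is_datacenter": True, "mouse_events": 0}))
```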
The data in this article is real. I stand by the findings.
Hi HN. I run a marketing agency and fell down this rabbit hole after a client's analytics made no sense (50k visitors, 47 sales). I ended up building a simple script to track user behavior and analyzed 200+ small e-commerce sites. The average was 73% bot traffic that standard analytics counts as real.
The bots are getting creepily good at mimicking engagement. I wrote up my findings, including some of the bizarre patterns I saw and the off-the-record conversations I had with ad tech insiders. It seems like a massive, open secret that nobody wants to talk about because the whole system is propped up by it.
I'm curious if other developers, founders, or marketers here have seen similar discrepancies in their own data.
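For those asking what the analysis looked like in shape (not the actual script): given per-visit behavior records collected by a tracking snippet and some classifier, the aggregate is just a per-site ratio. The record fields and the 0.5 cutoff below are assumptions for illustration:

```python
# Estimate the bot share per site from per-visit records; fields and cutoff are illustrative.
from collections import defaultdict


def bot_share_by_site(visits, classify, cutoff=0.5):
    """visits: iterable of dicts with a 'site' key; classify: visit dict -> score in [0, 1]."""
    totals, bots = defaultdict(int), defaultdict(int)
    for v in visits:
        totals[v["site"]] += 1
        if classify(v) >= cutoff:
            bots[v["site"]] += 1
    return {site: bots[site] / totals[site] for site in totals}
```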
This is essentially fraud. Your company was made aware that it was selling a product with wildly different characteristics than advertised, and chose to cover it up.
There are defensible business reasons for this, in having a contract already in place at the old CPM, so being unable to double the CPM and halve the views mid-contract... but still pretty much fraud.
Sounds like even more fraud. Here you can apply the new filter to historic data or create a new metric version and keep the old version to avoid breaking continuity in what's measured.
My impression is that the question is whether “business” in this instance refers to the Yellowpages company itself, or the companies that make up their customer base.
Way back when I worked at an ad-based company, click fraud handling was under my oversight. We caught about 20 percent of clicks as fraudulent and filtered them out before billing the ad-placing vendors. It was a constant battle with the sales team, who wanted to relax the rules, since any clicks filtered out cut into sales revenue. Sometimes we got the customers on our side when they ran their own analysis on the billed click report and came back demanding refunds because they'd found a bunch of fraudulent clicks.
Yeah, the incentives there are obviously misaligned. I wonder if there is a way of making advertising click-through tracking follow the "I cut the cake, you choose the slice" model.
Some countries have property taxes where you declare the value and the government retains the right to purchase the property for that value for example.
My first thought was to make the advertising cost driven by revenue on the site. But that just reverses the incentive.
People will just pull ads if the ROAS isn't there. Performance marketing teams aren't fools.
Altering data would mess with everything. Why is unverified traffic increasing? What's wrong with new marketing efforts? Marketing just requires fixed definitions: e.g., if you have 97% bots but it remains constant, that's okay. I know I am spending $x to get $y conversions. I can plan over time and increase or decrease spend accordingly. I won't be willing to pay as much as I would with 0% bots (I'll pay far, far less), but I can set my strategy on this.
It's not the x% of bots that is the problem. The growth team doesn't adjust strategy based on bot percentage; it adjusts strategy based on return on ad spend. Zero bots with no return is far worse than 5x ROAS with 99% bots.
In others you'd want, say, auditing or independent third-party verification.
In this case, perhaps an audit involving the deliberate injection of a mix of legitimate and bot traffic to see how much of the bot traffic was accurately detected by the ad platform. Rates on total traffic could be adjusted accordingly.
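Sketching the arithmetic of such an audit, under the (big) assumption that the injected bots are representative of real-world bot traffic:

```python
# Inject known bot traffic, measure how much the platform flags, and use that recall
# to correct the platform's reported invalid-traffic rate. All numbers are illustrative.
def corrected_bot_rate(reported_bot_rate: float, injected_bots: int, injected_bots_flagged: int) -> float:
    recall = injected_bots_flagged / injected_bots  # share of known bots the platform caught
    if recall == 0:
        return 1.0  # platform caught nothing; its reported rate tells you almost nothing
    return min(reported_bot_rate / recall, 1.0)     # naive scaling of the reported rate


# If the platform reports 5% invalid traffic but only flagged 200 of 1,000 injected bots,
# the true rate is plausibly closer to 25%.
print(corrected_bot_rate(0.05, 1_000, 200))
```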
This of course leads to more complications, including detection of the test traffic itself; see e.g. the 2015 VW diesel emissions scandal: <https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal>, or current AIs which can successfully identify when they're being tested.
On further reflection: I'd thought of raising the question of which situations cut/choose does work. Generally it seems to be where an allocation-division decision is being made, and the allocation is largely simultaneous with the division, with both parties having equal information as to value. Or so it seems to me, though I think the question's worth thinking about more thoroughly.
That's a subset of multi-party decisionmaking situations, though it's a useful one to keep in mind.
I vaguely remember someone winning a Nobel Prize in economics for coming up with ways to apply cut/choose to financial transactions, but I couldn't find it in a quick Google. It may have been nearly 20 years ago, though.
Basic ad ops has ad buyers buy ads from different vendors, track which converts (attribution, which has flaws, but is generally a decent signal), and allocate spend via return on ad spend. So it hurts the vendor at least as much as the buyer, by inflating the cost per action / damaging ROAS.
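As a toy version of that allocation loop (vendor names and numbers invented, and attribution assumed to be trustworthy):

```python
# Split next period's budget across vendors in proportion to measured return on ad spend.
def allocate_budget(spend_and_revenue: dict, budget: float) -> dict:
    roas = {v: (rev / spend if spend else 0.0) for v, (spend, rev) in spend_and_revenue.items()}
    total = sum(roas.values()) or 1.0
    return {vendor: budget * r / total for vendor, r in roas.items()}


# A vendor whose clicks are mostly bots converts poorly, so its ROAS, and its share, shrink.
print(allocate_budget({"vendor_a": (5_000, 20_000), "vendor_b": (5_000, 2_000)}, 10_000))
```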
I've seen people buy CPC campaigns and only place ads that don't convert. So they get the benefit of the branding instead.
I guess more modern auction algorithms factor this in
Makes me think of the recent thing where YouTube stopped counting views with ad blockers. The Linus Tech Tips people worked out they were still seeing the same ad impressions and revenue despite views dropping by half. Unfortunately, sponsorship deals are often sold on viewership and I'm not sure even LTT has the clout to convince them nothing is functionally different.
That's splitting hairs, but in a way that's important to the conversation.
If YouTube served gigabytes of the video file for 40 minutes and a human watched it for that time, but they didn't send a request to `youtube.com/api/stats/atr/` and periodically to `/stats/qoe?`, did the video actually get viewed?
I think a reasonable person would say that the person viewed that video. Only a programmer would suggest that wasn't a view because they didn't properly engage the statistics and tracking endpoints.
But so much of the industry is built on deeply invasive tracking and third-party ad networks that this is a normal thing.
Youtube might not really care about accurate view counting in that way. In fact, Youtube likely does not care at all about traffic that won't view an ad, and they have demonstrably been hostile to that traffic for a while. Someday youtube hopes that nobody with an adblocker ever views any video, and has no intention to track their view attempts.
If that causes a problem with Youtubers making a paycheck from external sponsors, Youtube really really does not care, because sponsorships are money that youtube doesn't get!
Youtube is downright hostile to creators who don't make the "right" content, which means new videos at a perfect schedule that all have the exact same "taste" such that viewers can mindlessly consume them in between the actual ad breaks youtube cares about. The more and faster people accept this, the more likely we get improvements.
It took over a year for youtube to agree to let people swear in videos without punishment, including videos that are set as "These are not intended for children", after they unilaterally introduced this swear downranking several years ago.
Youtube cares about Mr Beast and that's about it. If you do not run your channel like Mr Beast, youtube hopes you die painfully and think about your mistake. Youtube actively drives creators making actual art, science, knowledge, and other types of videos to burnout, because Youtube considers creators to be a renewable resource worth exploiting, because there will always be 15 year olds who want to become influencers.
It is not "deeply invasive tracking" or "programmer thinking", it's entirely business. Google's business is ad views, not video views. They want to measure what they care about
I think most websites would break if a 3rd party script started blocking things. There's also the fact that view tracking is fairly complex, since they need to filter out bots / ad fraud.
And just the fact that if users have a privacy extension blocking the view tracker, is it not just respecting their wishes to not be tracked?
>> In less than a day business mandated us to remove the filter.
Did something similar at a small company I was working at. The VP of marketing sat me down and told me to do the same thing.
After the meeting, I was told by another dev that the VP was tying a monetary value to specific clicks, and if I filtered out the bots it would make his data look bad and shrink the potential revenue figures he was touting to the company.
I think you can see how the bots were actually helping him promote how awesome a job he was doing with our web properties to the owners.
His idea of "reality" and how things work, even if that idea is "they work badly", is more important than any other argument or than reality itself. That's true at a personal level and even truer at any organizational level.
Well yes, but it seems business has decided truth is worth less than money. Or, now that I think of it, everything is worth less than money. Even: money is the only thing worth anything. They don't care about people, pride, products, or truth.
I think the way it goes is more "What is true? It's that if I bury this truth, I will have more money."
Interestingly, I think money is increasingly its own falsehood now. A lot of rich people are finding that they pay a lot to get what's basically a scam, like Sam Altman's swimming pool that leaked and ruined the rest of the house [1]. There's a reason that Billionaire's Bunker [2] is entering the cultural zeitgeist despite fairly terrible plotting, dialogue, and acting.
Half my advertising budget is wasted, I just don't know which half...
In many corporate cases, vague metrics meet the KPI better than accurate ones do.
I worked for one of the Mag 7 doing customer-support bot tech. Clients' internal metrics around containment consistently showed better results than ours, even though you'd normally expect them to be putting pressure on their vendor, because it was a KPI for their internal team to look good to their bosses.
That is worse. Now you have a bunch of companies wondering why their engagement is falling over a whole year, some dude's getting fired for not doing his job, etc. etc.
The correct thing to do, probably, is to just provide the new data to the customer without changing what they were already looking at. So a new widget appears on their dashboard, "52% bot traffic", they click on that, and they see their familiar line chart of "impressions over time" broken down as a stacked line chart, bottom is "human impressions over time," top is "bot impressions over time," and the percentage that they were looking at is reported either above or beneath the graph for the same time intervals. Thus calling attention to the bottom graph, "human impressions over time," and they can ask your sales people "how do I get THAT number on my normal dashboard?" and they can hem and haw about how you have to upgrade to the Extended Analytics Experience tier for us to display that info and other business nonsense...
Point is, you stimulate curiosity with loud interference rather than quietly interfering with the status quo.
With numbers like that, if it scares customers, a good strategy is to implement the filtering very gradually, over 6 months or a year. The fall-off is way less scary, and it can be described as improving bot filtering.
It’s not as honest, but more palatable unfortunately
Does it really matter if it's all fraud? You track 47 sales over some period. What was the ad spend for that period? Combine that with previous data and that should be enough to figure out if it was a successful campaign or not.
When a company puts up a billboard or an ad on the bus, they don't care if the ad is seen by dashcams and dogs. All that matters is impact on the bottom line.
If you could pay for web ads in the same way you pay for a billboard (a flat rate per period of time), yeah, that would help. If you pay per impression or view or click, you have some other issues.
When you talk to Google sales they want to know how much you make per customer, so they're able to figure out exactly how much they could charge to take 80% of the margin. The cream on top is taken by Google in the form of fake bot clicks.
If you have things right (which Google has an incentive to see that you don't), this makes no difference, because the real measure isn't how much you make per customer (including those who come by other means); it's the people who didn't click on the ad even though seeing it was part of their decision to buy (they saw the ad on their phone and told their wife, who bought it). If you are doing proper ad effectiveness measurement you capture this data (it is messy data).
Remember what matters is how the ad affects the bottom line. Everything else is just a proxy - you need to check to see if your proxies are good enough.
This is the key point. Ads and clicks etc are priced in a competitive market. If they don’t deliver the ROI because of bots, then people (including the allegedly hopelessly confused e-commerce retailers) would pay less for the same amount of traffic. It may be annoying (and the cost of dealing with that annoyance would further drive down the price paid for the traffic). But what matters is that an e-commerce site is profitable (enough) after the ad spend, period. If they are not, why do they spend what they spend on the ads?
There are certainly competition concerns about Google's advertising programs.
But within their advertising market, you compete for placement with other advertisers. If everyone is getting lots of fraud traffic, presumably they adjust their bids for it; if you're getting outbid consistently, it's reasonable to expect that the other advertisers are either getting a better ROI or they have a lower ROI target than you do.
About a million years ago, I was on a team that had a significant ad program, and it was primarily data driven, we'd come up with keywords to advertise on, measure the results and adjust our bids. With a little bit of manual work to filter out inappropriate keywords or to allow a lower ROI on "important" keywords. Of course, our revenue was largely also from advertising, so it was a bit circular.
At Google, I would expect not. However, I don't understand why other "portals" are not running their own ads (some are, of course, but I think more should). If you are a portal, your value is the eyeballs you sell to advertisers, so why are you outsourcing this critical part of your business value? This needs to be a core competency you keep in house.
It makes optimizing your ads significantly harder. Imagine trying to understand traffic flow on a freeway when 99.9% of cars are just projected illusions. Not fun.
Effective advertising depends on iterative testing, which is very hard if the signal to noise ratio is way off.
It depends on whether the ad engagement / views were real people or bots - if you're a competitor, then using bots to use up your competitor's ad budget is a strategy.
If you display ads on your website, then sure, it doesn't matter if a bot or real person viewed/clicked on your ad if you get paid regardless - to the point where there's a big industry of putting ads on sites then having bots engage with the site and its ads.
Billboards are a bit different because that's paid per week / month, but internet ads are paid per impression or click.
You aren’t paying for conversion rate, you are paying for a link being put on a website when a query is made. You can’t control whether a bot follows that link. You can’t control how sophisticated that bot is. You can’t expect an advertiser to filter out every type of illegitimate traffic (although it sounds like they probably have the capability to filter out more but don’t have any incentive to do so).
I have seen recommendations from across the Internet to not bother with Google ads and other similar paid ad services. It’s basically like paying for a cold lead, you’re attracting one of the least interested types of customers.
The recommendation I’ve always seen is that it’s better to build legitimate interest in your product by producing content. Or perhaps move to an advertising platform where there’s more of a guarantee of reaching human users.
But still, I’ve heard that trying to spend customer acquisition dollars on one-time purchases is a losing battle.
If Tesla was able to start a massive car company without buying ads you can go without AdWords, too.
>I have seen recommendations from across the Internet to not bother with Google ads and other similar paid ad services. It’s basically like paying for a cold lead, you’re attracting one of the least interested types of customers.
Google Ads used to be very effective. You are catching the punter as they are actually looking for a solution to their problem. However Google have inflated bids and increased the complexity to the point where very few people can make a decent return now.
>The recommendation I’ve always seen is that it’s better to build legitimate interest in your product by producing content.
That is becoming rapidly less true as AIs steal all the traffic.
>If Tesla was able to start a massive car company without buying ads you can go without AdWords, too.
Yes it matters because Google ad spend is just one way to market and it's harder to attribute sales if there is a lot of fraud making it more inefficient.
You may be able to cancel out the numbers if the rate of fraud to real clicks is stable.
But it's incredibly unlikely that the amount of fraud is stable, any attention from a small player can generate enough volume to eclipse all of the background. Thus no, you can't just account for it on your calculation.
When a company puts up a billboard, it doesn't get taken down because a flock of birds passed by before anyone could look at it.
The pricing of a billboard is based on projected impressions but is decoupled from the billboard’s actual impressions. You pay the flat rate based on projected impressions. It doesn’t subsequently matter if an army of robots passes through and views the billboard 10,000 times, because the original projected audience should still see the billboard. You’re getting what you paid for. It’s also possible the projections are erroneous, fraudulent, or that the actual impressions just don’t follow the projected impressions for some other reason. This is the advantage of advertising online—if you pay for 10,000 views, the ad runs until you get them.
Contrast the billboard with an online ad: If you pay for 10,000 impressions, and 9,000 of those impressions end up being automated traffic, then you’ve effectively lost 90% of your advertising budget. Like you mentioned, ad buyers can and do account for this, so it’s not clear that it’s as big of a problem as this thread is making it out to be. The theory I see thrown around is that Meta and Alphabet are tacitly allowing this bot traffic to continue (or even that they are responsible for it themselves) because it allows them to sell another round of ads, but if this is what is going on it isn’t clear why firms would keep buying ads that aren’t converting.
Yes, the company is likely spending big money on retargeting to bring back a "potential customer" and sell them whatever the bot clicked on to enter the retargeting pool. It's fraud, and it's costing the company money.
If you get 47 sales on $10k in ad spend (pay per click) and $9900 of that $10k was fraudulent then you got 47 sales on $100 of ad spend. Imagine if you could stop those fraudulent clicks.
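Spelling that arithmetic out (the replies below dispute whether removing the fraud would actually leave click prices unchanged):

```python
# Cost per sale as billed vs. cost per sale if the fraudulent spend were eliminated.
spend, sales, fraudulent_spend = 10_000, 47, 9_900
print(round(spend / sales, 2))                       # ~212.77 per sale as billed
print(round((spend - fraudulent_spend) / sales, 2))  # ~2.13 per sale on the non-fraudulent $100
```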
You can't just "ignore" fraudulent clicks as though those aren't baked into the original price. The only thing you should care about is your own sales vs your own ad spend.
Your calculations make as much sense as pouring out 90% of a can of beer, claiming you have just made vodka, and simultaneously trying to pay only for the 10% of beer that is left.
They are not your problem. You paid $10k for 47 sales (those are the original example numbers); whether that is good value is the question you need to ask. If it is not good value then you don't advertise there at all; if it is, you pay the price.
It is the ad network's problem because they are serving ads to all those bots, which costs them. It is also the ad network's problem in that their ads become less effective: maybe for you $10k for 47 sales is good enough, but for most it isn't, and so they lose customers who pay attention to value. It is also their problem in that by reporting a false number of views they lose some face, but that number is always a proxy that isn't of much value to anyone.
The more important thing that is your problem is verifying the ads are worth it. How do you know the ad resulted in 47 sales? A large number of ads should drive "buy this in the future", not today, and thus clicks are the wrong measure.
In my extensive experience, you take irrefutable evidence back to the ad platforms and they provide 'make goods' after the fact for those fraudulent visitors.
The entire premise of the article is quite accurate, and most companies are spending money on third parties to do what this guy has done to recoup fraudulent spend; so none of this is new to the industry. Everyone knows it's happening and it's all part of the gig; it benefits everyone except those spending, so little is truly done to correct it.
> The only thing you should care about is your own sales vs your own ad spend.
You do understand that when I stop/reduce the fraudulent clicks, that leaves extra ad budget available to be spent on clicks from real people who will actually buy things.
I have 0 incentive to not work to stop/reduce the fraudulent clicks. My ad spend stays the same but my sales go up.
And imagine how much more you’d have to pay for each of those clicks if everyone could stop those fraudulent clicks. In equilibrium it shouldn’t change the total ad spend.
That’s true. But you probably can’t. At least any more than others. It’s a systemic issue in the ad network ecosystem which you don’t have much control over. If you can figure it out, odds are lots of others can too. People do assess the quality of traffic sources and do check the return on ad spend. It’s that system wide process that keeps the return on ad spend roughly constant.
The point here, for me, is that a microeconomic perspective on this whole question is more salient than a purely technical one.
I am fine with spending $10,000 on ads (or whatever amount).
The issue is that I know $5,000 of it was spent on clicks that had a 0% chance to convert. For every fraudulent click I can block or prevent, that is one more click that can come from a real person who may actually make a purchase.
You should have a good idea. How to know whether an ad reaches people has been studied extensively since long before the internet existed. Newspapers don't have clicks, but you still need to know if your ad works. Even on the internet, a large part of the value of ads is people who see the ad and buy later without clicking on it. We can track this: do it.
The point is, if $10k brings in 47 sales and that’s not enough then you stop buying those ads. It doesn’t really matter why it’s not working. You take your marketing spend elsewhere.
You can’t stop fraudulent clicks just like you can’t stop your SuperBowl ad from playing while your viewers are in the bathroom. How much of ESPN’s viewership happens at bars where nobody is watching?
At some point it’s not reasonable to expect ad networks to be able to stop sophisticated bots or exclude them from your billed impressions.
They should definitely try to minimize it if they want to maintain the value of their impressions but I think there is a good argument that OP just isn’t the right customer for this type of advertisement.
If you’re trying to sell a t-shirt you don’t hire a salesperson to cold call people, maybe OP shouldn’t be using web ads in the first place. If fraud was cut down by half would their situation really be that much better?
If you think the ads are working and you have 10k potential customers, then you start thinking about how to increase your conversion rate to capture a chunk of those 10k; you might think distribution is solved.
But if it turns out only 2.5k are real humans then your conversion rate might not even be an issue and it’s just the marketing strategy that needs tweaking.
The whole point is that they are giving you fraudulent traffic which you use as real data to figure out the next steps. If you don't know it's fraudulent, or how many of the clicks are fraudulent, then you are making decisions under the wrong assumptions.
> You can’t stop fraudulent clicks just like you can’t stop your SuperBowl ad from playing while your viewers are in the bathroom
That’s not even a good analogy; we are talking about clicks, not impressions.
I find it really interesting when you describe “good bot” traffic.
> During my investigation, a source from the e-commerce data industry provided a crucial piece of the puzzle. He explained that his former company was responsible for scraping 70 million retailer web pages every single day. This is a legitimate and massive source of automated traffic.
> Why do they do this? For vital business intelligence. Major retailers like Amazon do not always notify vendors when they run out of stock. So, brands pay for data scraping services to monitor their own products. These "good bots" check inventory levels, see who is winning the "buy box," ensure product descriptions are correct, and track search result rankings. They even scrape from different locations and mobile device profiles to analyze what banner ads are being shown to different audiences.
I think a lot of the players involved in this would say those are bad bots. Having your competitors scrape your site for data would probably be something most website owners wouldn’t like, but getting data about THEIR competitors would be something they WOULD like.
All bot traffic is good from SOMEONE’S perspective, otherwise it wouldn’t be happening. Someone had to program the bot, someone has to be running it. They obviously think they are good bots.
The people running AI scraping bots think they are good bots. Many content creators think those are bad bots. Price comparison sites scraping retailers think their bots are good. The sites being scraped often think they are bad bots.
I just don’t think we can clearly separate good and bad bot traffic without specifying whose perspective we are talking about.
"Good bots" would be ones that fuel a healthy, competitive economy. Whether or not they're good from the perspective of a retailer, data about products and supply chains is important.
"Bad bots" would be ones that primarily support grift. They siphon money from advertisers, or get other people to waste advertising $, or game product listing metrics so that the bots' owners products get better placement, or so that other manufacturers' products get worse placement. They're not creating any new value; they're removing value from the economy by causing bad allocation of resources.
I can see many bots falling into the two categories you describe, but I can also see many other bots that would not fit easily, and whose value is debatable.
For example, bots that scrape content for AI training; is that good or bad for a healthy economy? AI can be productive, but is it ‘stealing’ other’s productivity, which could hurt in the long term if it causes decrease in future human production because of diminished rewards?
"Good" vs "bad" was not intended to be an existential question about whether some bot-powered activity like AI training will be a net good or bad 20 or 100 years from now.
It's not even a question of whether, in that particular case, AI will displace some forms of human labor, even if that causes a macroeconomic crisis where society needs to figure out a better way to operate in light of new AI-driven economic realities.
It's a simple question of whether internet bots are causing economic inefficiencies and mis-allocation of resources in the classical sense.
The bad bots, discussed in the article, are. AI scraping bots are not. All it takes to see this are all the mentions of fraud in this HN thread. Bad bots are, at their core, fundamentally fraudulent in how they interact with sites. That fraud is compounded into how companies realizing the extent of that fraud react: by hiding or lying to investors and boards, because the fraudulent bot activity has significantly altered market perception. Nothing about pure scrapers is fraudulent, regardless of whether their ultimate goals are good or bad for the economy or humanity, if that could even be known.
I do web analytics consulting. One of my first projects at a digital marketing agency in 2021 was investigating weird traffic patterns for a global logistics firm. The findings are summarized in this blog post[1].
Bot traffic has been an issue for years. It has given rise to a new cottage industry of ad fraud detection services, none of which I’ve found particularly valuable. It always comes down to “so what do we do about it?” and no one seems to know how to get bots to stop viewing or clicking on paid media placements. Consumers use Google search and are on FB, IG, TikTok, LinkedIn, etc., and there aren’t really competing ad networks with “fewer bots”. So advertisers keep buying the traffic, knowing that a significant chunk of it is invalid.
I don’t see anything changing unless big tech companies making billions in ad revenue have a large enough incentive to do so. At the moment they have plenty of incentives to keep things as they are.
“Half of the money I spend on advertising is wasted; the trouble is I don’t know which half.”
- John Wanamaker
I'd suggest legislation. It's like when hotels or rental car companies tack on additional fees and consumers hate it but don't have much incentive to cooperate to change it. Consumers don't unionize much, but maybe we should?
But maybe not legislation, because it might upset a ton of people. Any time a social media site goes through and seems to get rid of bots, people complain about how many "followers" they've lost. The fake user/clicker situation is quite pervasive, and a lot of parties, not just the advertising networks, benefit from the inflated numbers.
But it can be so insidious..."wow, that video got 1M YouTube views? Must be very popular!" Orrrr a lot of those views were from bots? Who knows?
So maybe a better approach than legislation is to talk about how the ad/bot fraud can both help us and hurt us and not demonize one side but see how we all may be implicated in it somehow. Maybe that will help us to be more aware of the problem and not fight against people, but try to work together to solve it.
Given that Wanamaker died in 1922 it’s safe to say his quote was in the context of a different problem entirely (which still exists, on top of the bot issue). Maybe it’s time to update to “three fourths”?
> It seems like a massive, open secret that nobody wants to talk about because the whole system is propped up by it.
I was never really a punk-rebel kid, but a certain part of me rooted in the optimism of the early-internet kinda wants to see ad-models crash and burn.
Even advertising "working normally" always had a psychic odor of exploitation and deceit. Ex: "You absolutely need this product or else your peers will hate you."
P.S.: Imagine a world with "AI" (open-source, user-controlled) which can instantly detect ads on your screen, and pass the bounding-boxes to another program which covers them up.
You could even install them on some digital glasses for a kind of They Live sunglasses situation [0], but please never wear such things while driving...
I work in e-commerce and completely believe this could be true. The 'Cart Abandonment Bot' is actually something we just had a meeting on today, trying to figure out what's going on here.
With that said, what do you think the error is here? Could your script have had some false positives? Enough to move the dial?
Some sites won't give you the price or the shipping until you add to cart, but that shouldn't map to an automatic removal of item from cart (unless there IS a timer somewhere set to 4 minutes on the server side).
Strange that the item cost $47 and the number of sales was 47.
Ah... I remember back when Anandtech would run a monthly roundup of hard drives that would usually insist that you pay another $150-$200 for a hard drive because hypothetically the more esoteric consumer SKUs were 3 dB quieter or consumed 0.2W less than the mass-produced least-cost enterprise Seagates. [1] They unquestioningly quoted the spec sheets and never did any tests, whereas everyone knows you can't trust spec sheets when it comes to noise and power.
The thing was that site kept reflowing the layout over and over again and I think the point was that they were hoping you were going to click on something you saw on the sidebar and then it would reflow and an ad would be there right at the second when you clicked and then... Ka-Ching!
[1] Funny reversal that the enterprise product is mass market and the consumer product is overpriced if not gold-plated.
And if you've ever scrolled social media or used the internet at all, you've read and clicked ads.
Only considering old school "Ads by google" banners as ads and then patting yourself on the back for never clicking them is pretty faint praise for your ad evasion skills.
No, of course I've read longer form posts and then realized they were just there for product placement, or affiliate links, etc.
I don't click display ads or links to Amazon or YouTube listings, etc. If I want to buy something I try to go directly to the manufacturer's site and search for it there. If I have to go to a third party sales platform I'll go there and search for the thing I want.
I rarely click on YouTube recommendations, because they are more and more just AI slop. I subscribe to people whose content is interesting, and that's the vast majority of what I watch.
Yeah. Even in the extremely rare circumstance where I actually want to see more about the product, I specifically search for it instead of clicking on it, because everyone knows clicking on the ad is how you get infected, spied on, or scammed. And I think they already discarded 'views' long ago as too useless, ironically.
> I had one client spending $12,000 per month on Google Ads
In Google Ads you can just turn off the option to run your ads on non-Google sites; I think it's called their Display Network. Just run your campaign only on Google's search pages.
I'm surprised the article doesn't mention this rather common solution.
We've created open-source security analytics [1] that some organizations use for click fraud detection.
The numbers that Google Ads reports as invalid clicks are dramatically different from what our platform detects.
In the cases we see, most fraudulent clicks come from mobile devices and mobile networks. Also, bots don't load page resources in full, which makes them easier to detect.
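One way to apply that resource-loading signal yourself, assuming you can parse your own access logs; the log fields and threshold here are illustrative, not a description of our platform's internals:

```python
# Flag sessions that fetch HTML pages but almost none of the page's subresources.
from collections import defaultdict

ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".webp", ".woff2", ".svg")


def sessions_missing_assets(log_entries, min_assets=3):
    """log_entries: iterable of dicts with 'session_id' and 'path'."""
    pages, assets = defaultdict(int), defaultdict(int)
    for e in log_entries:
        if e["path"].lower().endswith(ASSET_EXTENSIONS):
            assets[e["session_id"]] += 1
        else:
            pages[e["session_id"]] += 1
    return [sid for sid, n in pages.items() if n > 0 and assets[sid] < min_assets]
```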
If your organization is struggling with fraud click detection, don't hesitate to reach out by email. We are ready to help.
We are now at the point where plausibly some bots could be at the behest of an AI agent controlled by a human.
This is not likely to distort the basic numbers in your story, but it makes the premise questionable
If your script can correctly segregate bot traffic from human traffic, and website operators can conclude that bot traffic does not make purchases, then what -- you still don't want to blackhole that traffic right ?
I'm none of those things, 10x or otherwise, and would be ashamed if I were. I do run a personal website, though, and most of the traffic is bots that ignore robots.txt. I wish those developers, founders, and marketers would stop doing that to me.
What do you think is the false negative ratio of the remaining 27%? A 0.1% total conversion rate would imply roughly 1 in 270 of those remaining visitors buying stuff, and I am quite certain I buy something in maybe every 10th online store I visit, or something like that.
But what I'm struggling to understand is where is the money coming from to run those bots?
How does the advertising money get to them to make it worth their while to run bots at such a scale?
I mean, sure, I suspect that people on the advertiser side are doing it, but surely that's massively risky. There must be a grey market for this kind of transaction?
> But what I'm struggling to understand is where is the money coming from to run those bots?
I agree, I have a hard time understanding the motivation for this behavior. It's obviously happening, I just don't know how someone benefits from a bot pretending to browse the internet.
It would but the fact it goes out and does it to other sites is weird. Why would someone else fraudulently click my ads? As an attack, I understand that, I've had that done to me. Just randomly seems odd. Maybe I'm just not creative enough to think of bad things to do with this. :D
For ad networks and social media platforms that provide monetization, the click-fraud payoff is direct.
There is also a massive industry of fake accounts and fake engagement for social media and SEO (Google). Bots are designed to create plausibly real engagement, which is used to trick ranking algorithms into boosting content. These bots have to be real enough to bypass platform detection. Clicking through on ads is a way of incentivizing platforms not to shut them down, and possibly of improving ranking results, on the theory that platforms give stronger weight to engagement signals from clients that generate more revenue.
Too often I have found myself on ecommerce gigs where the traffic comes from internal 'SEO' tools. Normally the person in charge is in marketing, not technical, so it has always been difficult to get past identifying the problem to actually fixing it.
Often ecommerce companies are very siloed, so one person in one part of the 'team' is scraping the site with their special tools, only for another person to be doing another scrape with their special tools. You can have the guy doing the newsletter doing his thing, the guy doing organic search doing their thing, the guy doing paid ads doing their thing and someone in sales doing their own thing.
The sad thing is that they are typically just a few SQL joins away from exactly the data they want in a format they can digest. However, due to silo-ing, it can be hard to have that conversation.
On top of that, you do get new bots that need to be dealt with. The Huawei bot will scrape everything yet the store might not be delivering to China. So there are legit bots not doing ad fraud that need to be dealt with.
What I also find interesting is that nobody is interested in the server logs. They come for free and, although they miss CDN cache hits, they still record checkout transactions and any page that isn't served from a fresh cache.
Ad fraud also goes on in companies. I worked for a very successful company once and we only measured sales and what was out of stock. We didn't need open rates for emails or click through rates, our main problem was selling too much, which was a nice problem to have. Note that if you sell too much then you aren't going to get it all out the door in a timely fashion, or you run out of big lorries to put the orders in.
Since then I have not worked on a site that is as successful. Instead we have people getting praise for 'false metrics'. Anything an SEO person creates or a marketeer measures will always have some nonsense aspect to it. Or the accounting is mixed with brick and mortar sales even though free shipping and discounts have been given on each sale, with adwords used to get people through the 'door'.
Since the sales manager has to report to someone on the board, if the sales numbers aren't good, the nonsense stats can be used to obfuscate the facts. The board only ever cares about profit, so I don't like the way this plays out, with false metrics and nonsense.
Another fundamental problem in ecommerce is a lack of basic salesmanship. If you work the shop floor doing specialist sales where you have to listen to the customer's needs, then you gain experience in the art of sales. You don't have to be good at it; in fact it can be better to know your limitations. For me that was big-ticket items where I didn't have the product knowledge; however, I could always hand those sales over to a much more capable colleague.
You don't win every sale, but, in retail, you can have some really good streaks where no customer leaves empty handed. Your conversion rate is going to be more like 90% in face to face sales if you have the right product at the right price, with customers that don't buy coming back the following day or week to splash the cash.
In High Street retail there is no way you give customers 15%+ off just for stepping through the door. Yet this is table stakes in ecommerce, particularly for small to medium size shops. This instantly devalues the product.
Often there will be chatbots for whatever reason, and I am sure the likes of Dell can get that right, but your typical small ecommerce site will fluff this up too, so any customer daring to use the chatbot will not get instant help from a sales person.
So what to do?
It depends on your product, however, the goal is to get customers for life, not to churn through them. To achieve this it comes down to product, price, availability, shipping times, customer service and incentives for the customer to advertise for you, with reviews, word of mouth and all that hard stuff that needs real human skill. Sometimes it will only be a one-off sale, for example, if someone is buying a mattress. But, even then, the customer service basics matter.
What is also silly is how, with everything you buy, you will get adverts and incentives to buy what you have just purchased. I don't know why this is not considered career limiting, but nobody seems to have fixed this.
Next time, I will find out what data everyone needs and have my own script to collect just that data, then have a live backup that can be used for all internal purposes such as report generation. There will be no mystery script containers downloading 150 scripts with cookies, outside of developer supervision. My hunch is that the only numbers that really matter are sales.
As for why the sales aren't happening, so long as functionality is as it should be on all devices, then you have to dig deeper, to the knowledge level that only someone that has worked the showroom floor understands. Really, a website should be the sales person's knowledge condensed into HTML. Yet nobody asks the guy serving customers what they upsell or what product they recommend if a customer baulks at a given product. Instead we have mystery-meat AI scripts that manage these things.
Thanks for the heads up on ad fraud, normally I am a long way from that due to internal scraping efforts being far more damaging to website performance, and, as a developer, page load times matter to me far more than how much was paid for the traffic.
Okay, well I'm asking the author why they developed their own analytics tool rather than using an industry standard hoping there's some reason beyond motivational phrases.
Running a startup is brutal, right? You're juggling a million things – product design, bugs, marketing, sales, customers – and probably burning through cash and energy like crazy. It’s a lot.
And then, on top of ALL that, you need to find investors. Going website by website, LinkedIn profile to profile… what a time drain when you’re already swamped. It’s a massive pain.
I built my own startup and got sick of this, so I had my team create a big database of VCs and Angel investors. It saved us a ton of headaches.
Figured it could help others too. So, I'm giving away a chunk of it: 450+ VC and Angel investor contacts to get you started. No catch, just hoping it helps you get those first investor chats going.