Social networks are getting stingy with their data (techcrunch.com)
103 points by hubraumhugo on Feb 11, 2024 | 95 comments


Data autonomy. How do we get it across to people that there's real value in owning your data: controlling it, hosting it, not just being someone else's product?

Why should we not take the broadest possible view? You own your likes, your comments, your Amazon order history, your dental X-rays, your histology reports, everything. One way to incentivize data consumers and processors would be to make them liable for mishandling it, making the data so radioactive that they don't even want to hold onto it.


I don't think the approach of getting consumers to care can work. Most people really just don't care about taking precautions when the negative impacts of not doing so are so diffuse and time-delayed; it's an unfortunate aspect of human nature. We usually overcome these individual failings by organizing into groups better suited for long-term planning. In this case the solution that comes to mind would be to make personal data legally onerous to hold and process for companies, to the extent that they would go out of their way to design their products and services to never touch the stuff, and if they do need it to operate then they would be incentivized to store it locally on users' devices and only synchronize it in a completely encrypted form such that they never have to deal with the legal implications of having access to it.


Yeah, leaking financial data for millions of people should ruin the company and have fallout that hits the members of the board and C-suite. Instead it's actually just an opportunity to sell an identity theft "protection" subscription.

I have no faith that people will get clued in and make it happen. Everyone is merrily lining up to use the third-party face scanner at the airports.


AI is going to push data autonomy hard. Users are going to want to subscribe to different models for different things, and plug those models into whatever they're using. Everyone is going to support it, and to make it work there needs to be data exchange. Companies that don't support it to try and keep the data walled are going to hemorrhage customers.


So... GDPR? You can download your own data, force the data controller which acquired it to delete it and they're liable for mishandling it. Companies don't usually want to hold on to EU citizens' data because GDPR makes it quite radioactive, for them and whomever they sell it to.


This isn't being done for privacy; these orgs want to keep their LLM gold chests to themselves.


Legally speaking, you already own your healthcare data, such as dental X-rays. In many cases you can even download your data from provider and payer organizations in industry-standard formats (although dentistry specifically is way behind in this area). But for most patients this data is worth zero. Legal and privacy concerns aside, there's just not much use for it. Several start-ups have tried to de-identify and aggregate such data for sale to researchers, but consistency and quality issues make it tough to use in real studies.


Once data gets out it's impossible to completely take back. Regulations could help corral law-abiding actors, so I agree with the idea.

Though I can also see how the incentives of social platforms encourage clamping down further and further on 3P access.


> Once data gets out it's impossible to completely take back

As a society, we understand that rights !== access. Sharing your data with Facebook in the normal course of interacting with their servers gives them access to it; it does not grant them rights to that data.

When Netflix delivers a video to your device, society understands you can’t make a copy of that video and share it with your neighbor. That’s called “The Pirate Bay.”

Data on the internet is lacking equity: an ownership interest in property. As you generate data online, platforms accumulate equity in your data giving them control over that valuable property.

If, instead, you accumulated equity in your data, the entire data broker market would be “The Pirate Bay.”


This. Honestly, I'd love a government to take up the privacy mantle and protect its citizens. Sure, you lose the ability to spy on your citizens, but so does every other country in the world. Seems like a net win.

USPS was made to ensure that all Americans can communicate; it's even made explicit in the Constitution. I don't know why this wouldn't also apply to cell phones and the internet. Are they not modern evolutions? Put the code on the government's GitHub along with the rest. Other players can exist, but it sets a baseline standard. And any country can do this; it doesn't need to be the US.


Pay them to hold on to their data and manage it? That’s about the only way for end-users to experience the actual value of data.


I was about to open a Mastodon account, but because I don't really publish anything very often anyway, I left it for later instead.

A couple of months passed, and the instance I had in mind closed for good.

I've been told before in HN that Mastodon's solution is non-existent for these situations. Has the landscape changed, or are you still f*d if you choose the wrong one? (i.e., an incentive toward centralization, or toward always choosing only among the most popular instances)


The solution is to run your own Mastodon instance. Unfortunately, I find doing so can be quite a hassle. Obviously it's not free, and not only do you need to configure it properly, but you need to handle backups, updates and so on. Even for me, with a fair bit of technical experience, it can be challenging. I think there are significant barriers to doing so for people with no or little technical knowledge.


> The solution is to run your own Mastodon instance

The problem is that Mastodon isn't really great for single-user instances (hassle to upgrade, slurps resources for breakfast, etc.), but there's plenty of other ActivityPub-compatible software that's great for single-user use, like Pleroma, Micro.blog, Akkoma and more.


I did it with a free VM and wrote a guide here [1]. Zero administration work, thanks to automatic Docker updates.

[1]: https://du.nkel.dev/blog/2023-12-12_mastodon-docker-rootless...


While following a guide can get someone set up quickly, the problem is that they may not have the background to deal with issues or breakage down the road: a botched update, a strange error, and so forth. Security is another can of worms, and I recommend that people do some reading on the topic before running a public web service.

I'm not trying to dissuade you from writing this; such guides are always appreciated by the community. It's just that I've seen a lot of inexperienced people set these things up with a guide and get bitten a couple of months later.


Yes, your skepticism makes sense. I have been running services for 7 years, so it's not as if I didn't know what I was getting into.


What service did you get your VM from?


Oracle Free Tier


Thanks. What's stopping someone from creating a .deb that does most/all of this?


I’m paying the price of a coffee a month for masto.host to do it for me. Great little service.


I checked out the service, thinking of the 1.50€ a coffee might cost here :), but 6€ (+ taxes!) feels too steep for the value I'd get from it. Bitwarden is my frame of reference at 1€/month, and it is the most valuable tool I pay for, so it's difficult to justify 6 times that just for a social network.

I think this underlines my impression that Mastodon is quite bad in terms of server resource consumption; it feels like a very heavy service to host. There are probably still lots of achievable performance improvements, and hopefully hosting it becomes cheaper over time.


You're incentivized to either choose a stable, robust instance, or self-host. A lot of instances are pet projects that people kill off once they get bored of mastodon and don't want to pay for the upkeep anymore.

Self-hosting is the only way to really guarantee that it'll still be up years later. But you might get randomly banned from certain instances that blanket-ban single-user instances.


I've got to say, that doesn't sound like a very good solution.

How are people who aren't already part of the community supposed to know what a stable, robust instance is?

And self-hosting is all very well, but if I wanted to join a community comprised exclusively of greybeard unix sysadmins, I'd use IRC :)


Isn't self-hosting more likely to end up being one of those pet projects that gets killed off?


But if it's a pet project that gets killed off, it's because you yourself got bored of it and killed it. It's entirely within your control, not at the whim of someone else.


It’s basically analogous to e-mail. To have an e-mail account, you have to choose a host first. Most people choose a popular host (Gmail, Outlook), but there are various reasons one could prefer a different one.


> I've been told before in HN that Mastodon's solution is non-existent for these situations.

What are you worried about losing? You can download everything if it's a concern. Honestly though, no other platform has a solution for this either, because the others only have a single instance.


You're equally messed up if your account on a centralised service is banned, in which case you chose the wrong one. You could choose a server that's signed up to the Mastodon Covenant for some more peace of mind, and you should always take a backup of your account from time to time.


I chose a Mastodon instance whose financing is quite clear. I donate once a year. The admin is responsive about updates and also uses the instance daily.

If I find out that the admin has stopped posting updates and updating the instance, or that donations aren't meeting the goal, I assume I would have time to start up my own instance and/or move somewhere else.


I really feel there is a need, and room for future growth, in distributed backup as a service:

A few of us should get together and make a 'backup your stuff service' that can pull from mastodon and any other service, and make 2 backups in two different places around the world.

Maybe offer add-ons for storing black-and-white copies of pics, and add-ons for other services.

You should be able to log in, add a link to your thing, walk through authorizing whatever is needed, and then get an email or DM every week or so saying the backup succeeded.
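A minimal sketch of the core step of such a service, assuming the account export has already been downloaded as raw bytes (the fetch from Mastodon or any other service is left out). `replicate_backup` is a hypothetical helper, and the two local directories only stand in for storage in two different places around the world. Each copy is written under a SHA-256 content-addressed name and re-read to verify it.

```python
import hashlib
import tempfile
from pathlib import Path


def replicate_backup(payload: bytes, replica_dirs: list[Path], name: str) -> str:
    """Write one backup payload to every replica directory and verify each copy.

    Hypothetical helper: `replica_dirs` stand in for storage in different
    regions; downloading the export from the source service is not shown.
    Returns the SHA-256 hex digest shared by all verified copies.
    """
    if len(replica_dirs) < 2:
        raise ValueError("need at least two replicas")
    digest = hashlib.sha256(payload).hexdigest()
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        # Content-addressed file name: identical payloads map to the same file.
        target = directory / f"{name}-{digest[:12]}.bak"
        target.write_bytes(payload)
        # Re-read and re-hash so a silently corrupted write is caught.
        if hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            raise OSError(f"verification failed for {target}")
    return digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        replicas = [Path(root) / "eu-west", Path(root) / "us-east"]
        archive = b'{"statuses": ["hello fediverse"]}'  # stand-in for a real export
        print(replicate_backup(archive, replicas, "mastodon"))
```

A weekly scheduler (cron, for example) could call this and send the success email or DM once every replica verifies; that notification part is not shown.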


Run your own blog on your own domain with GitHub. It takes about 2 hours to set up and costs $20/year. Lately, it really feels like the whole "social media" decade was a dead end.


What $20? I'm running it for free, except for the cost of the domain name.


I’m guessing that’s what their $20/year is for. GitHub pages lets you use your own domain once you buy it.


Can you share your setup/tech stack?


I was having this exact thought around 2015, not long after spaCy became open source. When I first tested it out, I was blown away by how well it performed in every possible task: it was light years ahead of nltk and gensim, which were the big, well-established players in that space. Even back then I was certain that in the not-so-distant future, data would cost a fortune, considerably more than it already did. And I won't lie, the idea of harvesting data online on a massive scale and capitalizing on it when the day came did cross my mind. And now I really regret not doing it. Reddit closed itself off, so did Stack Overflow, Twitter is a no-go, and Facebook made it nearly impossible. Cloudflare makes traditional scraping nearly impossible, as if scraping weren't already a nightmare with modern web stacks. The doors are nearly shut, and it will only get worse.

It really hurts me to know that I expected this to happen and didn't do anything about it. Oh well... One of many missed opportunities in my life, I suppose (WAY more than I'd like to admit).


Farcaster is a decentralized social network, much like Twitter but with Channels (which are similar to subreddits). All the data is open.

A cryptographic signing key is attached to each account. The network has been experiencing very fast growth over the past month.

Because the data is open, anyone can build and share a dashboard: https://dune.com/pixelhack/farcaster


What's the role of Ethereum in Farcaster? I noticed it in the design overview, but I don't really have the time or motivation to dig deeper.


Your identity is on Ethereum (a rollup on top of it being a bit more precise). So stuff like this https://techcrunch.com/2023/07/26/twitter-now-x-took-over-th... cannot happen.


Yeah I read a bit further and realized it's tied to Ethereum. But there seems to be an on-chain and off-chain design that I don't get.

Either way, I'm staying away from anything that's tied to a crypto coin. I know Ethereum is also contracts, but I just feel that crypto is tainted with scams. If a technology wants to use a blockchain, fine, but why tie it to a monetary value? That's when you start stepping over the line of tech and into people's life savings.


That feels a bit like throwing the baby out with the bathwater. Ethereum provides a global, permissionless public key infrastructure. On top of that it has the token for payments, and the smart contract functionality to build whatever.

Farcaster simply taps into the PKI part to assign an identity that the user can own. That seems to be a pretty legitimate use case. And you don't get any exposure to a token. In fact, you don't need to know any of those technicalities, beyond knowing that your identity and social graph (who you follow, who follows you, likes, etc.) cannot be rug-pulled from you. That seems pretty neat.


But immutable identities are possible without having it tied to money.

It's the fact that people are investing money, losing money, gaining money with this coin, that is bothering me. As soon as there is money to gain people become very manipulative.


>Minimizing bot activity: Farcaster requires that new users pay a $5 sign-up fee, aimed at preventing the creation of spam accounts, and limits users to a restricted number of “casts” tied to paid “storage units.”

https://decrypt.co/resources/farcaster-explained-the-blockch...
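To illustrate how tying posting to paid storage units limits spam, here is a toy quota model. `CASTS_PER_UNIT` is a made-up number, not Farcaster's actual limit, and this sketch simply refuses new casts once the quota is used up; the real protocol's behaviour at the limit may differ.

```python
from dataclasses import dataclass, field

# Hypothetical figure for illustration only; not Farcaster's real per-unit limit.
CASTS_PER_UNIT = 100


@dataclass
class Account:
    storage_units: int
    casts: list[str] = field(default_factory=list)

    @property
    def capacity(self) -> int:
        # Each paid storage unit adds a fixed number of cast slots.
        return self.storage_units * CASTS_PER_UNIT

    def post(self, text: str) -> bool:
        """Store a cast if the quota allows; return False once the account is full."""
        if len(self.casts) >= self.capacity:
            return False
        self.casts.append(text)
        return True


if __name__ == "__main__":
    account = Account(storage_units=1)
    accepted = sum(account.post(f"cast {i}") for i in range(150))
    print(accepted, account.capacity)
```

The anti-spam effect comes from multiplication: every extra bot account needs its own sign-up fee and its own storage units before it can post at volume.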


Does anyone really believe that an AI trained on Reddit or Twitter/X data would somehow be superior to other AIs? Or that it would provide some snarky competitive advantage when combined with other data? I don't see it.


Doesn't matter. Execs, MBA types, and VCs only hear "data is the new oil" and think it is just as fungible.


Unfortunately, data from after ChatGPT's launch is now worthless, as it's contaminated by the very same bots.



Superior may depend on your goal. If it’s mass disinformation campaigns produced by generative AI, those mentioned data sets may be ripe for the cause.


Yup. They're exactly what you need if you want to imitate a redditor or Twitter user.

Also not useless for just learning language.


I don't use it that much, but the Farcaster protocol as a backbone for a social app is super hot for devs right now. Most of the stuff is open source, and they even changed some login steps so that it's closer to a web2 experience than the usual 'web3' one. I don't know how long it'll stay this friendly for devs, but that's the state of things for now.


This is hilarious because Sam Altman was on the board of Reddit until recently. I don't believe that Reddit closed off API access due to AI data scraping. They did it to force people to use their shitty app so they can get more ad impressions before the IPO.


Data is oil (including synthetic), and consumer data companies (social networks) are sensing that their data is soon going to be the only defensible IP they have. Gotta hoard every little bit of it to maximize market cap


This could have been a headline from 2011


They were always stingy with their data. Every time they change their ToS, it becomes a challenge. I will never build on top of another social network unless it is federated in nature.


Twitter used to view itself as a microblogging service and was so open that it allowed syndication by offering an RSS endpoint. We are light years away from that at this point.


And the great people-driven fediverse should also be mindful about who they let in as users, because you need a user account to get an API key.

The gatekeeper is always the mod who approves new user accounts. Focus on that part and we might keep the data hungry monopolies out of the fedi too.

Many small instances are much more viable than a few huge ones that let anyone in.


I (stupidly) stored all my 4k video footage on facebook. It was great for a while. Only later did I learn that if the video viewership drops below a threshold, they delete the 4k stream without warning, leaving only the 360p stream. So I just have a bunch of blurry videos when I export all my facebook data.


Terrible for you but saves them a boatload of money at their scale.


It seems to me that what you did _inside_ the social networks used to have a lot of value, but now what you do _outside_ may be even more valuable for modeling what users want (products or otherwise). If everybody starts closing their doors, I wonder whether those prediction abilities will break down.

I guess every app and webpage is willing to sell your behavior to the highest bidder so it probably doesn't matter.


Of course it isn't their data, it's ours. But it's only worth anything in aggregate.

It was well known how much this stuff was worth before and that amount has been declining. With AI hype convincing people of unknown potential it becomes a speculative asset again.

Reddit was a financial failure, and now it is trying to IPO again on the basis that your shitposts are the new NFTs. This issue will resolve itself; I don't think devs are missing out.


Of all the social networks, I think Reddit is actually the one with the highest signal to noise ratio. It's invaluable to get real human opinions on product recommendations, travel recommendations, how to do something, etc.


It is infamously astroturfed to the maximum and has been for over a decade. Nowadays not just for future landfill items but also politics.

And by selling the data to LLMers its fate is sealed.


I keep seeing more and more traffic to my sites from the fediverse. I think the centralized social network model is in deep shit, at least as far as the WWW goes. The days of zero interest money and investment based only on user counts are over and traditional social networks are increasingly irrelevant. TikTok is obviously not in deep shit but that’s a separate can of worms.


I really don't see centralized social networks being competitive with the fediverse for much longer. After all, how do you convince millions of people to use your service and pay for it while you spy on them, sell every scrap of data you can, and even then still show them invasive ads?

Meanwhile mastodon is free as in freedom. If you want to, you can buy your admin a beer. People seem to like this arrangement quite a lot. You, an individual, are giving money directly to another individual for services they're providing to you with zero obligations. It feels more like buying your plumber friend a case of beer for fixing your sink and less like throwing money into a corporate void for no discernible benefit as the price slowly and invariably creeps upward.

The really cool part though is that centralized social media has to compete with the fediverse, but the fediverse does not need to compete. It's not interested in competing. It will simply continue to exist for as long as there are users. No one cares what percentage of the global population uses it, or about infinite geometric growth forever. It's just people talking to other people. It's not an experience that you get on traditional social media anymore.


The public at large doesn't care about any of that at all. I don't think mastodon will ever grow that large due to how painful the signup process is. And that's fine! But we have to acknowledge that only a very small portion of people care about things like privacy to the point of uprooting their social media where their friends are.

AT Proto is more interesting than ActivityPub anyway, but I think Bluesky will eventually succeed because the signup is painless. You just sign up on Bluesky; there's no need for a discussion about which instance to pick, etc. There's no worry about losing everything if the particular instance you're on goes down, or about having to deal with migrations.


What in your opinion are the pain points in the current sign up process?


That there is any discussion about "what" instance to join. I understand this is the whole point of a decentralized system, but your average person doesn't care about that for social media. Maybe I'm wrong, and mastodon.social is perceived the same way bsky.app is for Bluesky.

I haven't followed mastodon in some time but the account migration thing was a pain as well, which I feel that the AT Protocol addresses much better.


Mastodon.social. These enormous monolithic servers offer the worst experience and degrade the whole network. And now joinmastodon.org points you at mastodon.social first, then a list of the largest servers.

In other words, centralizing.


>I think the centralized social network model is in deep shit, at least as far as the WWW goes.

Meta just beat Q4 estimates by 10%. Just because you're seeing more traffic to your site from the fediverse doesn't mean that traditional social media is dying.


You can beat estimates for a while by ramping up prices and cutting staff and costs.

Facebook has, among other things, cut their developer support to basically zero. Bugs in the APIs sit around for years. The Groups API is just being discontinued entirely.


I imagine strategically they all think they got what they needed out of being an open platform and don't need to do it anymore. I guess we'll find out if that's true or if someone can take their bacon.


> Facebook has, among other things, cut their developer support to basically zero.

They're still doing well on PyTorch at least! Although they did drop GPU support for Windows soon after WSL2 became usable (presumably because users could now just install the Linux version on Windows).


Why should they care about APIs anymore? Those no longer contribute to profitability.


Come on, it's hard to deny that the last year has proven that the vast majority don't leave when platforms like Reddit and Twitter succumb to extreme enshittification. The eternal September is passive and docile.

Those sites are already spiritually dead in the sense that they're only tolerated and no longer enjoyed. But they've achieved too big a network effect to be replaced anytime soon.


The minority that was producing quality content for free may have left. But the content is still there, so as long as the content itself is still relevant, you will have users (I've deleted my Reddit account, but still go there for some searches).


their data?


I've wondered quite often about data possession. It is their software. It sits on their servers. They convert and maintain it. Do I own my data? Currently, of course I don't. They do whatever they want with it. They sell it to whomever they want.

My only option is to not use their service.


Read any of the contracts you have to agree to in order to use one of these networks. You own your data but grant them a perpetual, royalty-free, no-exceptions license to use it however they want for ever. They get to eat the cake and have it.


Farcaster is a decentralized "enough" social network that fixes this. It's easy to run your own node, too.


They needed devs to grow back in the day, but now the devs have outlived their usefulness.


But seriously, didn't most devs see this coming?

It's the same everywhere. "Thank you for helping us make the AppStore great, now give us 30% of your revenue."


It was easy not to see in, maybe, 2008. Then in, AFAIR, 2010 Facebook started tightening the screws, citing security first (and, well, there have been good reasons not to give full graph access to everyone), but then it became obvious that they were twiddling with feeds, and what they didn't like at all was inadvertently giving you the tools to build your own feed without the ads.


Obscurity in the future? Or obscurity now?

There was no alternative. The old internet, frankly, requires people to pay. They don't if they can avoid it at all. This is what killed most desktop software before, and it is what's killing most internet software.


> They don't if they can avoid it at all.

Probably because the most common model is a $5-$20/month subscription to some "premium" version of the site/app. Subscription fatigue is a monster of the industry's own making.


People were paying. Even right now, I'm paying for my own internet connection (two different providers, at that; three when I decide to enable mobile data). What happened was that platforms decided to be free for both production and consumption, and to finance themselves by selling our attention. Other platforms kept requiring one of the parties (creators or consumers) to pay, and it's working fine for them.

What's killing software is enshittification and growing your expenses past your revenues. People are paying for a lot of stuff and understand that products are not free to build.


Devs? Or other leeching companies that might cut into their revenue?


I used to make and run Twitter bots. I wasn't a leeching company that might cut into their revenue.


You're collateral damage


About a month after closing an account on one of these social networks one realises that nothing of value was lost.

Furthermore, „their data”.


Farcaster solves this. I'm surprised it's not mentioned in the article or anywhere in the comments.


After what they did to Facebook because of what Cambridge Analytica did using API access, this makes all the sense in the world. API access, even if it's read-only, is a huge liability.


Yeah, the media loves to have it both ways because, at the end of the day, they're competitors in the same attention business.


Use nostr


if only there was a way to guarantee data availability through some kind of system

like redundant hubs

check out farcaster.xyz or download the client on warpcast.xzy


"Megaphones are trying to keep the shouts for themselves"

This is such a silly concept. Users and 'influencers' use social media as megaphones, and they'll easily give their data to e.g. OpenAI if asked. Social media have no moat there.



