Hacker News | mkolodny's comments

PRELIMINARY EARTHQUAKE PARAMETERS
---------------------------------

* The following parameters are based on a rapid preliminary assessment of the earthquake and changes may occur.

* Magnitude: 7.3
* Origin Time: 0944 AKST / 1044 PST / 1844 UTC, Dec 05 2024
* Coordinates: 40.3 North, 124.7 West
* Depth: 8 miles
* Location: 45 miles SW of Eureka, California; 215 miles NW of San Francisco, California


On VirusTotal, 5 different vendors flag Z-Library as malicious. Are they just flagging the site because of IP issues, or is the site full of malware?


If you're going to the official domains linked from their Wikipedia article then there's definitely no malware.


One vendor on VirusTotal flagged the domain currently linked from their Wikipedia article as malicious.


The Internet was largely developed by the U.S. Defense Advanced Research Projects Agency (DARPA) [0]. Tools like the Internet and AI can be used for many things, good and bad.

[0] https://www.internetsociety.org/internet/history-internet/br...


I'm surprised people are still making that argument even after the pandemic showed us the risk is nowhere near worth the reward. Regardless, you need to create a vaccine for the discovered virus (which can take less than a week, as was the case with COVID), and then you still need to go through months of human trials.

I was hoping we were done risking starting pandemics by purposefully creating new deadly viruses.


Agreed. Making super viruses to show what could possibly happen, however unlikely, is the epitome of hubris. But boy is it a great way to get funding.

I do wonder what the calculus is when comparing the chance nature could mutate and successfully introduce itself to the human population vs the chance of it escaping a lab after being created by humans to study gain of function, etc.


The jury is still out on whether COVID originated from a lab. It seems very possible, but there is still little evidence proving it was created in a lab.


Well, we do know COVID did leak from a lab in China at least one time: in 2021, a researcher in Taiwan was bitten by an infected mouse and contracted the disease.

Edit: My bad, as far as public knowledge goes, coronavirus leaked three times in China (SARS coronavirus 2x, COVID 1x), and once in Singapore.

https://en.wikipedia.org/wiki/List_of_laboratory_biosecurity...

As for Wuhan in November 2019, the Chinese government took several actions at that time that you would expect in response to a biosecurity incident: visits from biosecurity officials, remedial biosecurity training, and (coincidentally) the simultaneous start of government work on a COVID vaccine.

Only circumstantial evidence, though... so... ¯\_(ツ)_/¯

Source: a study by the US senate, covered here by WSJ: https://archive.ph/Kh2Fr


Can you please cite a source showing that it took "less than a week" for the COVID vaccine to be developed after discovery?


“You may be surprised to learn that of the trio of long-awaited coronavirus vaccines, the most promising, Moderna’s mRNA-1273, which reported a 94.5 percent efficacy rate on November 16, had been designed by January 13. This was just two days after the genetic sequence had been made public”

https://nymag.com/intelligencer/2020/12/moderna-covid-19-vac...


Isn't there a big difference between "designed" and "developed"? For instance the whole testing phase?

Which doesn't mean it isn't impressively fast, but still, it's not done in a week. Plus, testing the COVID vaccines was quick because there were many, many people available to participate in the trials.


Is this assuming that COVID was lab-made? Since I have not seen or read any proof of that theory, this comes across a bit like a conspiracy theory.



No, it's not. The Llama 3 Community License Agreement is not an open source license. Open source licenses need to meet the criteria of the only widely accepted definition of "open source", and that's the one formulated by the OSI [0]. This license has multiple restrictions on use and distribution which make it not open source. I know Facebook keeps calling this stuff open source, maybe in order to get all the goodwill that open source branding gets you, but that doesn't make it true. It's like a company calling their candy vegan while listing one of its ingredients as pork-based gelatin. No matter how many times the company advertises that their product is vegan, it's not, because it doesn't meet the definition of vegan.

[0] - https://opensource.org/osd


Isn't the MIT license the generally accepted "open source" license? It's a community-owned term, not an OSI-owned one.


MIT is a permissive open source license, not the open source license.


There are more licenses than just MIT that are "open source": GPL, BSD, MIT, Apache, some of the Creative Commons licenses, etc. MIT has become the de facto default though.

https://opensource.org/license (linking to OSI for the list because it's convenient, not because they get to decide)


These discussions (i.e., everything that follows here) would be much easier if the crowd insisting on the OSI definition of open source would capitalize Open Source.

In English, proper nouns are capitalized.

"Open" and "source" are both very normal English words. English speakers have "the right" to use them according to their own perspective and with personal context. It's the difference between referring to a blue tooth, and Bluetooth, or to an apple store or an Apple store.


Open source licenses need to meet the criteria of the only widely accepted definition of "open source", and that's the one formulated by the OSI [0]

Who died and made OSI God?


This isn't helpful. The community defers to the OSI's definition because it captures what they care about.

We've seen people try to deceptively describe non-OSS projects as open source, and no doubt we will continue to see it. Thankfully the community (including Hacker News) is quick to call it out, and to insist on not cheapening the term.

This is one of those topics that just keeps turning up:

* https://news.ycombinator.com/item?id=24483168

* https://news.ycombinator.com/item?id=31203209

* https://news.ycombinator.com/item?id=36591820


This isn't helpful. The community...

Speak for yourself, please. The term is much older than 1998, with one easily-Googled example being https://www.cia.gov/readingroom/docs/DOC_0000639879.pdf , and an explicit case of IT-related usage being https://i.imgur.com/Nw4is6s.png from https://www.google.com/books/edition/InfoWarCon/09X3Ove9uKgC... .

Unless a registered trademark is involved (spoiler: it's not) no one, whether part of a so-called "community" or not, has any authority to gatekeep or dictate the terms under which a generic phrase like "open source" can be used.


Neither of those usages relates to IT; they are both about sources of intelligence (espionage). Even if they were, the OSI definition won: nobody is using the definitions from the 1995 CIA document or the 1996 InfoWarCon book in the realm of IT, not even Facebook.

The community has the authority to complain about companies mis-labelling their pork products as vegan, even if nobody has a registered trademark on the term vegan. Would you tell people to shut up about that case because they don't have a registered trademark? Likewise, the community has authority to complain about Meta/Facebook mis-labelling code as open source even when they put restrictions on usage. It's not gate-keeping or dictatorship to complain about being misled or being lied to.


Would you tell people to shut up about that case because they don't have a registered trademark?

I especially like how I'm the one telling people to "shut up" all of a sudden.

As for the rest, see my other reply.


You're right, I and those who agree with me were the first to ask people to "shut up", in this case, to ask Meta to stop misusing the term open source. And I was the first to say "shut up", and I know that can be inflammatory and disrespectful, so I shouldn't have used it. I'm sorry. We're here in a discussion forum, and I want you to express your opinion even if it is to complain about my complaints. For what it's worth, your counter-arguments have been stronger and better referenced than any other I have read (for the case of accepting a looser definition of the term open source in the realm of IT).


All good, and I also apologize if my objection came across as disrespectful.

This whole 'Open Source' thing is a bigger pet peeve than it should be, because I've received criticism for using the term on a page where I literally just posted a .zip file full of source code. The smart thing to do would have been to ignore and forget the criticism, which I will now work harder at doing.

In the case of a pork producer who labels their products as 'vegan', that's different because there is some authority behind the usage of 'vegan'. It's a standard English-language word that according to Merriam-Webster goes back to 1944. So that would amount to an open-and-shut case of false advertising, which I don't think applies here at all.


> In the case of a pork producer who labels their products as 'vegan', that's different because there is some authority behind the usage of 'vegan'.

I don't see the difference. Open source software is a term of art with a specific meaning accepted by its community. When people misuse the term, invariably in such a way as to broaden it to include whatever it is they're pushing, it's right that the community responds harshly.


Terms of art do not require licenses. A given term is either an ordinary dictionary word that everyone including the courts will readily recognize ("Vegan"), a trademark ("Microsoft® Office 365™"), or a fragment of language that everyone can feel free to use for their own purposes without asking permission. "Open Source" falls into the latter category.

This kind of argument is literally why trademark law exists. OSI did not elect to go down that path. Maybe they should have, but I respect their decision not to, and perhaps you should, too.


> Terms of art do not require licenses.

Agreed. There is no trademark on aileron or carburetor or context-free grammar. A couple of years ago I made this same point myself. [0]

> A given term is either an ordinary dictionary word that everyone including the courts will readily recognize ("Vegan"), a trademark ("Microsoft® Office 365™"), or a fragment of language that everyone can feel free to use for their own purposes without asking permission. "Open Source" falls into the latter category.

This taxonomy doesn't hold up.

Again, it's a term of art with a clear meaning accepted by its community. We've seen numerous instances of cynical and deceptive misuse of the term, which the community rightly calls out because it's not fair play, it's deliberate deception.

> This kind of argument is literally why trademark law exists

It is not. Trademark law exists to protect brands, not to clarify terminology.

You seem to be contradicting your earlier point that terms of art do not require licenses.

> OSI did not elect to go down that path. Maybe they should have, but I respect their decision not to, and perhaps you should, too.

I haven't expressed any opinion on that topic, and I don't see a need to.

[0] https://news.ycombinator.com/item?id=31203209


If the OSI members wanted to "clarify the terminology" in a way that permitted them (and you) to exclude others, trademark law would have absolutely been the correct way to do that. It's too late, however. The ship has sailed.

Come up with a new term and trademark that, and heck, I'll help you out with a legal fund donation when Facebook and friends inevitably try to appropriate it. Apart from that, you've fought the good fight and done what you could. Let it go.


The OSI was created in 1998 and defined and popularized the term open source. Their definition has been widely accepted over that period.

Recently, companies are trying to market things as open source when in reality, they fail to adhere to the definition.

I think we should not let these companies change the meaning of the term, which means it's important to explain every time they try to seem more open than they are.

I'm afraid the battle is being lost though.


>The OSI was created about 20 years ago and defined and popularized the term open source. Their definition has been widely accepted over that period.

It was defined and accepted by the community well before OSI came around though.


Citation? Wikipedia would appreciate your contribution.

https://en.wikipedia.org/wiki/Open_source

> Linus Torvalds, Larry Wall, Brian Behlendorf, Eric Allman, Guido van Rossum, Michael Tiemann, Paul Vixie, Jamie Zawinski, and Eric Raymond [...] At that meeting, alternatives to the term "free software" were discussed. [...] Raymond argued for "open source". The assembled developers took a vote, and the winner was announced at a press conference the same evening.

The original "Open Source Definition" was derived from Debian's Social Contract, which did not use the term "open source".

https://web.archive.org/web/20140328095107/http://www.debian...


Citation? Wikipedia would appreciate your contribution.

It's not hard to find earlier examples where the phrase is used to describe enabling and (yes) leveraging community contributions to accomplish things that otherwise wouldn't be practical; see my other post for a couple of those.

But then people will rightfully object that the term "Open Source", when used in a capacity related to journalistic or intelligence-gathering activities, doesn't have anything to do with software licensing. Even if OSI had trademarked the phrase, which they didn't, that shouldn't constrain its use in another context.

To which I'd counter that this statement is equally true when discussing AI models. We are going to have to completely rewire copyright law from the ground up to deal with this. Flame wars over what "Open Source" means or who has the right to use the phrase are going to look completely inconsequential by the time the dust settles.


I'll concede that "open source" may mean other things in other contexts. For example, an open source river may mean something in particular to those who study rivers. This thread was not talking about a new context, it was not even talking about the weights of a machine learning model or the licensing of training data, it was talking about the licensing of the code in a particular GitHub repository, llama3.

AI may make copyright obsolete, or it may make copyright more important than ever, but my prediction is that the IT community will lose something of great value if the term "open source" is diluted to include licenses that restrict usage, restrict distribution, and restrict modification. I can understand why people may want to choose somewhat restrictive licenses, just like I can understand why a product may contain gelatin, but I don't like it when the product is mis-labelled as vegan. There are plenty of other terms that could be used, for example, "open" by itself. I'm honestly curious if you would defend a pork product labelled as vegan, or do you just feel that the analogy doesn't apply?


This is like saying any Python program is open source because the Python runtime is open source.

Inference code is the runtime; the code that runs the model. Not the model itself.


I disagree. The file I linked to, model.py, contains the Llama 3 model itself.

You can use that model with open data to train it from scratch yourself. Or you can load Meta’s open weights and have a working LLM.


Yeah a lot of people here seem to not understand that PyTorch really does make model definitions that simple, and that has everything you need to resume back-propagation. Not to mention PyTorch itself being open-sourced by Meta.

That said, the Llama license doesn't meet strict definitions of open source, and I bet they have internal tooling for datacenter-scale training that's not represented here.
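To make the point about simple model definitions concrete, here is a deliberately tiny, framework-free sketch (my own toy, not Meta's code): a model "definition" is just a forward function, and the weights are plain values you can checkpoint, reload, and keep training from. Llama 3's model.py plus a PyTorch checkpoint works the same way at a vastly larger scale.

```python
import json, os, tempfile

def forward(params, x):
    # The "model definition": how inputs flow through the parameters.
    return params["w"] * x + params["b"]

def save(params, path):
    # The "checkpoint": parameters serialized to disk.
    with open(path, "w") as f:
        json.dump(params, f)

def load(path):
    with open(path) as f:
        return json.load(f)

# Someone "pretrains" and releases weights...
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
save({"w": 2.0, "b": 1.0}, path)

# ...and the definition plus the checkpoint alone is a working model.
params = load(path)
y = forward(params, 3.0)  # 2.0 * 3.0 + 1.0 = 7.0
```

Nothing here is specific to any framework; the design point is only that definition + weights = runnable (and further-trainable) model.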


> The file I linked to, model.py, contains the Llama 3 model itself.

That makes it source available ( https://en.wikipedia.org/wiki/Source-available_software ), not open source


Source available means you can see the source, but not modify it. This is kinda the opposite, you can modify the model, but you don't see all the details of its creation.


> Source available means you can see the source, but not modify it.

No, it doesn't mean that. To quote the page I linked, emphasis mine,

> Source-available software is software released through a source code distribution model that includes arrangements where the source can be viewed, and in some cases modified, but without necessarily meeting the criteria to be called open-source. The licenses associated with the offerings range from allowing code to be viewed for reference to allowing code to be modified and redistributed for both commercial and non-commercial purposes.

> This is kinda the opposite, you can modify the model, but you don't see all the details of its creation.

Per https://github.com/meta-llama/llama3/blob/main/LICENSE there's also a laundry list of ways you're not allowed to use it, including restrictions on commercial use. So not Open Source.


That's not the training code, just the inference code. The training code, running on thousands of high-end H100 servers, is surely much more complex. They also don't open-source the dataset, or the code they used for data scraping/filtering/etc.


"just the inference code"

It's not the "inference code"; it's the code that specifies the architecture of the model and loads the model. The "inference code" is mostly the model, and the model is not legible to a human reader.

Maybe someday open source models will be possible, but we will need much better interpretability tools so we can generate the source code from the model. In most software projects you write the source as a specification that is then used by the computer to implement the software, but in this case the process is reversed.


That is just the inference code. Not training code or evaluation code or whatever pre/post processing they do.


Is there an LLM with actual open source training code and dataset? Besides BLOOM https://huggingface.co/bigscience/bloom



Yes, there are a few dozen full open source models (license, code, data, models)


What are some of the other ones? I am aware mainly of OLMo (https://blog.allenai.org/olmo-open-language-model-87ccfc95f5...)


Given that blind people can learn to speak, audio alone must be enough to learn language. And given that deaf people can learn sign language, video alone must also be enough to learn language. That’s assuming that touch and emotion aren’t crucial to language learning.


Given Helen Keller's grasp of language, touch alone must be enough to learn language.


I've often wondered if there aren't some structures in the brain, selected for since the advent of language, that are good at picking up languages.


This is one of the most discussed and argued ideas in all of linguistics and philosophy. https://plato.stanford.edu/entries/innateness-language/


Indeed. I'm reminded of the time a childhood friend essentially discovered the inverted spectrum argument[1]. That is, we can't know if my qualia when perceiving the color red doesn't match yours of the color blue.

We were unfortunately young and in poor company, so the idea didn't receive the appropriate attention.

[1]: https://en.wikipedia.org/wiki/Qualia#Inverted_spectrum_argum...


I think Carl Sagan discusses this topic in "Broca's Brain".

https://en.wikipedia.org/wiki/Broca's_Brain


I think what you're missing is deliberate action and feedback. We don't just listen or watch as if the world was a movie. We act and communicate intentionally and with purpose. The response to our deliberate actions is in my view what we mostly learn from.

Blind people surely compensate for the lack of visual information by deliberately eliciting audio and tactile feedback that others don't need.

Also, watching others interact with the world is never the same thing as interacting with the world ourselves, because there's a crucial piece of information missing.

When we decide to act, we know that we just made that decision. We know when we made the decision, why we made it and what we wanted to achieve. We can never know that for sure when we observe others.

We can guess of course, but a lot of that guessing is only possible because we know how we would act and react ourselves. A machine that has never intervened and deliberately elicited feedback cannot even guess properly. It will need incredible amounts of data to learn from correlations alone.


Different kinds of experiences, but highly correlated.


Emotion it’s crucial for sure


That's just the default. You can set max_seq_len to 8k. From the readme [0]:

> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.

[0] https://github.com/meta-llama/llama3/tree/14aab0428d3ec3a959...
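For example, adapted from the llama3 README's sample invocation (checkpoint and tokenizer paths are placeholders for whatever you downloaded), raising the context window is just a flag passed at load time:

```shell
# max_seq_len sizes the pre-allocated cache; the models themselves
# support up to 8192 tokens, so raise it if your hardware allows.
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 8192 --max_batch_size 4
```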


If it's true, condolences and love to his family.


When a problem can be solved mindlessly - with a repeatable set of steps no matter the situation - imperative/declarative programming makes sense.

Most real world situations are unique and require unique solutions. That's where AI really shines. You just describe your target, attempt to solve the problem, and pay attention to how far off you were from your target. The learning happens naturally.

Neural networks are too complex - sometimes billions of variables - to decide what each neuron should do. We as a species have evolved to develop brains that are extremely adaptable. AI mimics our own natural learning process. And it's proven to be far more effective at solving unique problems.
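The describe-your-target-and-measure-the-error loop above can be sketched with a one-parameter "network" and plain gradient descent (a minimal illustration of the idea, not any particular framework):

```python
def train(data, lr=0.1, steps=100):
    """Learn a single parameter w by repeatedly nudging it to reduce error."""
    w = 0.0  # start with no knowledge
    for _ in range(steps):
        grad = 0.0
        for x, target in data:
            pred = w * x
            # d/dw of the squared error (pred - target)^2
            grad += 2 * (pred - target) * x
        # move w a small step in the direction that reduces the error
        w -= lr * grad / len(data)
    return w

# Target behavior: output should be twice the input.
data = [(x, 2 * x) for x in range(1, 5)]
w = train(data)  # converges toward w = 2.0
```

Nobody decides what the parameter should be; it emerges from attempting the task and paying attention to how far off the attempt was.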


> [AI] has proven to be far more effective at solving unique problems

An extraordinary claim, I think. Source or evidence?



When you've built a product using only OpenAI, and not a programming language, come back and tell us.

While you're at it, come back and tell us when you've implemented a neural net using AI, and not a programming language, too.



You don't think code is involved in that, my dude? The input wasn't a description fed into an AI model, with Copilot as the model. It's code that makes calls to an ML model. Which is my point. The model was created with code. The model is deployed with code. It runs on infrastructure that executes a bunch of other code. Code takes in user input, feeds it into the model, takes the resulting response, and does something with it. There is no ML model that is a product in and of itself. ML is not a compiler. It is not a runtime environment. It does not understand business needs, and it does not take direction. It may certainly be transformative, for good or ill, but it hasn't suddenly deprecated the need to code; far from it.


Neural networks explicitly don't work the same way as human learning; they don't have online learning, humans definitely don't learn through backprop, humans have memory and compute in different parts of the brain, etc.

Also, training a neural network can make it worse; it's actually the combined system of the model and its engineers that makes it improve. (https://en.wikipedia.org/wiki/Catastrophic_interference)


When a remote, international company gets very large, it can be just about impossible to find a time where everyone can meet.

Even in time-sensitive situations, I prefer asynchronous communication methods like email and voice memos. That way everyone can catch up as soon as they get a chance, no matter what time-zone they're in.

