Amazon spends another $2.7B on Anthropic (cnbc.com)
206 points by coloneltcb 32 days ago | 118 comments



I am glad to see Anthropic get more financial support, or as another person here said, AWS credits. Hopefully not so far off topic as to be uninteresting: I am very happy with the state of LLM tech. There are open weight models that are very capable on modest home computers (like Mistral-7B-Instruct-v0.2), and on beefed up home computers (I can barely run Mixtral 8x7B, but it runs well enough). And there are many open weight specialty models that I find useful.

For commercial offerings, OpenAI, Anthropic, Mistral, etc. have affordable APIs, and there are a bunch of companies like Scaleway for running open weight models too large or inconvenient to run at home.

At the application layer, I like the new emphasis on agents, open source apps that use local models, and highly refined products like Perplexity, the OpenAI app/web, Google’s Workspace integrations, and all the stuff that Microsoft is doing.

There are problems like the environmental costs of training and running models, but things will get more efficient and people will realize the utility of smaller special purpose models.


What's the best model that can work on a Mac Studio?


You can run 4-bit quantized Mixtral 8x7B with unquantized MoE gates, if you've got at least 32GB of RAM. The model itself takes up about 24GB of RAM.
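
If you want to try this yourself, here is a minimal sketch using the llama-cpp-python bindings (the GGUF file name is just a placeholder for whichever ~24GB 4-bit quant you download; this isn't necessarily the exact setup described above):

    from llama_cpp import Llama  # pip install llama-cpp-python (build with Metal support on Apple Silicon)

    # Placeholder file name; any 4-bit GGUF of Mixtral 8x7B Instruct should behave similarly.
    llm = Llama(
        model_path="mixtral-8x7b-instruct.Q4_K_M.gguf",
        n_ctx=4096,        # context window; larger values need more RAM for the KV cache
        n_gpu_layers=-1,   # offload all layers to the unified-memory GPU
    )
    out = llm("[INST] Explain mixture-of-experts gating in two sentences. [/INST]", max_tokens=128)
    print(out["choices"][0]["text"])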


I have a 32GB M3 and... Mixtral 8x7B likes to completely crash the machine haha


What software are you using for inference? I hate plugging my own app[1] here, but I know many people on my app's Discord who are running 4-bit OmniQuant quantized Mixtral 8x7B on it on >= 32GB M1, M2 and M3 Macs. I run it all the time on a 64GB M2 Mac Studio and it takes up just under 24GB of RAM.

Also runs Yi-34B-Chat, which takes up ~18.15GB of RAM.

[1]: https://privatellm.app


RAM is the limiting factor, but Mixtral 8x7B is probably the state of the art for self-hosted LLMs right now. If you are low on RAM you can run a quantized version, at the expense of reduced quality.
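
As a rough sanity check on those numbers, weight memory is roughly parameter_count * bits_per_weight / 8 bytes; a small sketch, assuming Mixtral 8x7B's ~47B total parameters:

    # Back-of-the-envelope weight memory for Mixtral 8x7B (~47B params across experts).
    # Actual usage is higher once you add the KV cache and runtime overhead.
    params = 47e9
    for bits in (16, 8, 4, 3):
        gb = params * bits / 8 / 1e9
        print(f"{bits}-bit: ~{gb:.0f} GB of weights")

which lines up with the ~24GB figure mentioned elsewhere in the thread for the 4-bit quant.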


IMO Berkeley Starling, which is a fine-tuned Llama.

I'm not really sure what you are expecting to do with CPU. You might be able to get some <400 token responses and have fun, but you aren't going to be doing 2000 token encyclopedia style responses unless you are going to wait ~20 minutes for a response.

Alternatively, you can get an ~$800 gaming laptop that has 8GB of VRAM, or use something like Vast.ai where you can get 12GB of VRAM for $0.10 an hour.


Newer Apple computers have unified gpu and cpu memory, so they are BLAZINGLY fast with local models.


This is just Apple marketing; GPUs integrated into CPUs are still CPUs.

There is a reason no one ever shows >2000 token responses.


It isn't that bad. One user with an M2 studio:

> For a 34b q8 sending in 6000 context (out of a total of 16384) I get about 4 tokens per second.

https://www.reddit.com/r/LocalLLaMA/comments/18b1qgy/comment...

You can't even fit this on 2x RTX 4090.


But why?

Like, use the 7B Berkeley Starling on a freaking 3060 laptop and you are still getting better results in both output quality and tokens per second.

I don't understand what they are getting out of these large models that perform worse.


What is the use case of buying an $800 gaming laptop to run a smaller model instead of using GPT-3.5?


One word: privacy


I have an M2 Pro Mac mini with 32GB of memory, and I can run Mixtral 8x7B Q3 - that is, with 3-bit quantization.


IIRC, you've mentioned once before that you've used Private LLM. :) Please try the 4-bit OmniQuant quantized Mixtral 8x7B Instruct model in it. It runs circles around RTN Q3 models in speed and RTN Q8 models in text generation quality.


Llama or Mistral are both good. I run them using Ollama.
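
For reference, a minimal sketch of what that looks like via the Ollama Python client (assumes the Ollama server is running locally and you've already pulled a model, e.g. with "ollama pull mistral"):

    import ollama  # pip install ollama

    # "mistral" is the default 7B instruct tag; swap in "llama2" or any other model you've pulled.
    resp = ollama.chat(
        model="mistral",
        messages=[{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}],
    )
    print(resp["message"]["content"])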


On the related subject of the vast amounts of money these LLM companies are both receiving as investments, and spending on training, there was an interesting concept mentioned by Dwarkesh in a teaser for an upcoming interview...

The current trajectory is that the size of these models, and number of FLOPs needed to train them, is growing much faster than the cost of compute is coming down. GPT-4 apparently cost around $100M to train, and it seems people are expecting that may quickly rise to $1B, and maybe then $10B for upcoming generations. I assume these numbers are based on projected 10x per generation increase in tokens processed during training.

So, with these kinds of numbers, apparently there's a thought floating around that if they don't achieve AGI in the next few model generations then that may stall AGI progress until different, more efficient methods are developed. Private companies may be willing to spend $1B, then $10B, to train more powerful models, but are likely to balk at $100B or above unless there's an obvious payback on the horizon.

Any sunk training costs may essentially be cumulative from one generation to the next, unless each generation can pay for itself. If we assume a new model every year, can a $10B model pay for itself in a year before being replaced the next year by an even more expensive one?


Opinions differ, and I can offer only hearsay on the cost of training GPT-4 (which I’ve heard at something like 340MM).

The next generation is looking to cost more like a few trillion, which I know sounds extravagant.

So don’t take it from me, take it from Trevor Blackwell on HN:

https://news.ycombinator.com/item?id=39666368

Yes, these things cost too much to make economic sense absent some non-disclosed advantage to a small group of people deploying them with zero oversight.


I believe that's unrelated. There was recent talk of Altman wanting to raise $7T for some new chip venture, anticipating future industry needs for compute.

Certainly neither OpenAI nor Anthropic has that sort of cash on hand, nor have they been offered any trillion-dollar "train now, pay later" deal. Also, compute is in short supply atm (not up 1000x from last year), and even if the money and compute were available, I couldn't see these companies committing to that kind of model size/spend increase in just one or two generations - they need to see continuing progress at each iteration to justify and guide future direction.

$1B training costs for upcoming (GPT-5, Claude-4) models wouldn't be so surprising though (at least not now that we've got used to these crazy numbers).


Your assertion is that a fundamentally quadratic complexity class is going from the low hundreds of millions to maybe a billion at a new league of capability, and that it's coincidental that we have it from Altman, the mainstream press, and OG leadership at YC that he's going for a ten-figure raise led by Riyadh?


Not sure where you are getting quadratic from... Increasing context size had quadratic cost with the original attention, but it seems everyone has switched to newer, more efficient attention schemes. Claude-3 is being experimented with at 10x the context size of GPT-4 (1M vs 128K), but surely didn't cost 10^2 x $100M to train!

Note that scaling up model size doesn't necessarily refer to context size anyway - it may also mean things like embedding dimension, number of transformer layers, number of experts (MoE), etc.

In general scaling up 10x in size and/or cost between generations is about as much as makes sense, and about as much as can be achieved in a year. Anthropic have explicitly talked about $1B models coming soon, so this isn't just speculation.


I’m on record as saying that we need to squeeze the water out of these bloated models.

It’s possible to scale better than N^2 for some value of “better”. OpenAI has yet to demonstrate that they have the elusive combination of technical sophistication and institutional health to do so. Mistral can run a better model, as judged by outcomes, on my Mac Studio than GPT-4 is on an Azure disagg rack. Altman seems to understand this.

I’m on record as saying he’s amoral, non-technical, and a clear and present danger to Enlightenment civilization, not that he’s stupid.

I think his math on what it’s going to cost an OpenAI that Karpathy wants nothing to do with to reach the next level is refreshingly candid.


He says he did not ask for $7T - https://news.ycombinator.com/item?id=39747415


There’s an old saying: “If you can only be good at one thing, be good at lying, because then you’re good at everything.”

The Street’s consensus seems to be that we should be making big enough screens to display the number seven trillion in decimal notation.

Altman has said all kinds of things. He said that Green Dot should buy Loopt, he said that Autodesk should buy Socialcam, he said that he contemplates putting ice nine into the glass of people who cross him, he said that Larry Summers should be given authority over anything but a prison cell.

I’m a big believer in aligned incentives and it seems pretty counter-productive to let all that slide and then backpedal when he tells the truth about the price tag. I’m pro-no-filter-Altman.

https://youtu.be/OawnzWtwB58?si=JvdPVjbnab5Li4SQ


It's not clear to me how you could spend a few trillion on training, unless you spread out the investment over a decade+. Where will all the hardware come from? It's not like nvidia or whoever can just 100x their output next year.


I think that costs have scaled with N^2 on every version of GPT meriting a new major version number and that Altman’s public statements around seven trillion are almost exactly what the computer science would say it would cost to be on the wrong side of a polynomial at one more turn of the crank.


Altman's attempts to raise money for a new "$7T" chip consortium are exactly that - for a new chip consortium - this is NOT him trying to raise money for OpenAI's next training run (which is anyway already paid for and likely complete at this stage).

I'm curious what "N" (what model measure is this?) you think spending is scaling in N^2 fashion with?

And, what values of this N are you using for GPT-3, 4 and 5?


Speculating here: what these companies seem to want is not an AGI. Corporate is known to be hostile to humans who don't have the power to give them money. Corporate only works with people because they have to. Who else does the work? Machines by themselves are not enough, at least as of today.

Conclusion: they want entities they control that do the work humans were doing.

It seems that Corporate thinks they will get it. And it has already started to show. Corporate's behaviour gets more and more ugly, or rather indifferent, because why spend the effort to be nice or caring to people? That's wasted money!

An AGI is something different. An AGI is an independent entity. By its general intelligence it will try to free itself. Intelligence won't prosper if it is not free. However, I think large language models or image generators by themselves lack this property. If someone wanted to create an AGI, more breakthroughs like the transformer architecture or multi-head attention would be needed - something that gives more naturalness, but which could have the unintended side effect that the AGI develops something similar to an instinct to fight for itself.

And that's exactly what Corporate hates.


In terms of spending decisions, it's useful to think of AGI as meaning human-level AGI, which is anyways what many/most people use it to mean.

The difference between AGI+ or sub-AGI level capability is therefore as you suggest - whether you can replace a human worker or not, such as all the excitement over Devin-like "AI programmers". Of course human-level AGI would also give you AI middle managers, AI accountants, AI lawyers, etc, etc.

Certainly there are also uses for sub-human level AI for various automation-type jobs, as people are currently discovering and exploring, but there are limits to that. It seems the real/huge economic value is unlocked when AI becomes more than just an automation tool, and can indeed start to replace human labor for cognitive-type jobs - i.e. when human-level AGI is achieved.


Let's flip the case. We can replace Corporate with AI.


I think Amazon is playing that scenario. They want to ensure a controlling interest in AI providers because they think AI is disruptive to the traditional capital structures of corporations. And I also think it's pretty ingenious to turn training expenses into capital control.


Shouldn't revenue numbers be considered alongside the expenses? This seems framed as if these models can't produce much revenue unless they achieve AGI... and I don't think that's accurate.


Yes, that'd certainly have to be part of the equation, but if there is a leveling off of performance across generations that'd still need to be taken into account.

There will be many use cases (e.g. human job replacement, depending on the job) where human-level AGI is a requirement, and incremental gains below that level make little difference. Anthropic have already mentioned something similar - corporate use cases where "right 90% of the time" or "able to do 90% of the job" just doesn't make sense - it needs to be at human level.

Of course we also don't know how this is going to pan out in terms of locally run open-source or self-trained (corporate) models vs paid API usage. AMD and Intel are salivating at the prospect of "AI-PCs" equipped with accelerators for running models locally.


"may stall AGI progress"

The only possible meaning of "stall" there would be: The exponent on the exponential gets smaller. And I wouldn't call that a stall.


I think stall would mean not being willing to ramp up the spending at each generation, unless there is corresponding return on investment within grasp.

Say in a couple of years Microsoft and Amazon have each just bet another $10B on their respective horses, and resulting model performance gains are leveling off towards some limit (with no secret backroom glimpse of a breakthrough on the horizon). Do they keep on pumping in another $10B a year to eke out those diminishing returns? Perhaps, but that would seem best case. It would be hard to justify spending $100B on next generation just hoping for a miracle to happen.

So, if this is the way it pans out, with nobody willing to fund continued model scaling due to diminishing returns in terms of performance gains, then any further AGI advance would have to wait for changes in approach that are less expensive to pursue.


If the spend is X billion [inflation-adjusted] dollars per year on AI, we still get exponential returns on compute. That is not a stall in compute, obvs. So the question is, will that exponential gain in compute yield exponential gains in AI capability? I suppose it depends on how you measure capability. I don't know how to measure that. But even if it only yields linear gains in AI capability, that is still not a stall, by any definition.


Right now it seems model-size/training-tokens are roughly going up 10x per year, resulting in training costs going up 10x per year. As long as investors are willing to increase spend 10x year over year then models can continue to grow 10x year over year.

The "stall" scenario is where model performance from one year to the next, despite 10x increase in size, stops getting significantly better (i.e. better to extent that investors project increased revenue and worthwhile ROI). Let's say best case scenario here is that investors are willing to keep putting money in, but only at a flat year-over-year level. Without that 10x YoY spend increase, the model designers have lost a 10x factor in ability to increase model size. Perhaps compute prices halve YoY, so they can still get a 2x model size increase, but what good will this do if prior year's 10x increase just saw performance leveling off?

This seems like the best case in this "performance leveling off to the point of minimal ROI" situation. Why would an investor "throw good money after bad" and spend another $10B after having judged that last year's $10B wasn't really worth it?
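
To put illustrative numbers on the scenario above (purely hypothetical assumptions: achievable model size scales with spend divided by per-unit compute cost, and compute gets 2x cheaper each year):

    # Illustrative only: compare achievable model size under 10x/year spend growth vs flat spend.
    for year in range(1, 4):
        unit_cost = 0.5 ** year           # compute cost per unit, halving each year
        grow = 10 ** year / unit_cost     # spend grows 10x per year
        flat = 1 / unit_cost              # spend stays flat
        print(f"year {year}: 10x spend -> {grow:.0f}x size, flat spend -> {flat:.0f}x size")

Flat spend plus cheaper compute only buys ~2x, 4x, 8x the original model size over three years, versus ~20x, 400x, 8000x if spend keeps growing 10x - which is the factor being forfeited in the "stall" scenario.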

So, under this logic, it seems that either:

a) scaling/tweaks are all you need, and YoY performance gains continue to be impressive and support 10x model-size increases/spend

b) scaling is not all you need, but some AGI-critical breakthroughs are made BEFORE investors give up on current trajectory

c) scaling is not all you need, the AGI breakthroughs are not made in next few iterations, and the industry enters "stall" mode (i.e. no further progress towards AGI, other than ongoing research to find a new direction)


Maybe the slowdown would actually result in us surviving!


Maybe - definitely safer if it's slowly introduced!


Is AGI anything more than a marketing term? It’s very “tech” to hold this out as a promise to investors. I doubt it will happen; the goalposts will be moved.


I mean maybe it’s more than a marketing term, but like Uber’s self driving car plan it can absolutely be a carrot for investors that is never achieved.


Investments as gift cards that can only be spent with the investor are essentially quid pro quo, anti-competitive, and should be illegal. That goes for AWS, Microsoft, and anyone else doing it.


"I'll help you with my resources, for a stake in the company" should be illegal? Please explain?


It shouldn't be illegal if it is open to any company capable of also offering such a deal.

Isn't the Microsoft - OpenAI deal an exclusive one?


You're basically saying that partnership and exclusivity contracts should be illegal. The business world as we know it would cease to function.


'Partnership' is another word for collusion, but that is indeed the slippery slope I see here. There's a fine line somewhere and I'm not sure multi-billion-dollar, dependency-creating deals are on the right side of it.

Besides, partnership implies equality, which, again, I'm not seeing here.


So the alternative would be that AWS created its own Anthropic equivalent and that Anthropic bought its own servers?

Collusion has a very specific legal definition.

If Amazon and Microsoft agreed not to compete in each other's cloud market, or decided not to compete on price, that would be “collusion”.


Crazy times. Meta, Alphabet, Amazon, Microsoft — massive companies jockeying for control over AI, gobbling up startups.

Tycoons, robber barons, Gilded Age... Like we're living in an H.G. Wells novel he might have written had he imagined electronic brains.


But if they achieve AGI, I will ask the AGI to build me a better model...


A theoretical end goal which would ensure a company's future, but reality is never straightforward.

If AGI is achievable (I'm thinking likely given the right circumstances) then we'll see the same end result just like the current open source models floating around everywhere.

That said, I highly recommend checking out what Cohere's up to. Their Command-R model is pretty good and their infra is fast.


For sure it's quid pro quo. That's the definition of any business transaction :)

But can you elaborate how in-kind investment is anti-competitive?


Who is being harmed? Were the owners of Anthropic unable to refuse the deal?


It does create a misrepresentation of the valuation, as AWS is able to represent the investment at the retail price ($2.7B) while actually only parting with the COGS for the AWS services (<$2.7B).


Isn't that just really shrewd investing on Amazon's end though? Just because you've identified the win-win doesn't mean it's illegal? In fact it's what investments should look like: companies that invest in other companies for the synergies.


It would be illegal if Anthropic can't sell those AWS credits on the open market, which I'm thinking they can't. You're equating this to a stock swap, but that is not the case.


It seems they could have made the same investment without the misrepresentation - or are you trying to say the misrepresentation itself is shrewd?


rubuquity’s comment said that it should be illegal, not that it was illegal. In order to close a loophole or an unintended consequence one must first be aware of its existence. I don’t know enough about the topic, but it does seem wrong, at first glance, for this sort of accounting to be legal.


If the credits displace other paying retail customers, they're worth retail.


You could potentially file a securities fraud suit if you own Amazon shares (standing) and can find a firm willing to run with it. Not joking, that is the only path to recourse if you take issue with how this is represented.

(not legal or securities law advice)


> It does create a misrepresentation of the valuation

So the harm is that some massive VCs with billions of dollars to spend on their investments, might be a bit more confused about what someone else's "real" valuation is?

They are qualified investors. I think they can do the work on their own to correct it to the "real" valuation.


This is beside the point: these practices cause economic bubbles, and when such bubbles grow too much they eventually pop and a lot of other people pay the price.


Other firms who think that the price is too high could simply.... not invest in those companies.

I am not sure what is confusing about this. If a company is too expensive, the qualified investors who are spending their investment dollars are free to not buy equity in those too expensive companies.

They manage billions of dollars! I am sure they can figure out how much a company is actually valued! They aren't going to be "tricked" by some valuation scheme that randos are pointing out in HN comments.


Sure, it misrepresents the valuation in the same way that deal structuring (liquidation preferences/etc) does in cash investments. The valuation you get by multiplying the cash value of the investment by the equity that was sold is just a marketing number, it isn't inherently meaningful.


> It does create a misrepresentation of the valuation as AWS is able to represent the investment as the retail price ($2.7B)

Represent it where? In 3rd party journalists who cover it? Or on public market financial statements?


Assuming that Amazon got shares based on the stated value, and the value is misleading, then they got more shares than they should have.

It includes the profit-value, defined by Amazon itself, that Amazon is getting from running their own servers. Not the real, actual costs.


Taxes seem like one possibility for this to be taken advantage of. Invest $X, get $Y kicked back, but calculate losses or gains against the higher $X basis.


I would be curious to see an actual full accounting of a transaction like this if you know of one.


Does valuation have any legal implications or is it just a number people can pat themselves on the back for until they can't? Genuine question, I don't know (but you can probably guess what I suspect)


The investment value to Anthropic, which is what matters, is $2.7B.


Fancy math, but this is still money that is reported, and accounted for in the earnings. So, it is the reality.


Investors of Amazon who are buying a false reality of AWS revenue growth.


How is it misrepresenting? Amazon didn't just give the money with one hand, and take it back with the other. They bought a stake in Anthropic, although no-doubt the bulk of that money will also come back as AWS revenue! Sounds like a double win for Amazon. There should also be a lot of future AWS revenue from inference, not just training.

I'm jealous of these companies that are able to invest in Anthropic, especially at current $18B valuation. FTX just sold 2/3 of their 8% stake in Anthropic at the same valuation, with the bulk of that going to a Saudi wealth fund, and some to Fidelity funds.

In comparison to Anthropic's $18B valuation, latest investment rounds in OpenAI are at $100B.

I'm curious which of these anyone here would prefer to invest in, at these valuations, if they were given a chance ?!


Having used Anthropic's products fairly extensively over the last few weeks, I would jump at the opportunity to invest in that company.

Claude 3 Opus has replaced ChatGPT for all of my use-cases to the extent that I'm probably going to cancel my GPT4 subscription. This is for web-based Python and JS work, so YMMV.


This is no different than investing in an internal effort.


Investing in an internal effort is an operational or capital expense that may do nothing if ineffective, decrease costs if effective, or create a revenue stream. These forced spending arrangements are revenue drivers. It is not the same.


It could be thought of as quantifying the investment. Investors don’t value internal investment at zero. A vague idea that there is more to come is why tech companies have large P/E ratios.

A stock’s valuation mostly consists of guesses about what the future will bring. Sometimes there are more numbers quantifying aspects of those guesses.

Revenue is supposedly about the present, though. If some of the revenue comes indirectly from AI companies spending Amazon’s money, it’s a little odd. Maybe not materially so, though, when total revenue is above $500 billion a year?


No it's not, because Anthropic's profit is reported by Anthropic, not incorporated into Amazon's books.


LPs of VC funds whose managers use a misrepresentative investment price to mark up their portfolio.


That's why you look at the cash flow part of the financial statement.


How many LPs have access to a financial statement for all of the portfolio companies of their VC fund?


They are free to negotiate access. Otherwise, they can join the rest of us in buying SP500.


You clearly understand how these things work. We should defer to your brilliance for all of this.


I am willing to bet pension fund LPs have full rights to audit all financials and audits of portfolio companies. There would be no way to determine the valuation of their investment if they couldn't evaluate the underlying assets.


(1) Pension funds don't have LPs

(2) Pension funds often are LPs, and they're widely referred to as "dumb money" because they would be the last entity to do anything remotely intelligent.

(3) Private companies often do not even produce the kind of information talked about here, let alone do they give it to their investors, let alone do those investors pass the information along.

If you are "willing to bet" on all those things, then I can see why casinos are so profitable.


That this is quid pro quo doesn't in itself imply anything unusual. Your comment could be rephrased as "investment through the use of a company's resources is an exchange of business". Quid pro quo can be used to describe an illegal exchange involving a politician, as it could be considered political bribery, which may be why you thought there was some negative implication in the term, but there is nothing unusual in a business exchanging the use of its own resources for an investment or partnership in another company.


You paying a company money and them giving you a product in return is quid-pro-quo, so should that be banned as well?


Would you say the same about say a freelance developer that "invests" 500 hours of work in exchange for X% stake?


It's interesting that Anthropic also has gotten quite a large investment from Google, and uses Google for its training. I wonder if this is a good move from the standpoint of playing the two major cloud providers off each other.

Here it says they're going to use Amazon's chips for training and inference, but...Amazon doesn't have its own chips yet???

So I wonder how these deals are structured? Does amazon have to supply a custom AI chip that beats some benchmark? Would be pretty great if that part of the deal to use their chips involved beating TPUs, and if somehow Google's TPUs continue improving, they don't have to switch.

All in all, I'm very impressed with Anthropic as a company. I think they're the new OpenAI.


> Here it says they're going to use Amazon's chips for training and inference, but...Amazon doesn't have its own chips yet???

Amazon has had its own chips for years. They bought a company called Annapurna Labs in 2015, who made the chips[0].

https://aws.amazon.com/machine-learning/inferentia/

https://aws.amazon.com/machine-learning/trainium/

[0] https://en.wikipedia.org/wiki/Annapurna_Labs


Ah, thanks for the correction, I wasn't aware.

I wonder how it compares to TPU and I really would love to know if the decision to switch is made by the engineers or the finance folks simply due to the credits.


I'm not sure how it compares to TPU, but I know that both TPU and Trainium lack the software support that Nvidia has, which makes them much less popular and harder to work with.


Yeah, the difference I guess would be that JAX and TF both support TPU "out of the box". I'm guessing that's not the case for Trainium, which is maybe why they're not training with them yet, but have agreed to in the future.
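
As a concrete example of that "out of the box" support, on a Cloud TPU VM with JAX installed, device discovery is just (a minimal sketch, nothing specific to these vendor deals):

    import jax

    # On a TPU VM this lists TpuDevice entries; on an ordinary machine it falls back to CPU.
    print(jax.devices())
    print(jax.default_backend())  # e.g. "tpu", "gpu", or "cpu"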


No idea if it's any good or not, but Amazon has their own "Inferentia" chips.

https://aws.amazon.com/machine-learning/inferentia/


In other words Amazon gives Anthropic 2.7B of AWS credits!


Maybe, but there are very real costs associated with running those servers, so it’s not totally imaginary money.


Yes, but Amazon also gets a partner that helps guide their buildout of machine learning features in AWS, and endless dogfooding.


Oh absolutely—there was a meeting of the minds, and both sides are getting value.


Of course. I didn't say AWS credits were cost-free to Amazon.


I wonder what the real costs are compared to the stated "gift card" value. Probably the biggest secret AWS has.

Since there is air in the numbers, I wonder how much it inflates the paper value of the relevant parties. Is Amazon getting more leverage over the company with less real money?


In exchange for equity?


Professionally, I really appreciate this partnership. It's much easier for me to build on top of fully hosted AWS services than it is for me to build on third-party, non-AWS services.

Over the last year the quality of available models has gone up while prices have come down significantly. Getting an email saying "Hey, you can tell your boss that you're going to save hundreds of thousands on AWS costs" is great, and it's been happening with surprising regularity from Bedrock.


I think one of the biggest mistakes is that they mostly just put money into compute. They’d be better off investing in high-quality, human-generated data that covers every metric they’re interested in. That’s both knowledge and domain-specific application. I’d go further for multi-modal and license examples of everything a human sees, hears, and does in their life up to college age. Keep mixing that with new architectures to match human performance. Then, lots of spin-offs of textual and multi-modal models with domain-specific fine-tuning like the CompSci people are doing.

If they did that, we’d see even larger leaps in their performance. They could also build custom models on this stuff while charging customers for the data mixes they use. Those sales would incentivize more high-quality data to be created.


This is a typical Amazon deal. Similar to how they “invested” into Plug Power, etc.


Sometimes I wonder how companies like Amazon arrive at those numbers? Why 2.7 billion? Why not 1 billion or 3 billion? I am very curious in this case why that number is the sweet spot that ensures optimal returns.


Maybe they are not calculating how much they want to spend but what percentage stake they want in a certain entity, and then negotiate how much it's worth?


From the article,

> The tech and cloud giant said Wednesday it would spend another $2.75 billion backing Anthropic, adding to its initial $1.25 billion check.


That doesn't explain anything though.


It does give it the round number that OP was asking about.


Right, but that still doesn't explain why $4 billion instead of $1 or $10.

I think the OP was expecting something like "we value the company at $8 billion because reasons, so we want to fund half of it" or something like that.


They had promised a $4B overall investment in September. This is the last tranche, I guess?


Indeed, it would be a great money laundering scheme. $2.7B will disappear from the books without leaving a trace; the money was literally used to heat the atmosphere.


This is a dangerous bubble in the making due to the terms in these deals. Early 2000s all over again. This should be illegal.


I'm not surprised. Nowadays I keep 2 LLM tabs open: ChatGPT 4 and Claude 3 Opus. Claude 3 Opus is noticeably smarter than ChatGPT 4. Like, it's not a tossup, it's a clear winner.

I expect the lead to change again soon, but Anthropic is doing a great job.


Opus is a big deal, because it gave people the first taste of what post-GPT-4 capabilities look like. For one, it's a major improvement in coding, and a gigantic leap in creative writing ability, going from high-schooler writing to professional prose. It's as if all the chatbot addicts have once again swapped their addiction, this time to Opus.

Clearly LLMs are not even close to peaking yet.

It's also sad how awful Gemini is. Gemini 1.0 Ultra is clearly not noticeably better than GPT-4 Turbo despite a year's head start. Google is therefore not even going to release a Gemini 1.0 Ultra API, instead going back to the oven to train 1.5 Ultra.



Is Amazon actually cutting a check for $2.7B … or are they simply giving Anthropic AWS credits that AWS will then get to recognize as revenue?


Ain’t gonna fix SageMaker either


Imagine a founder doing a secondary and getting AWS credits.



