Remember to cancel your Microsoft 365 subscription to kick them while they’re down
Joke’s on them: I never started a subscription!
I don’t have one to cancel, but I might celebrate today by formatting the old Windows SSD in my system and using it for some fast download cache space or something.
Looks like it is not any smarter than the other junk on the market. The confusion that people consider AI as “intelligence” may be rooted in their own deficits in that area.
And now people exchange one American Junk-spitting Spyware for a Chinese junk-spitting spyware. Hurray! Progress!
I’m tired of this uninformed take.
LLMs are not a magical box you can ask anything of and get answers. If you are lucky and blindly asking questions it can give some accurate general data, but just like how human brains work you aren’t going to be able to accurately recreate random trivia verbatim from a neural net.
What LLMs are useful for, and how they should be used, is as a non-deterministic context-parsing tool. When people talk about feeding it more data, they think of how these things are trained. But you also need to give it grounding context beyond the prompt itself. Give it a PDF manual, a website link, documentation, whatever, and it will use that as context for what you ask it. You can even set it to link to its references.
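A minimal sketch of what "grounding context" can look like in practice: the reference text is pasted into the prompt and the model is told to answer only from it. The function name, the prompt wording, and the character cap are illustrative assumptions, not any particular tool's API.

```python
def build_grounded_prompt(question: str, sources: dict[str, str], max_chars: int = 8000) -> str:
    """Combine reference excerpts and a question into one prompt.

    `sources` maps a label (e.g. a file name or URL) to its extracted text.
    `max_chars` is a crude stand-in for the model's context limit (assumption).
    """
    parts = []
    budget = max_chars
    for label, text in sources.items():
        if budget <= 0:
            break  # context budget used up; remaining sources are dropped
        excerpt = text[:budget]
        if excerpt:
            parts.append(f"[{label}]\n{excerpt}")
            budget -= len(excerpt)
    context = "\n\n".join(parts)
    return (
        "Answer using only the sources below. "
        "Cite the source label in brackets for each claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The resulting string is what you would send to the model, hosted or local. You still have to check the answer against the cited source yourself.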
You still have to know enough to be able to validate the information it is giving you, but that’s the case with any tool. You need to know how to use it.
As for the spyware part, that only matters if you are using the hosted instances they provide. Even for OpenAI stuff you can run the models locally with open-source software and maintain control over all the data you feed it. As far as I have found, none of the models you run with Ollama or other local AI software have been caught pushing data to a remote server, at least when using open-source software.
Looks like it is not any smarter than the other junk on the market. The confusion that people consider AI as “intelligence” may be rooted in their own deficits in that area.
Yep, because they believed that OpenAI’s (two lies in a name) models would magically digivolve into something that goes well beyond what it was designed to be. Trust us, you just have to feed it more data!
And now people exchange one American Junk-spitting Spyware for a Chinese junk-spitting spyware. Hurray! Progress!
That’s the neat bit, really. With that model being free to download and run locally it’s actually potentially disruptive to OpenAI’s business model. They don’t need to do anything malicious to hurt the US’ economy.
The difference is that you can actually download this model and run it on your own hardware (if you have sufficient hardware). In that case it won’t be sending any data to China. These models are still useful tools. As long as you’re not interested in particular parts of Chinese history of course ;p
It is progress in a sense. The West really put the spotlight on its shiny new expensive toy and banned the export of toy-maker parts to rival countries. One of those countries made a cheap toy out of janky unwanted parts for much less money, and it’s on par with or better than the West’s.
As for why we’re having an arms race based on AI, I genuinely don’t know. It feels like a race to the bottom, with the fallout being the death of the internet (for better or worse).
It is open source, so it should be audited and if there are back doors they can be plugged in a fork
And now people exchange one American Junk-spitting Spyware for a Chinese junk-spitting spyware.
LLMs aren’t spyware, they’re graphs that organize large bodies of data for quick and user-friendly retrieval. The Wikipedia schema accomplishes a similar, albeit more primitive, role. There’s nothing wrong with the fundamentals of the technology, just the applications that Westoids doggedly insist it be used for.
If you no longer need to boil down half a Great Lake to create the next iteration of Shrimp Jesus, that’s good whether or not you think Meta should be dedicating millions of hours of compute to this mind-eroding activity.
I think maybe it’s naive to think that if the cost goes down, shrimp jesus won’t just be in higher demand. Shrimp jesus has no market cap, bullshit has no market cap. If you make it more efficient to flood cyberspace with bullshit, cyberspace will just be flooded with more bullshit. Those great lakes will still boil, don’t worry.
There’s nothing wrong with the fundamentals of the technology, just the applications that Westoids doggedly insist it be used for.
Westoids? Are you the type of guy I feel like I need to take a shower after talking to?
By understanding LLMs, I started to understand some people and their “reasoning” better. That’s how they work.
artificial intelligence
AI has been used in game development for a while, and I haven’t seen anyone complain about the name before it became synonymous with image/text generation.
It was a misnomer there too, but at least people didn’t think a bot playing C&C would be able to save the world by evolving into a real, greater than human intelligence.
Well, that is where the problems started.
The best part is that it’s open source and available for download
I asked it about Tiananmen Square, it told me it can’t answer that because it can only respond with “harmless” responses.
That’s kind of normal, it was made in China after all and the developers didn’t want to end up in jail I bet.
That said, China is of course a crappy dictatorship.
Yes, the online model has those filters. Someone tried it with one of the downloaded models and it answers just fine.
When running locally, it works just fine without filters
This was a local instance.
Does the same thing on my local instance.
So can I have a private version of it that doesn’t tell everyone about me and my questions?
Yes
Check out Ollama: https://ollama.com/library/deepseek-r1
Thank you very much. I did ask ChatGPT some technical questions about some… subjects… but having something that is private AND can give me all the information I want/need is a godsend.
Goodbye, ChatGPT! I barely used you, but that is a good thing.
Yep, look up Ollama.
Can someone with the knowledge please answer this question?
Yes, you can run a downgraded version of it on your own PC.
Apparently phone too! Like 3 cards down was another post linking to instructions on how to run it locally on a phone in a container app or termux. Really interesting. I may try it out in a vm on my server.
I watched one video and read 2 pages of text, so take this with a mountain of salt. From that I gathered that DeepSeek R1 is the model you interact with when you use the app. The complexity of a model is expressed as the number of parameters (though I don’t know yet what those are), which dictates its hardware requirements. R1 contains 670 bn parameters and requires very, very beefy server hardware. A video said it would take tens of GPUs. And it seems you want lots of VRAM on your GPU(s) because that’s what AI craves. I’ve also read that 1 bn parameters require about 2 GB of VRAM.
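As a rough sanity check on that rule of thumb: 2 GB per billion parameters is what you get with 16-bit weights (2 bytes per parameter). Below is a minimal sketch, assuming only the weights are counted (no KV cache or runtime overhead) and treating 4-bit quantization as roughly 0.5 bytes per parameter. The helper name is made up for illustration.

```python
def estimate_weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at N bytes each is N GB, so the math reduces to
    a single multiplication. Real usage is higher: context/KV cache and
    runtime buffers are not included (assumption stated above).
    """
    return params_billions * bytes_per_param


# 16-bit weights (2 bytes/param) vs. roughly 4-bit quantized (~0.5 bytes/param)
for name, size_bn in [("3 bn model", 3), ("670 bn model", 670)]:
    fp16 = estimate_weight_memory_gb(size_bn, 2.0)
    q4 = estimate_weight_memory_gb(size_bn, 0.5)
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

By this estimate the full 670 bn model needs on the order of a terabyte of memory at 16-bit, while a 3 bn model fits comfortably on a consumer card, especially when quantized.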
Got a 6-core Intel, a 1060 with 6 GB VRAM, 16 GB RAM and EndeavourOS as a home server.
I just installed Ollama in about half an hour, using Docker on the above machine, with no previous experience with neural nets or LLMs apart from chatting with ChatGPT. The installation includes Open WebUI, which seems better than the default you get at ChatGPT. I downloaded the qwen2.5:3b model (see https://ollama.com/search), which contains 3 bn parameters. I was blown away by the result. It speaks multiple languages (including displaying e.g. hiragana), knows how many fingers a human has, can calculate, can write valid Rust code and explain it, and it is much faster than what I get from free ChatGPT.
The WebUI offers a nice feedback form for every answer where you can give hints to the AI via text, a 10-point score rating, and thumbs up/down. I don’t know how it incorporates that feedback, though. The WebUI seems to support speech-to-text and text-to-speech. I’m eager to see if this Docker setup even offers programming APIs.
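On the programming API question: Ollama itself serves an HTTP API, by default on port 11434, so a Docker setup should offer one as long as that port is published from the container. A minimal sketch using only the Python standard library, assuming the default host and port and that the model tag `qwen2.5:3b` has already been pulled. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default address, with the port published by the container.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Usage (needs a running Ollama server with the model pulled):
# print(ask("qwen2.5:3b", "Explain Rust ownership in one sentence."))
```

Nothing here leaves your machine as long as the URL points at localhost.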
I probably won’t use the proprietary stuff anytime soon.
Yeah, but you have to run a different model if you want accurate info about China.
Unfortunately it’s trained on the same US-propaganda-filled English data as any other LLM and spits out those same talking points. The censors are easy to bypass too.
Yeah but China isn’t my main concern right now. I got plenty of questions to ask and knowledge to seek and I would rather not be broadcasting that stuff to a bunch of busybody jackasses.
I agree. I don’t know enough about all the different models, but surely there’s a model that’s not going to tell you “<whoever’s> government is so awesome” when asking about rainfall or some shit.
Yes but your server can’t handle the biggest LLM.
All of this DeepSeek hype is overblown. The DeepSeek model was still trained on older American Nvidia GPUs.
Your confidence in this statement is hilarious, given that it doesn’t help your argument at all. If anything, the fact they refined their model so well on older hardware is even more remarkable, and quite damning when OpenAI claims it needs literally cities’ worth of power and resources to train its models.
AI is overblown, tech is overblown. Capitalism itself is a senseless death cult based on the non-sensical idea that infinite growth is possible with a fragile, finite system.
“wiped”? There was money and it ceased to exist?
It’s pixie dust
“off US stocks”
The money went back into the hands of all the people and money managers who sold their stocks today.
Edit: I expected a bloodbath in the markets with the rhetoric in this article, but the NASDAQ only lost 3% and the DJIA was positive today…
Nvidia was significantly over-valued and was due for this. I think most people who are paying attention knew that
To be fair, NQ futures momentarily dropped 5% before recovering some. The next few days should be interesting.
Trump counterbalance keeping it in check but my gut is saying once tariffs come in February there’s going to be a market correction. Pure speculation on my part.
You don’t have to say speculation when talking about the future of stocks. It’s implied unless you are a time traveler in which case you should lead with that.
I am a time traveller and I was trying to throw you off my trail but I seem to have failed.
There’s been a lot of disproportionate hype around deepseek lately
I’d argue this is even worse than Sputnik for the US because Sputnik spurred technological development that boosted the economy. Meanwhile, this is popping the economic bubble in the US built around the AI subscription model.
One of those rare lucid moments by the stock market? Is this the market correction that everyone knew was coming, or is some famous techbro going to technobabble some more about AI overlords and they return to their fantasy values?
Most rational market: Sell off NVIDIA stock after Chinese company trains a model on NVIDIA cards.
Anyways NVIDIA still up 1900% since 2020 …
how fragile is this tower?
It’s quite lucid. The new thing uses a fraction of compute compared to the old thing for the same results, so Nvidia cards for example are going to be in way less demand. That being said Nvidia stock was way too high surfing on the AI hype for the last like 2 years, and despite it plunging it’s not even back to normal.
If AI is cheaper, then we may use even more of it, and that would soak up at least some of the slack, though I have no idea how much.
How is the “fraction of compute” being verified? Is the model available for independent analysis?
It’s freely available with a permissive license, but I don’t think that claim has been verified yet.
And the data is not available. Knowing the weights of a model doesn’t really tell us much about its training costs.
My understanding is it’s just an LLM (not multimodal) and the train time/cost looks the same for most of these.
- DeepSeek ~$6million https://www.theregister.com/2025/01/26/deepseek_r1_ai_cot/?td=rt-3a
- Llama 2 estimated ~$4-5 million https://www.visualcapitalist.com/training-costs-of-ai-models-over-time/
I feel like the world’s gone crazy, but OpenAI (and others) are pursuing more complex multimodal model designs. Those are going to be more expensive due to image/video/audio processing. Unless I’m missing something, that would probably account for the cost difference between current and previous iterations.
The thing is that R1 is being compared to gpt4 or in some cases gpt4o. That model cost OpenAI something like $80M to train, so saying it has roughly equivalent performance for an order of magnitude less cost is not for nothing. DeepSeek also says the model is much cheaper to run for inferencing as well, though I can’t find any figures on that.
My main point is that gpt4o and other models it’s being compared to are multimodal, R1 is only a LLM from what I can find.
Something trained on audio/pictures/videos/text is probably going to cost more than just text.
But maybe I’m missing something.
The original gpt4 is just an LLM though, not multimodal, and the training cost for that is still estimated to be over 10x R1’s if you believe the numbers. I think where R1 is compared to 4o is in so-called reasoning, where you can see the chain of thought or internal prompt paths that the model uses to (expensively) produce an output.
I’m not sure how good a source it is, but Wikipedia says it was multimodal and came out about two years ago - https://en.m.wikipedia.org/wiki/GPT-4.
That being said, the comparisons are of LLM benchmarks against gpt4o, so maybe it is a valid argument for the LLM capabilities.
However, I think a lot of the more recent models are pursuing architectures with the ability to act on their own, like Claude’s computer use - https://docs.anthropic.com/en/docs/build-with-claude/computer-use, which DeepSeek R1 is not attempting.
Edit: and I think the real money will be in the more complex models focused on workflow automation.
Yea except DeepSeek released a combined Multimodal/generation model that has similar performance to contemporaries and a similar level of reduced training cost ~20 hours ago:
Holy smoke balls. I wonder what else they have ready to release over the next few weeks. They might have a whole suite of things just waiting to strategically deploy
One of the things you’re missing is the same techniques are applicable to multimodality. They’ve already released a multimodal model: https://seekingalpha.com/news/4398945-deepseek-releases-open-source-ai-multimodal-model-janus-pro-7b
Lol serves you right for pushing AI onto us without our consent
The determination to make us use it whether we want to or not really makes me resent it.
Hilarious that this happens the week of the 5090 release, too. Wonder if it’ll affect things there.
Apparently they have barely produced any so they will all be sold out anyway.
And without the fake frame bullshit they’re using to pad their numbers, its capabilities scale linearly with the 4090. The 5090 just has more cores, RAM, and power.
If the 4000-series had had cards with the memory and core count of the 5090, they’d be just as good as the 50-series.
By that point you will have to buy the micro fission reactor add-on to power the 6090. It’s like Nvidia looked at the triangle of power, price and performance and, instead of picking two, they just picked one and to hell with the rest.
Nah, they just made the triangle bigger with AI (/s)
Emergence of DeepSeek raises doubts about sustainability of western artificial intelligence boom
Is the “emergence of DeepSeek” really what raised doubts? Are we really sure there haven’t been lots of doubts raised previous to this? Doubts raised by intelligent people who know what they’re talking about?
Ah, but those “intelligent” people cannot be very intelligent if they are not billionaires. After all, the AI companies know exactly how to assess intelligence:
Microsoft and OpenAI have a very specific, internal definition of artificial general intelligence (AGI) based on the startup’s profits, according to a new report from The Information. … The two companies reportedly signed an agreement last year stating OpenAI has only achieved AGI when it develops AI systems that can generate at least $100 billion in profits. That’s far from the rigorous technical and philosophical definition of AGI many expect. (Source)
Almost like yet again the tech industry is run by lemming CEOs chasing the latest moss to eat.
Interesting it won’t let you login or signup using a VPN, even set to the correct country
Aren’t VPNs illegal in China?
Let’s tariff taiwan!
TSMC just finished building out a foundry in Arizona, so there’s a nativist argument that we don’t need the island’s original facilities anymore.
Only building outdated chips on an old fab process. And they’re having a hard time hiring Americans to work there.
Tech bros learn about diminishing returns challenge (impossible)
No surprise. American companies are chasing fantasies of general intelligence rather than optimizing for today’s reality.
That, and they are just brute-forcing the problem. Neural nets have been around forever, but it’s only in the last 5 or so years that they could do anything. There’s been little to no real breakthrough innovation as they just keep throwing more processing power at it with more inputs, more layers, more nodes, more links, more CUDA.
And their chasing a general AI is just the short-sighted nature of them wanting to replace workers with something they don’t have to pay and that won’t argue about its rights.
Also all of these technologies forever and inescapably must rely on a foundation of trust with users and people who are sources of quality training data, “trust” being something US tech companies seem hell bent on lighting on fire and pissing off the yachts of their CEOs.