I'm curious whether NPUs can be used with Ollama or local LLMs. If they can't, then they're completely useless; they're also useless if you don't use AI at all.
This might partially answer your question: https://github.com/ollama/ollama/issues/5186.
It looks like the answer is: it depends on what you want to run. Some configs are partially supported, but there's no clear-cut support yet.
I tried running some models on an Intel 155H NPU and the performance is actually worse than using the CPU directly for inference. However, it wins on the power-consumption front, IIRC.
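For anyone who wants to try the same comparison, here's a minimal sketch using OpenVINO's Python API, which exposes the Meteor Lake NPU as an "NPU" device. The model path and input shape are placeholders rather than the exact setup from the comment above, and the NPU driver has to be installed for the device to show up at all.

```python
# Minimal sketch: compare CPU vs NPU inference latency with OpenVINO.
# "model.xml" and the input shape are placeholders; compile_model("NPU")
# will fail if no NPU device/driver is present.
import time
import numpy as np
import openvino as ov

core = ov.Core()
print(core.available_devices)            # e.g. ['CPU', 'GPU', 'NPU'] if the driver is installed

model = core.read_model("model.xml")     # hypothetical OpenVINO IR model
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

for device in ("CPU", "NPU"):
    compiled = core.compile_model(model, device)
    start = time.perf_counter()
    for _ in range(100):
        compiled([dummy])                # run inference on the selected device
    print(device, (time.perf_counter() - start) / 100, "s per run")
```

Run it with a power meter attached and you can reproduce the "slower but more power-efficient" observation above for yourself.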
I mean, even if the NPU's space can't be replaced by more useful components easily or cheaply, just removing it is sure to save a small amount of power, which equates to a possibly not-so-small amount of heat that needs to be dissipated; dissipating that heat takes a not-insignificant amount of energy and/or requires slowing the system down. Additionally, the pathways could likely be placed to interfere less with each other and transfer less heat directly, which likely means more stability overall.
Of course, without a comparable processor that lacks the NPU to compare against, these things are really difficult to quantify, but they hold for nearly all compact chips on power-sensitive platforms.
No, it will not save any power at all. Power is only consumed during switching, so when a module is properly clock-gated, it consumes no power. There are many parts of a chip that are dark; for example, a full CPU core can be disabled for various reasons, and that doesn't affect power consumption while it's dark. Maybe you know the Steam Deck: it's a battery-operated device with the best power efficiency in its class. What people don't know is that more than 20% of its chip area is disabled, the part related to stereoscopic displays, because the exact same chip is also used by some AR or VR goggles, I forget the name.
Also, in general, modern chips are limited more by thermals than by die area. So realistically, even if you removed the NPU, you wouldn't be able to place anything high-power there anyway; maybe you could put a couple hundred kilobytes of SRAM cache in its place, but it wouldn't matter much in the end.
Microsoft, Google, and Apple are all quietly integrating NPUs into their devices and implementing the software infrastructure in their operating systems to do on-device classification of content: Windows Recall, Google SafetyCore, and Apple Intelligence. These services are obsequiously marketed as being for your benefit, while all of them are privacy and surveillance nightmares. When the security-breaking features of these systems are mentioned, each company touts convoluted workarounds to justify the tech.
Why would these companies risk rabidly forcing these unwanted, unpopular, insecure, expensive, and unnecessary features on their collective user bases? The real reason is to capture everything you do and store on your device, use the tensor hardware you may or may not even know that you purchased to analyze the data locally, then export and sell that “anonymized” information to advertisers and the government. All while cryptographically tying the data to your device, and the device to you, for “security”. This enables mass surveillance, digital rights management, and targeted advertising on a scale and depth previously unseen. Who needs a backdoor or a quantum computer to break consumer-grade encryption when you can just locally record everything everyone does and analyze it automatically at the hardware level?
Each of these providers is already desperate to scan, analyze, and classify your content:
Microsoft has been caught using your stored passwords to decrypt archives uploaded to OneDrive.
Apple developed forced client-side scanning for CSAM before backlash shut it down. They already locally scan your photos with a machine-learning classification algorithm whether you like it or not. You can't turn it off.
Google recently implemented local content scanning with SafetyCore to “protect you from unwanted content like spam”. Then why is it scanning your photo library?
I would rather saw off my own nuts with a rusty spork than willfully purchase a device with an integrated NPU. I fear that in the next 5-10 years, you won't be able to avoid them. We are paying for the edge hardware being used for our own unwilling surveillance. Then, our tax dollars are paid to these tech companies to purchase the data!
Do you trust the rising fascist regimes and their tech lackeys in America and the UK to use this power morally and responsibly?
Do you really believe that these features that you didn’t ask for, that you cannot disable, and are baked directly into the hardware, are for your benefit?
They’re not selling chips. They’re selling stock.
Microsoft, Google, and Apple are all quietly integrating NPUs into their devices, and implementing the software infrastructure in their operating systems to do on-device classification of content:
Too late. Apple has had an NPU in their phones since 2017. It's been standard on flagship Android devices since 2020, with various AMD and Intel processors starting around the same time.
I'm not here to defend LLMs or AI features, but a comment that starts with such a misinformed assessment of the state of reality reminds me of someone spouting off about chemtrails and weather machines.
At least NPUs actually exist.
If someone wants to avoid this stuff they are going to need to pick an open source platform that does not use these processor features.
I have to wonder if NPUs are just going to eventually become a normal part of the instruction set.
When SIMD was first becoming a thing, it was advertised as accelerating “multimedia,” as that was the hot buzzword of the 1990s. Now, SIMD instructions are used everywhere, any place there is a benefit from processing an array of values in parallel.
I could see NPUs becoming the same. Developers start using NPU instructions, and the compiler can “NPU-ify” scalar code when it thinks it’s appropriate.
NPUs are advertised for “AI,” but they’re really just a specialized math coprocessor. I don’t really see this as a bad thing to have. Surely there are plenty of other uses.
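To make "specialized math coprocessor" concrete: the bread-and-butter operation these things accelerate is a matrix multiply followed by a cheap elementwise function. A NumPy sketch of one fully connected layer, with arbitrary shapes purely for illustration:

```python
# The core workload an NPU accelerates is dense linear algebra:
# a matrix multiply plus a cheap elementwise op (here, ReLU).
import numpy as np

x = np.random.rand(1, 512).astype(np.float32)     # input activations
w = np.random.rand(512, 256).astype(np.float32)   # layer weights
b = np.zeros(256, dtype=np.float32)               # bias

y = np.maximum(x @ w + b, 0.0)   # matmul + ReLU, the pattern NPUs are built around
print(y.shape)                   # (1, 256)
```

An NPU is essentially fixed-function hardware for doing that, and variations of it, at low power.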
It's tricky to use in programming for non-neural-network math, though. I can see it being used in video compression and decompression, or some very specialised video game math.
Video game AI could be a big one, though, where difficulty would be AI-based instead of just stat modifiers.
I agree; we should push harder for an open, standardized API for these accelerators, better drivers, and better third-party software support. As long as the manufacturers keep them locked down and proprietary, we won't be able to use them outside of niche Copilot features no one wants anyway.
The problem (local) AI has at the moment is that it isn't just a single type of compute, and because of that, the pool of things you can usefully do with any one accelerator is limited.
On the surface level, "AI" is a mixture of what are essentially FP16, FP8, and INT8 accelerators, and different implementations have been using different ones. NPUs are basically INT8-only, while the GPU-heavy implementations are FP-based, so they aren't inherently cross-compatible.
It forces devs to reserve the NPU for small things (e.g. background blur on the camera), as there isn't any consumer-level chip with a massive INT8 coprocessor except for the PS5 Pro (about 300 TOPS of INT8, versus roughly 50 TOPS on laptop CPUs, so a completely different league; the PS5 Pro uses it to upscale).
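That FP-vs-INT8 split is also why models usually have to be quantized before they'll run on an NPU at all. A rough sketch with ONNX Runtime's dynamic quantizer; the file names are placeholders, and whether the result actually runs on a given NPU still depends on the vendor's runtime:

```python
# Hedged sketch: convert an FP32 ONNX model to INT8 weights so an
# INT8-only accelerator has something it can actually execute.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",   # placeholder input model
    model_output="model_int8.onnx",  # placeholder output path
    weight_type=QuantType.QInt8,     # store weights as signed 8-bit integers
)
```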
SIMD is pretty simple, really, but it's been a standard-ish feature in CPUs for 30 years, and modern compilers are "just about able to sometimes" use SIMD if you've got a very simple loop with fixed endpoints that might use it. It's one of those things you might still fall back to writing assembly for; the FFmpeg developers had an article not too long ago about getting a 10% speed improvement by writing all the SIMD by hand.
Using an NPU means recognising algorithms that can be broken down into parallelizable, networkable steps with information passing between cells. Basically, you’re playing a game of TIS-100 with your code. It’s fragile and difficult, and there’s no chance that your compiler will do that automatically.
The best thing to hope for is that some standard libraries implement it, and then we can all benefit. It's an okay tool for jobs that can be broken down into separate cells that interact, so some kinds of image processing, maybe things like liquid-flow simulations. There's only a very small overlap, though, between "things that are just algorithms the main CPU would do better" and "things that can be broken down into many, many simple steps that a GPU would do better" where an NPU really makes sense.
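For a sense of the "separate cells that interact" category: a 2D convolution, the building block of image filters (and of CNNs), is exactly that kind of job. Plain NumPy/SciPy here; actually offloading it to an NPU would go through a vendor runtime, which this sketch doesn't attempt:

```python
# Example of a cell-based, neighbour-passing workload: a 2D convolution
# (a simple box blur) over an image-sized array.
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(480, 640).astype(np.float32)
blur_kernel = np.full((3, 3), 1.0 / 9.0, dtype=np.float32)   # 3x3 box blur

blurred = convolve2d(image, blur_kernel, mode="same", boundary="symm")
print(blurred.shape)   # (480, 640)
```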
Yeah totally. Here’s how MMX was advertised to consumers. https://youtu.be/5zyjSBSvqPc
As NPUs become ubiquitous they’ll just be a regular part of a machine and the branding and marketing will fade away again.
From a PC manufacturer perspective, the important part about including an NPU is that you can slap an “AI” sticker on your marketing. This is regardless of whether it has any actual use cases or not.
There’s definitely a large slice of “AI” features shipping now just to excite shareholders and serve no actual function to the user.
Would he rip the CUDA cores out of his Nvidia GPU?
Yes, I agree, and if it must run a neural network, it could do so on the GPU; an NPU is not necessary.
Someone with the expertise should correct me if I am wrong; it's been 4-5 years since I learnt about NPUs during my internship, so I am very rusty:
You don’t even need a GPU if all you want to do is to run - i.e. perform inference with - a neural network (abbreviating it to NN). Just a CPU would do if the NN is sufficiently lightweight. The GPU is only needed to speed up the training of NNs.
The thing is, the CPU is a general-purpose processor, so it won't be able to run the NN optimally / as efficiently as possible. Imagine you want to do something that requires the NN and, as a result, you can't do anything else on your phone / laptop (it won't be a problem for desktops with GPUs, though).
Where an NPU really shines is when there are performance constraints on the model: when it has to be fast (specifically, run at real-time speed), lightweight, and memory-efficient. Use cases include mobile computing and IoT.
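To illustrate the "just a CPU would do" point: running a lightweight network on the CPU is a few lines with something like ONNX Runtime, no GPU or NPU involved. The model path and input shape here are placeholders:

```python
# Hedged sketch of plain CPU inference: for a lightweight network,
# ONNX Runtime's default CPU provider is all you need.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("small_classifier.onnx",          # placeholder model
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)          # placeholder input

outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)
```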
In fact, there's news about live translation on Apple AirPods. I think this may be the perfect scenario for using NPUs: ideally housed within the earphones directly, but if not, within the phone.
Disclaimer: I am only familiar with NPUs in the context of “old-school” convolutional neural networks (boy, tech moves so quickly). I am not familiar with NPUs for transformers - and LLMs by extension - but I won’t be surprised if NPUs have been adapted to work with them.
I'm not exactly an expert either, but I believe the NPUs we're seeing in the wild here are more like efficiency cores for AI.
Using the GPU would be faster but would consume much more energy. They're basically math coprocessors that are good at matrix calculations.
I expect we’re eventually going to start seeing AI in more sensible places and these NPUs will prove useful. Hopefully soon the bubble will burst and we’ll stop seeing it crammed in everywhere, then it’ll start being used where it actually improves a product rather than wherever an LLM will fit.
I'm upvoting this comment from my internet-enabled toaster.
The other day I saw a sandwich that was made by "AI". This shit is going to be everywhere.
It would be nice if Frigate supported the NPU in AMD CPUs. Then it could be used for object detection with CCTV cameras.
I love how Frigate is like…the only thing you can use AI processors for. I got a Coral M.2 card and used it for Frigate for like a year and then got bored. Sure, I can have it ID birds, but there’s not much more practical use that I found.
You can use object detection to send an alert if there is a person in your yard or if a car pulls into your driveway. It avoids the frequent false alerts you get with normal motion detection.
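As a sketch of what that alerting can look like outside of Frigate's own notifications: subscribe to its MQTT event stream and filter for "person". The broker address, the frigate/events topic, and the payload fields below are assumptions based on Frigate's MQTT integration, so check them against your own setup:

```python
# Rough sketch: react only to "person" detections from Frigate's MQTT events.
# Broker address, topic name, and payload layout are assumptions; verify
# against your Frigate configuration before relying on this.
import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    event = json.loads(msg.payload)
    after = event.get("after", {})               # assumed Frigate event structure
    if after.get("label") == "person":
        print(f"Person detected on camera {after.get('camera')}")  # swap in a real notification

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)   # paho-mqtt 2.x style client
client.on_message = on_message
client.connect("localhost", 1883)                        # hypothetical broker address
client.subscribe("frigate/events")                       # assumed Frigate event topic
client.loop_forever()
```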
Yes. I’ve used it. But security cameras are hardly a killer app.
Correct, for killer functionality to work you have to connect the camera to an automated turret system.
Instructions unclear. Accidentally did science.
patiently waits for paper to publish
So, yes, it would be unrealistic to suggest AMD could rip out the NPU and easily put that space to good use in this current chip.
A few AMD chips did have the NPU removed, except they were mostly allocated to handheld gaming devices; the Ryzen Z1E, for example.
When the lineup was refreshed, there were a handful of Hawk Point CPUs that were NPU-less (the Ryzen 7 8745HS, for example).
Strix Point doesn't have that kind of cut-down, NPU-less die variant yet.
I was excited when I learned that a new business laptop had a removable battery, a decent graphics card, and 1 TB of storage standard. I planned to buy it used for a fraction of its current price (2k USD new) once some doofus got bored of underusing their machine and decided to trade up. Then I saw the AI chip and my desire wavered. You think there will ever be workarounds to make use of this garbage? I really want that removable battery.
That NPU is a math coprocessor. It can be very useful. It's like a CUDA core.
A CUDA core is just a vector processor, like every GPU since the late '90s has been made of, but with a different name so it sounds special. It doesn't just run CUDA; it runs everything else a GPU has traditionally been used for, too, and that's stuff people were doing before CUDA was introduced. There are lots of tasks that require the same sequence of operations to be applied to groups of 32 numbers.
An NPU is a much more specialised piece of hardware, and it’s only really neural network training and inference that it can help with. There aren’t many tasks that require one operation to be applied over and over to groups of hundreds of numbers. Most people aren’t finding that they’re spending lots of time waiting for neural network inference or draining their batteries doing neural network inference, so making it go faster and use less power isn’t a good use of their money compared to making their computer better at the things they do actually do.
Bring on the games that require AI
I'm pretty sure most video games require AI already. I struggle to name ones that don't use AI. Some that I can think of are Snake, two-player Pong, and two-player chess.
Neural nets, on the other hand - I find it hard to imagine running a NN locally without impacting the game’s performance.
Hard, but should definitely be possible. I’m waiting for someone to write a decent roguelike, like imagine the possibilities.