Processor Selection for Embedded System

Processor Selection for Embedded System - TESLA Institute


Most embedded systems need some type of processor, but which one should you select? Would you go for the familiar one because it is what ‘we have always used’, or would you first benchmark the candidates with your application code? The answer may well be a mix of both, plus a few other considerations, which this article will help you work through.

When we began working on this article, the first question on our minds was what makes processor selection worth this much attention. A few questions put to some knowledgeable friends in system design gave us our first answer.

For starters, the right choice is one of the best ways to reduce power consumption. Moving a generation ahead on the manufacturing process node can cut power consumption significantly or, if that is what you want, give you more performance for the same power. Lower power, in turn, reduces the heat generated by the processor, which means smaller heat sinks, a smaller enclosure, or even a fanless design.
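The power saving from a process shrink can be illustrated with the standard first-order model of CMOS dynamic (switching) power, P = α·C·V²·f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below is only an illustration under assumed values: the capacitance, voltage and activity numbers are hypothetical round figures, not data for any real process node, and the model ignores leakage (static) power, which matters more at smaller nodes.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = alpha * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


ACTIVITY = 0.2   # hypothetical fraction of capacitance switched per cycle
CLOCK_HZ = 1.0e9

# Hypothetical "old node" vs "new node": 30% less switched capacitance,
# supply lowered from 1.2 V to 1.0 V, same 1 GHz clock.
old = dynamic_power_w(ACTIVITY, 1.0e-9, 1.2, CLOCK_HZ)
new = dynamic_power_w(ACTIVITY, 0.7e-9, 1.0, CLOCK_HZ)
print(f"old node: {old:.3f} W, new node: {new:.3f} W, saving: {1 - new / old:.0%}")

# Or spend the saving on speed: the clock the new node could reach within the
# old power budget under this model, with voltage held at 1.0 V (real silicon
# may need a higher voltage to run faster).
f_same_power = old / (ACTIVITY * 0.7e-9 * 1.0 ** 2)
print(f"same power budget allows roughly {f_same_power / 1e9:.2f} GHz")
```

The quadratic voltage term is why supply-voltage reduction, not just smaller transistors, accounts for much of the gain in this simple model.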

On the other hand, you could select a processor that can handle all your number crunching and polygon-rendering madness. This lets you ensure a smooth UI, add new functionality or go for cost-effectiveness.

Overall, the right selection enhances the user experience by reducing the chance of overheating, as happened with some recent mobile devices, or of any other untoward event that might make your customer curse the designer.


Where do we begin

Apart from deciding which central processing unit (CPU) to select, there is also the question of whether you should go for a graphics processing unit (GPU) rather than a CPU. So let us find out which one works best for you.

A CPU is like an executive who keeps switching between tasks, directing I/O and peripherals to perform their operations and handling virtual memory functions. The GPU, on the other hand, is like a factory worker who handles repetitive tasks with flair. It cannot switch tasks as fast as the CPU, but it has many more arithmetic logic units (ALUs), which make it better at performing mathematical tasks.

“Most CPUs have an execution unit with branch or loop capability, which is used for controlling other logic blocks. GPUs are meant for highly parallel calculation, where data streams are operated upon, not for control applications as a normal CPU is,” explains Satish Bagalkotkar, president and CEO, Synapse Design Automation.

He goes on to explain that GPUs are optimal for tight code with high parallelisation, since they have hundreds of simple execution units with SIMD (single instruction, multiple data) capabilities. Algorithms that can be parallelised easily are best implemented on GPUs. In typical implementations, a GPU also has less memory available to it than a CPU, so it is probably not well suited to data-intensive applications.
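Why "easily parallelised" matters so much can be quantified with Amdahl's law, a standard model that the interviewee does not mention: if a fraction p of the work parallelises perfectly across n execution units, the best-case speedup is 1 / ((1 - p) + p / n). The sketch below uses an assumed 2,048 execution units purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0 or units < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and units >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / units)


# Hypothetical workload mixes on an assumed 2048-unit GPU
for p in (0.50, 0.90, 0.99, 0.999):
    print(f"parallel fraction {p:.3f}: speedup <= {amdahl_speedup(p, 2048):7.1f}x")
```

Even with 99 per cent of the code parallel, the remaining serial 1 per cent caps the gain below 100x in this model, which is one way to see why control-heavy code stays on the CPU.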

Typically, a GPU assists the CPU with video rendering, but GPUs have other applications too. Servers used to have no need for GPUs, as they were managed over a text-based interface; however, firms have already announced GPU-based servers. Key markets where CPUs and GPUs coexist include mobile devices such as smartphones and tablets, financial trading, video processing and medical imaging. High-end GPUs paired with a control CPU are used extensively in industrial simulations, games, entertainment and video processing, and high-performance GPUs are increasingly being adopted in finance.

How does a GPU get more work done when CPUs run at higher clock frequencies? Bitcoin mining is a good example. The four cores of a CPU like the Intel i7-4770K, despite running at a higher clock of 3.5 GHz, execute far fewer instructions per clock in total than a GPU like the AMD Radeon R9 290X, which has 2816 stream processors running at a comparatively low 1000 MHz.
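A back-of-the-envelope calculation makes the point: peak throughput is roughly execution lanes × clock. The core counts and clocks below are the ones quoted above; the CPU SIMD width of 8 (eight 32-bit lanes in a 256-bit AVX2 vector) and the one-operation-per-lane-per-clock figure for both chips are simplifying assumptions. Real mining throughput also depends on the instruction mix and memory behaviour.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    cores: int           # CPU cores or GPU stream processors
    lanes_per_core: int  # SIMD lanes per core (assumed)
    clock_hz: float

    def peak_ops_per_s(self) -> float:
        # Simplification: one 32-bit operation per lane per clock.
        return self.cores * self.lanes_per_core * self.clock_hz


cpu = Chip("Intel i7-4770K", cores=4, lanes_per_core=8, clock_hz=3.5e9)
gpu = Chip("AMD Radeon R9 290X", cores=2816, lanes_per_core=1, clock_hz=1.0e9)

for chip in (cpu, gpu):
    print(f"{chip.name}: {chip.peak_ops_per_s() / 1e9:.0f} G ops/s")
print(f"GPU/CPU peak ratio: {gpu.peak_ops_per_s() / cpu.peak_ops_per_s():.1f}x")
```

Even when the CPU is credited with full-width SIMD, the GPU's sheer lane count outweighs its lower clock under these assumptions.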


How to select a GPU

Within GPUs, you can choose between designs with more complex shaders and those with simpler shaders; GPUs with simple shaders can pack many more of them into the same die area. But which one do you need?

“It is really application-dependent. The real differentiator is whether the shader implements the functions and calculations needed by the application in question. If the shader does not implement the specific instructions needed, its complexity is irrelevant. From the design point of view, a simple shader is easy to implement and replicate to meet performance targets,” says Satish.

How to select a CPU

The most important consideration for selecting a CPU is the application itself. Each embedded application comes with a specific set of requirements. Nate Srinath, founder and director, Inxee Technologies, explains, “The first step is to identify the domain of the application. These domains are nicely coupled with various industry verticals such as automotive, industrial automation, medical electronics, defense electronics, consumer electronics, mobiles, etc. Once the domain is identified, the next step is to identify the application requirement in terms of functionality, complexity, performance, cost, area, environmental details, manufacturing details, and power needs.”
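Once the requirements are written down, a first pass over a vendor shortlist can be a simple hard-constraint filter: any candidate that misses a must-have requirement drops out before detailed comparison. The sketch below is illustrative only; the candidate names, attributes and limits are hypothetical placeholders, not data for any real part.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    unit_cost_usd: float
    power_w: float
    min_temp_c: int
    max_temp_c: int
    peripherals: set[str] = field(default_factory=set)


@dataclass
class Requirements:
    max_unit_cost_usd: float
    max_power_w: float
    min_temp_c: int
    max_temp_c: int
    peripherals: set[str]


def meets(c: Candidate, r: Requirements) -> bool:
    """True if the candidate satisfies every hard constraint."""
    return (c.unit_cost_usd <= r.max_unit_cost_usd
            and c.power_w <= r.max_power_w
            and c.min_temp_c <= r.min_temp_c      # rated at least as cold
            and c.max_temp_c >= r.max_temp_c      # rated at least as hot
            and r.peripherals <= c.peripherals)   # all required peripherals on-chip


# Hypothetical industrial-automation requirement and shortlist
req = Requirements(12.0, 2.5, -40, 85, {"CAN", "Ethernet", "SPI"})
shortlist = [
    Candidate("MCU-A", 6.5, 0.8, -40, 105, {"CAN", "Ethernet", "SPI", "UART"}),
    Candidate("MPU-B", 18.0, 3.0, -40, 85, {"CAN", "Ethernet", "SPI", "PCIe"}),
    Candidate("MCU-C", 4.0, 0.5, 0, 70, {"CAN", "SPI"}),
]
print([c.name for c in shortlist if meets(c, req)])
```

Softer trade-offs, such as performance versus cost, are better handled afterwards with a scoring step than as pass/fail rules.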



Figure: Pie chart of a vote among our online readers on the question: Which vendor would you buy a processor from for your next embedded system?



Processor performance. We usually start this discussion with clock speed, but instructions per second, operations per clock and the efficiency of the computation units are equally important.
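The interplay between clock speed and operations per clock can be written as execution time = instruction count / (IPC × clock frequency), where IPC is instructions per clock. The sketch below uses hypothetical IPC and clock values to show how a slower-clocked core can still finish first; in practice, the IPC measured on your own code is what matters.

```python
def exec_time_s(instructions: float, ipc: float, clock_hz: float) -> float:
    """Execution time = instructions / (instructions per clock * clock rate)."""
    return instructions / (ipc * clock_hz)


workload = 2.0e9  # hypothetical instruction count for one task

# Hypothetical cores: a high-clock, low-IPC design vs a lower-clock, wider one
fast_clock = exec_time_s(workload, ipc=0.8, clock_hz=1.5e9)
wide_core = exec_time_s(workload, ipc=1.6, clock_hz=1.0e9)
print(f"1.5 GHz @ IPC 0.8: {fast_clock:.3f} s")
print(f"1.0 GHz @ IPC 1.6: {wide_core:.3f} s")
```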


“The most important parameter in my mind is the cost vs performance ratio of a processor. The next most important parameter is the product longevity guaranteed by the manufacturer. These two parameters ensure the cost-effectiveness and longevity of the product being designed with the processor,” advises Shanmugasundaram M., associate director-PES at Happiest Minds Technologies.


Instructions per second give you an idea of computing power, particularly for large computers, whereas a higher number of operations per clock means that a processor with a lower clock speed can still perform competitively. Many current processors integrate multiple cores and a GPU on the same die, which enhances performance without sacrificing power consumption or thermal design power (TDP). Benchmarking your shortlisted processors with a trial run of your application code is a good way to see whether they meet your requirements.
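A trial run is more trustworthy when every candidate is timed the same way: a warm-up run, several timed repetitions and the median reported to damp out noise. The harness below is a host-side sketch using Python's standard `time.perf_counter` and `statistics.median`; `fir_kernel` is a hypothetical stand-in for your application kernel. On an actual embedded target you would typically use the processor's cycle counter or a hardware timer instead.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], repeats: int = 7, warmup: int = 1) -> float:
    """Median wall-clock time of fn() in seconds, after `warmup` untimed runs."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # untimed runs let caches and lazy initialisation settle
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)  # median resists outliers from OS noise


def fir_kernel(n: int = 5000, taps: int = 16) -> float:
    """Hypothetical stand-in workload: a naive FIR filter over a ramp signal."""
    coeffs = [1.0 / taps] * taps
    signal = [float(i % 256) for i in range(n)]
    acc = 0.0
    for i in range(taps, n):
        acc += sum(c * signal[i - k] for k, c in enumerate(coeffs))
    return acc


print(f"fir_kernel median: {benchmark(fir_kernel) * 1e3:.2f} ms")
```

Running the same harness with the same kernel and inputs on each shortlisted board keeps the comparison fair.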


A processor’s hardware acceleration features are useful for speeding up tasks with less software involvement. “It enables the multiprocessing option. The desired performance may be achieved either by a processor with higher processing power or by a slower processor accompanied by a set of hardware accelerators. Both approaches have their pros and cons. The merits and demerits of each approach have to be evaluated against the weightage assigned to each product requirement,” explain Vijay Bharat S. and Sachidananda Karanth, lead architects, Mistral Solutions.
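One common way to carry out such a weightage-based evaluation is a weighted decision matrix: score each option against each requirement, multiply by that requirement's weight and sum. The criteria, weights and scores below are hypothetical and only show the mechanics; they say nothing about which approach is better in general.

```python
# Hypothetical weights (summing to 1.0) reflecting one product's priorities
weights = {"performance": 0.30, "unit_cost": 0.25, "power": 0.20,
           "flexibility": 0.15, "dev_effort": 0.10}

# Hypothetical 1-10 scores (higher is better) for the two approaches
options = {
    "faster processor": {"performance": 8, "unit_cost": 4, "power": 5,
                         "flexibility": 9, "dev_effort": 7},
    "slower CPU + accelerators": {"performance": 9, "unit_cost": 7, "power": 8,
                                  "flexibility": 4, "dev_effort": 5},
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Sum of score * weight over all criteria; every criterion must be scored."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"unscored criteria: {sorted(missing)}")
    return sum(scores[k] * w for k, w in weights.items())


ranked = sorted(options.items(), key=lambda kv: -weighted_score(kv[1], weights))
for name, scores in ranked:
    print(f"{name}: {weighted_score(scores, weights):.2f}")
```

Raising the weight on flexibility, for instance to reflect a likely codec upgrade, can flip the ranking, which is exactly the sensitivity such an evaluation is meant to expose.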


“We have to consider that by hardware-accelerating certain modules, expandability might become a trade-off. Consider today’s processor chipsets that pair H.264 video hardware acceleration with an ARM9/ARM11 processor core for other processing. These chipsets cannot be upgraded to the next-generation codec, HEVC, because the processor was selected with only the H.264 hardware-acceleration use case in mind,” explains Saravanan T.S., marketing, Semicon & Multimedia, Tata Elxsi.

Industry requirements. Industries such as automotive and medical have their own sets of requirements for the electronics designed for them. In a car, mission-critical systems such as the engine and safety systems must meet stringent standards defined by organisations like the International Organization for Standardization (ISO) and the Automotive Electronics Council (AEC). The factories have to be ISO-certified, while the components and systems made in these factories have to adhere to AEC guidelines. Examples are the ISO/TS 16949 quality management standard and the AEC-Q reliability guidelines, which include the AEC-Q100, Q101 and Q200 documents.


“Additionally, operating temperature range matters, since many embedded applications are deployed in harsh environments with temperature shocks, where the thermal performance of processors (and other components) can make or break the solution,” explains T. Anand, managing director of Knewron.


Quality management for medical electronics is governed by the ISO 9001:2008 and ISO 13485:2003 standards. In the US, FDA 21 CFR Part 820 is the regulation covering good manufacturing practices for medical devices. Devices are also classified by their target application: toothbrushes go into Class I, infusion pumps and stents into Class II, while an implantable heart pump would go into Class III. The ISO 14971 standard specifies the risk management process for medical device manufacturers.


Certain embedded applications like medical electronics require high levels of safety. “Applications like a media player require hardware acceleration to play media files. Applications in wireless communications require high levels of security. Storage and networking applications require high expandability. In essence, the priority of the functional requirements is driven by the domains and the application itself. What is important to an application is not necessarily important to another application,” adds Nate Srinath.


Support. The peripherals integrated on the processor are important as well, as they are key drivers of the overall bill of materials (BOM) cost of the end product: the greater the overlap between the peripherals the application requires and those integrated on the processor, the lower the BOM cost, with the added benefit of simpler board design. “Given the fast-paced nature of today’s electronics market, one also needs to consider the software support (on average, an estimated 70 per cent of the effort on typical embedded applications goes into software development) and the certifications available for the processor under consideration, as these can significantly reduce effort, cost and cycle time. For example, if one is designing an industrial application that supports the Profibus or EtherCAT protocols, it may be prudent to choose a processor that supports these protocols with due certifications. Last, with today’s need for smaller devices in areas like healthcare, implantables and wearables, the physical package size of the chip can also be a deciding factor,” explains Praveen Ganapathy, director (Business Development), Embedded Processing, Texas Instruments (India).



Business requirements. Ultimately, your design needs to become a successful product, which means it should not cost too much because of an unnecessarily beefed-up processor.

If you are planning for high-volume production, it might be better to go for an inexpensive processor and invest additional engineering effort in implementing functionality in software. For a lower target volume, it might be better to select a more capable (and more expensive) processor whose on-chip functionality minimises engineering effort and design cycle time.
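The volume argument reduces to simple arithmetic: total cost = one-time engineering (NRE) cost + unit cost × volume. The design with the cheaper processor and more software engineering wins once volume exceeds the break-even point ΔNRE / Δunit cost. All figures below are hypothetical.

```python
def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    """One-time engineering cost plus per-unit cost times production volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_cheap: float, unit_cheap: float,
                      nre_rich: float, unit_rich: float) -> float:
    """Volume above which the cheaper-processor design has the lower total cost."""
    if unit_cheap >= unit_rich:
        raise ValueError("the cheap option must have the lower unit cost")
    return (nre_cheap - nre_rich) / (unit_rich - unit_cheap)


# Hypothetical: the cheap processor needs $300k more software work but saves $6/unit
cheap = dict(nre=500_000.0, unit_cost=9.0)
rich = dict(nre=200_000.0, unit_cost=15.0)
v = break_even_volume(cheap["nre"], cheap["unit_cost"], rich["nre"], rich["unit_cost"])
print(f"break-even at {v:,.0f} units")
for volume in (10_000, 100_000):
    print(volume, total_cost(**cheap, volume=volume), total_cost(**rich, volume=volume))
```

Below the break-even volume the richer processor is cheaper overall; above it, the extra engineering pays for itself.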

Bandwidth requirements. It is typical for product features to change during product development, i.e., the end-product features are very likely to be different from the initially conceived idea. “Depending on the market segment, the degree of variation may differ. For industrial, automotive and military/aerospace, the variations may be minor while it may be sizeable in the consumer segment,” explain Vijay Bharat S. and Sachidananda Karanth.

T. Anand gives us some tips to get started. “First apply a simple rule of thumb: whatever your estimate of the data or memory requirement, select a processor with at least 25 per cent more capacity. More importantly, always keep some margin in memory, bandwidth, data space, etc, so that future updates do not cripple the system.”

“The higher-capacity rule of thumb protects you from the many small, distinct changes that most of the time end up reaching or crossing the limits. Having additional capacity may cost a bit more, but it can save much more by avoiding catastrophic failure of the system in the field or during last-minute critical changes,” adds Anand.
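Anand's rule of thumb translates directly into a sizing check: inflate each estimated requirement by at least 25 per cent and confirm the candidate still has room. The resource names and numbers below are hypothetical.

```python
MARGIN = 0.25  # "at least 25 per cent higher capacity", per the rule of thumb above


def required_capacity(estimate: float, margin: float = MARGIN) -> float:
    """Capacity to look for, given an estimated need and a fractional margin."""
    return estimate * (1.0 + margin)


def check(estimates: dict[str, float], capacity: dict[str, float]) -> dict[str, bool]:
    """Per resource: does the candidate offer at least the estimate plus margin?"""
    return {k: capacity.get(k, 0.0) >= required_capacity(v) for k, v in estimates.items()}


# Hypothetical estimates vs a hypothetical candidate's resources
estimates = {"flash_kib": 384, "ram_kib": 100, "bandwidth_mbps": 40}
candidate = {"flash_kib": 512, "ram_kib": 128, "bandwidth_mbps": 48}
print(check(estimates, candidate))
```

Here the candidate clears the margin on memory but not on bandwidth, so it would fail the rule despite meeting the raw estimate.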

Vijay Bharat S. and Sachidananda Karanth share more. “A general strategy could be to plan for only 40-60 per cent of the bandwidth so as to allow for spikes and variations. The actual percentage could be tweaked after careful evaluation of the various aspects of product requirements during:

Design. Calculate the theoretical maximum bandwidth supported by the system, based on processor speed and data rate.

Implementation. Validate the bandwidth with respect to software overheads and speed limitations.

Testing. Validate the actual bandwidth and ensure that the bandwidth calculation meets the requirements.”
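The design-stage calculation and the 40-60 per cent planning target can be combined into a quick utilisation check. This sketch compares a hypothetical multichannel data stream with a theoretical link bandwidth and flags designs above a chosen ceiling. The link rate, efficiency factor and stream parameters are assumptions, to be replaced by datasheet figures and later by measurements from the implementation and testing stages.

```python
def stream_rate_bps(channels: int, sample_rate_hz: float, bits_per_sample: int) -> float:
    """Raw payload data rate of a multichannel sampled stream, in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


def utilisation(load_bps: float, link_bps: float, efficiency: float) -> float:
    """Fraction of usable bandwidth consumed; efficiency models protocol/software overhead."""
    return load_bps / (link_bps * efficiency)


PLAN_CEILING = 0.60  # upper end of the 40-60 per cent planning range

# Hypothetical: 8 channels x 48 kHz x 24-bit over a link assumed to run at 25 Mbit/s
load = stream_rate_bps(8, 48_000, 24)
u = utilisation(load, link_bps=25e6, efficiency=0.7)  # assumed 70% effective throughput
status = "OK" if u <= PLAN_CEILING else "over budget"
print(f"load {load / 1e6:.2f} Mbit/s, utilisation {u:.0%}, {status}")
```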

“When choosing a processor, matching the processor to the embedded application is important. Preventing system failures is especially significant in embedded applications. In a good processor for such applications, there are usually two processor cores working on the same data; if their results mismatch, an alert is triggered in the circuit. The second aspect to look out for is interrupt latency, which can affect the real-time schedulability of the system,” explains Praveen.
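The dual-core check described above is usually called lockstep execution: two cores run the same instructions on the same inputs, and a comparator raises a fault on any divergence. The Python below only simulates the idea in software; the function names and the `fault_at` injection hook are hypothetical, and in real lockstep processors the comparison is done in hardware.

```python
from typing import Callable, Optional


class LockstepMismatch(RuntimeError):
    """Raised when the two redundant channels disagree."""


def lockstep(step: Callable[[int, int], int], inputs: list[int],
             fault_at: Optional[int] = None) -> list[int]:
    """Run `step` on two simulated cores and compare their state after every step.

    `fault_at` (hypothetical test hook) flips one bit in core B's state at that
    index, emulating a transient fault such as a soft error.
    """
    state_a = state_b = 0
    outputs = []
    for i, x in enumerate(inputs):
        state_a = step(state_a, x)
        state_b = step(state_b, x)
        if i == fault_at:
            state_b ^= 1 << 3  # injected single-bit upset
        if state_a != state_b:
            raise LockstepMismatch(f"divergence at step {i}: {state_a} != {state_b}")
        outputs.append(state_a)
    return outputs


def accumulate(state: int, x: int) -> int:
    return (state + x) & 0xFFFF  # 16-bit wrap-around accumulator


print(lockstep(accumulate, [10, 20, 30]))
try:
    lockstep(accumulate, [10, 20, 30], fault_at=1)
except LockstepMismatch as err:
    print("alert:", err)
```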




Overall, one must keep in mind the processing bandwidth, memory bandwidth and memory sub-system interfaces. Depending on the application, you might require high-speed interfaces, faster processors or even on-chip memory.

Nilesh Ranpura, project manager, eInfoChips, shares some tips. “The first thing to consider is whether the interface protocol meets its timings. This means the memory interface to the processor must complete read and write cycles efficiently. Write the mem_read/write routines in low-level firmware and test them on actual hardware.”
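A common first bring-up test for such read/write routines checks the data bus with a walking-ones pattern and the address lines with writes at power-of-two offsets. The sketch below models memory as a Python `bytearray` so that it runs anywhere; the `mem_read`/`mem_write` closures are hypothetical stand-ins for the firmware routines, which on a target would typically be C code accessing the memory-mapped region through volatile pointers.

```python
def make_memory(size: int):
    """Hypothetical memory model; replace with real volatile accesses on target."""
    mem = bytearray(size)

    def mem_write(addr: int, value: int) -> None:
        mem[addr] = value & 0xFF

    def mem_read(addr: int) -> int:
        return mem[addr]

    return mem_read, mem_write


def walking_ones_test(mem_read, mem_write, addr: int = 0) -> bool:
    """Data-bus test: each of the 8 data bits must be writable and readable alone."""
    for bit in range(8):
        pattern = 1 << bit
        mem_write(addr, pattern)
        if mem_read(addr) != pattern:
            return False
    return True


def address_bus_test(mem_read, mem_write, size: int) -> bool:
    """Address-bus test: write unique tags at power-of-two offsets, then read back.

    A stuck or shorted address line makes two offsets alias the same cell,
    so a later write overwrites an earlier tag.
    """
    offsets = [0] + [1 << n for n in range(size.bit_length()) if (1 << n) < size]
    for i, off in enumerate(offsets):
        mem_write(off, 0xA0 + i)
    return all(mem_read(off) == (0xA0 + i) & 0xFF for i, off in enumerate(offsets))


read, write = make_memory(4096)
print(walking_ones_test(read, write), address_bus_test(read, write, 4096))
```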

Shanmugasundaram adds that ultimately it all depends on the application. “If an application is a real-time one, then bandwidth requirements are important, along with the prevention of system failures and data corruption. If the application is non-real-time, then bandwidth requirements can take a backseat to avoiding system failures and data corruption.”


Selecting a processor for embedded designs

The advent of different processor architectures has led to the development of many application-specific processors. This section offers insight into selecting the correct processor for the application at hand.

Another development worth noting in the Indian chip design scene is the indigenous SoC design work by firms such as Sankhya Technologies and Dreamchip, which are developing a ready-to-use synthetic processor and SoC platform for major markets. “These allow designers to make optimal hardware and software trade-offs that give them the flexibility of software and the performance of hardware, which any embedded system design house would love to have,” explains Gopi Kumar Bulusu, chief technologist at Sankhya Technologies.

The table on the next page compares the most popular processors used by OEMs for various embedded applications.

PC-compatible COMs

A computer on module (COM) is a compact, high-performance system that lets designers focus on their core competencies. Highly integrated PC-compatible COMs are based on x86 CPU architectures, which provide a scalable solution while meeting advanced CPU application development needs. Some of the widely used x86 processors are listed in the table. From the Intel Atom series through AMD designs to Intel Core processors, designers have a wide range of processors to choose from when designing their COMs.

While single-core processors from the Intel Atom series, such as the N270, N455 and Z510, focus on balancing power consumption, performance and price, the newer AMD G-series processors bring improved graphics performance for user-friendly interfaces or HD video, in both single- and dual-core options. The AMD G-series, built on a 40nm process compared with Intel’s 45nm, offers a powerful yet cooler design. If low power consumption is your priority, the Intel Atom Z510/N270 is the option to go for, while if you need higher graphics performance, the AMD G-series will fit best.

Low-power COMs

COMs specifically designed for high performance within a low-power envelope are based on ARM processors, although other low-power x86 architectures can also be used. SMARC (Smart Mobility ARChitecture) is a specification for such COMs from the Standardization Group for Embedded Technologies (SGeT).

ARM architecture-based processors are built with fewer transistors than traditional x86 processors, resulting in lower cost, heat and power usage. This makes them a good fit for low-power COM designs. Freescale’s ARM Cortex-A9-based i.MX 6 family is one example, supporting a wide range of applications with its single-, dual- and quad-core offerings.

Fanless box PC

The design of fanless box PCs used for various machine-controlled industrial applications is focussed on harsh environmental conditions and factory usage; however, they can also be designed for graphic-oriented applications like digital signage and media playback.

The AMD G-series is a good option for graphics-oriented designs. Intel Atom series processors are good for robust low-power applications, with a wide range of processor speeds to choose from. Apart from these two, the VIA Nano X2 E and Eden E series are other dual-core processors worth considering when designing fanless systems. The x86-based Nano and Eden series are made on a 40nm process and, by design, can withstand a wide range of temperatures.




Human machine interface (HMI)

HMIs in industrial designs handle human-machine interactions, and can range from simple switches and keypads to touchscreens and panel PCs. Most HMI solutions on the market are x86-based, using processors from Intel and other manufacturers that provide performance and flexibility. Industrial HMIs increasingly need greater processing performance and support for more interfaces, while keeping heat generation to a minimum and size compact, which is driving the use of ARM processors in HMI designs.

Future ahead

Newer versions of chips are released every now and then, but when the time comes for an upgrade, future expandability is a very important factor. Gordon Moore saw an end to processor speed growth when he said, “We can’t exceed the speed of light,” back in 1998. Rather than focussing on high speed, a smart engineer should focus on an optimum speed that can be sustained.