Hardware Recommendations (2024)

Our hardware recommendations for data science and analysis workstations below are provided by Dr. Don Kinghorn. These follow some standard patterns, but keep in mind that your specific workflow may have unique requirements.

Data Science System Requirements

Data science and data analysis are closely coupled with methods from machine learning, so there are some similarities here with our Hardware Recommendations for ML/AI. However, data analysis, preparation, munging, cleaning, visualization, etc. do present unique challenges for system configuration. Extract, Transform, and Load (ETL) and Exploratory Data Analysis (EDA) are critical components of machine learning projects, as well as being indispensable parts of business processes and forecasting.

The “best” hardware will follow some standard patterns, but your specific application may have unique optimal requirements. The Q&A discussion below, with answers provided by Dr. Donald Kinghorn, will be mostly generalities based on typical workflows. We also recommend that you visit his HPC blog for more info.

Processor (CPU)

In data science, a significant amount of effort goes into moving and transforming large data sets. The CPU, with its ability to access large amounts of memory, may dominate these workflows, in contrast to the GPU compute that dominates ML/DL. The degree of multi-core parallelism will depend on the task, but data processing often parallelizes very well.

What CPU is best for data science?

The two recommended CPU platforms are Intel’s Xeon W and AMD’s Threadripper PRO. Both of these offer high core counts, excellent memory performance and capacity, and large numbers of PCIe lanes. Specifically, the 32-core versions of either platform are recommended for their balance of core utilization and memory performance.

Do more CPU cores make data science workflows faster?

The number of cores chosen will depend on the expected load and parallelism of tasks in your workflow. Larger numbers of cores may also allow for multiple simultaneous processes. An easy recommendation is for 32 cores with either of the Intel or AMD platforms mentioned above. The 96- or 64-core TR PRO may be ideal if you have highly data parallel tasks with a significant amount of time spent in computation, but scaling may not be as efficient as with the 32-core if memory access is a limiting factor. In any case, a 16-core processor would probably be considered minimal.
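As a rough illustration of why higher core counts do not always pay off, the sketch below applies Amdahl's law, a standard textbook model rather than anything measured on these platforms. It assumes a fixed fraction of the runtime parallelizes perfectly and the rest does not scale at all; time spent waiting on memory would fall into the non-scaling part. The function name and the example fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup from Amdahl's law.

    parallel_fraction: share of single-core runtime that scales with cores
                       (an assumption you must estimate for your own workflow).
    cores:             number of cores used.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative parallel fractions only; measure your own workload.
    for fraction in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(fraction, n):5.1f}x"
            for n in (16, 32, 64, 96)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

Under this model, a job that is 90% parallel gains little from going beyond 32 cores, while a job that is 99% parallel keeps improving. This is consistent with reserving the 64- and 96-core parts for highly data-parallel, compute-heavy tasks.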

Does data science work better with Intel or AMD CPUs?

It is mostly a matter of preference. However, the Intel Xeon platform would be recommended if your workflow could benefit from some of the tools in the Intel oneAPI AI Analytics Toolkit, such as the Pandas alternative Modin, which is optimized for Intel, or from Advanced Matrix Extensions (AMX).

Video Card (GPU)

Since the mid-2010s, GPU acceleration has been the driving force enabling rapid advancements in machine learning and AI research. NVIDIA has had a massive impact in this field. For data science, the GPU may offer significant performance over the CPU for some tasks. However, GPUs may be limited by memory capacity and appropriate applications for data tasks outside of model training.

What type of GPU (video card) is best for data science?

NVIDIA dominates GPU compute acceleration and is unquestionably the standard. Their GPUs will be the most supported and easiest to work with. NVIDIA also provides an excellent data-handling application suite called RAPIDS. The NVIDIA RAPIDS tools may significantly improve workflow throughput.

How much VRAM (video memory) does data science need?

This is dependent on the “feature space” of your data. Memory capacity on GPUs is limited compared to the main system memory utilized by CPUs, and applications may be constrained by this. This is why it’s common for a data scientist to be tasked with “data and feature reduction” prior to model training. That is often 80+% of the hard work for ML/AI projects. For some jobs, GPU memory may be a limiting factor even when there is a GPU-accelerated tool available for the data work. For larger data problems, the 48GB available on the NVIDIA RTX 6000 Ada may be necessary – and even that may not be enough for jobs that require all data to be resident on the device. Data movement can be a bottleneck because GPUs have such highly performant compute capabilities that they may be left idle a large percent of the time while waiting for memory to move around.
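A back-of-the-envelope estimate can show whether a data set is likely to fit in GPU memory before any data is moved. The sketch below assumes a dense table of fixed-width numeric values and reserves a share of device memory for intermediate results. The 50% headroom, the treatment of 48GB as 48 GiB, and the function names are all illustrative assumptions, not properties of any particular tool.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def dense_footprint_bytes(rows: int, features: int, bytes_per_value: int = 4) -> int:
    """Size of a dense rows x features table (4 bytes per value = float32)."""
    return rows * features * bytes_per_value


def fits_on_device(
    rows: int,
    features: int,
    device_bytes: int = 48 * GIB,  # assumed capacity of a 48GB card
    bytes_per_value: int = 4,
    headroom: float = 0.5,         # assumed share kept free for intermediates
) -> bool:
    """True if the dense table fits in the usable part of device memory."""
    usable = device_bytes * (1.0 - headroom)
    return dense_footprint_bytes(rows, features, bytes_per_value) <= usable


if __name__ == "__main__":
    for features in (50, 200):
        size_gib = dense_footprint_bytes(100_000_000, features) / GIB
        verdict = "fits" if fits_on_device(100_000_000, features) else "does not fit"
        print(f"100M rows x {features} float32 features: {size_gib:.1f} GiB, {verdict}")
```

In this example, reducing the feature count from 200 to 50 is what brings the table within reach of a single card, which is the practical reason feature reduction so often precedes GPU work.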

Will multiple GPUs improve performance in data science workflows?

For data analysis jobs that can take advantage of GPUs, having more than one may increase throughput. If you will be doing ML/AI jobs, then multi-GPU can be beneficial since many frameworks support it. For data-oriented tasks, multi-GPU may have an advantage simply by providing more available memory to facilitate task parallelism. Not all workflows utilize the GPU well, though, as discussed previously.

Do I need NVLink when using multiple GPUs for data science?

NVIDIA’s NVLink provides a direct, high-performance communication bridge between a pair of GPUs. Whether this is beneficial or not depends on the problem type. For training many types of models it is not needed. However, for any models that have a “history” component, such as RNNs, LSTMs, time-series models, and especially Transformer models, NVLink can offer a significant speedup and is therefore recommended. Please note that not all NVIDIA GPUs support NVLink, and it can only bridge two cards.

Memory (RAM)

CPU memory capacity may be the limiting factor for some data analysis tasks. This is because an entire large data set may need to be resident in memory (in-core). There are methods and tools for “out-of-core” data analysis, but these can slow performance.
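To make the in-core versus out-of-core distinction concrete, here is a minimal sketch of out-of-core processing using only the Python standard library. It computes the mean of one CSV column while holding only one chunk of rows in memory at a time. The column name, chunk size, and function names are hypothetical, and real projects would more likely use a dedicated out-of-core library.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def read_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most chunk_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(lines: Iterable[str], column: str, chunk_size: int = 100_000) -> float:
    """Mean of a numeric CSV column, reading one chunk of rows at a time."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for chunk in read_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data rows found")
    return total / count


if __name__ == "__main__":
    # A tiny in-memory stand-in for a file too large to load at once.
    sample = io.StringIO("value\n1\n2\n3\n4\n")
    print(streaming_mean(sample, "value", chunk_size=3))
```

With a real file, the same function can be called as `streaming_mean(open(path, newline=""), "value")`. The trade-off is that every pass over the data re-reads it from storage, which is where the slowdown relative to in-core work comes from.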

How much RAM does data science need?

It is often necessary, or at least desirable, to be able to pull a full data set into memory for processing and statistical work. That could mean BIG memory requirements, as much as 1-2 TB of system memory for the CPU to access.

Storage (Hard Drives)

Storage requirements are similar to CPU memory requirements. Your data and projects will dictate requirements.

What storage configuration works best for data science?

It’s recommended to use fast NVMe storage whenever possible, since data streaming can become a bottleneck when data is too large to fit in system memory. Staging job runs from NVMe can reduce slowdowns. NVMe and SATA solid-state drives are available in capacities up to 8TB, with NVMe drives being much faster and generally preferred. Platter drives can be used for archival storage and for very large data sets, but should not be used for active working space. They are now available in capacities exceeding 20TB.

Additionally, all of the above drive types can be configured in RAID arrays. This does add complexity to the system configuration and may use up slots on the motherboard which would otherwise support additional GPUs, but it can allow for storage space in the tens to hundreds of terabytes.

Should I use network attached storage for data science?

Network-attached storage is another consideration. It’s become more common for workstation motherboards to have 10Gb Ethernet ports, allowing for network storage connections with reasonably good performance without the need for more specialized networking add-ons. Rackmount workstations and servers can have even faster network connections, often using more advanced cabling than simple RJ45, making options like software-defined storage appealing.
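To put “reasonably good performance” in perspective, the sketch below estimates how long it takes to move a data set over a network link. The only fixed fact used is that a 10Gb Ethernet link carries at most 10 gigabits (1.25 gigabytes) per second. The efficiency factor and the function name are illustrative assumptions, and real throughput depends on the storage system and protocol.

```python
TEN_GBE_BITS_PER_SECOND = 10e9  # line rate of 10Gb Ethernet


def transfer_seconds(
    size_bytes: float,
    link_bits_per_second: float = TEN_GBE_BITS_PER_SECOND,
    efficiency: float = 1.0,  # assumed fraction of line rate actually achieved
) -> float:
    """Lower-bound time to move size_bytes over a link at the given rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    one_terabyte = 1e12
    for eff in (1.0, 0.8):
        minutes = transfer_seconds(one_terabyte, efficiency=eff) / 60
        print(f"1 TB over 10GbE at {eff:.0%} of line rate: {minutes:.1f} minutes")
```

Even at full line rate, a 1 TB data set needs at least 800 seconds (a little over 13 minutes) to cross a 10Gb link, so data that is read repeatedly is usually better staged on local NVMe first.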

Article information

Author: Trent Wehner