Installing TensorFlow on a Wintel machine with Quadro T1000 GPU and CPU versus GPU Testing

Peter Sels
Aug 29, 2022

A. Installation

This tutorial ends up giving you my conda environment file for installing TensorFlow on Windows. An environment built from this file can also run programs that use the GPU. The Python script here also shows how to let the GPU take part in the computations or not. (Note that before the conda steps, you should already have installed the CUDA toolkit 11.7. See below for instructions.)

As usual, you can create a conda environment from it with

$ conda deactivate
$ conda env create --file=windows_tensorflow_20220828.yml --name=windows_tensorflow

I explain here how I constructed this yaml file (and also the error messages I encountered on the way and their solutions, so that Google can bring you here to the solution).

With a first ‘plain vanilla’ installation of the tensorflow package, running a TensorFlow test gave me this warning (hence the ‘W’ prefix):

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

I have a Quadro T1000 GPU (896 CUDA cores), which delivers about 56% of the performance of today's top-of-the-line RTX 3090 GPU (10496 CUDA cores), as you can see from this graph here:

Performance of my Quadro GPU versus the RTX 3090 top of the line GPU.

NVIDIA gives performance numbers here. So it’s worth solving this warning.

I could solve this by:

  • downloading the CUDA toolkit 11.7 installer, cuda_11.7.1_windows_network.exe, from here. Make the right selections for your system. I selected the CUDA runtime (cudart) version 11.7, which was the most recent.
  • running it. Some screens reported that certain components were installed and that others, due to the absence of Visual Studio on my system, were not.
Some things not installed due to lack of a Visual Studio compiler on my system
In the end, quite a few components were installed.
  • doing a system restart.
  • running my TensorFlow test again: the warning about the CUDA runtime was gone. :)
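If you want to check, independently of TensorFlow, whether the CUDA runtime DLL can be found, you can search the directories on your PATH for it. This is only a minimal sketch and not part of my test script: the helper name find_on_path is made up for illustration, and it assumes the DLL is located through PATH, which is the usual route on Windows but not the only possible one.

```python
import os
from pathlib import Path


def find_on_path(filename, search_path=None):
    """Return every directory in search_path that contains filename.

    search_path is an os.pathsep-separated string; it defaults to the PATH
    environment variable. (Hypothetical helper, for illustration only.)
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries, which appear when PATH has a trailing separator.
        if directory and (Path(directory) / filename).is_file():
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # DLL name taken from the warning message shown earlier.
    dll = "cudart64_110.dll"
    locations = find_on_path(dll)
    if locations:
        print(f"{dll} found in:")
        for directory in locations:
            print("  " + directory)
    else:
        print(f"{dll} not found on PATH")
```

An empty result before the toolkit installation and a non-empty one after the restart would match what the warning suggests.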

The next error was:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Multiple OpKernel registrations match NodeDef at the same priority

After quite some googling and a few trials, I found this ‘TensorFlow install via pip’ page here. It mentions:

"Caution: The current TensorFlow version, 2.10, is the last TensorFlow release that will support GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow_cpu and, optionally, try the TensorFlow-DirectML-Plugin"

But we will ignore this warning for now and run the commands that the page gives next:

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
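# Pin below 2.11: per the caution quoted above, later releases have no GPU support on native Windows.
python -m pip install "tensorflow<2.11"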
# Verify install:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

All went well for me, and the last command prints

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

so yes it finds the GPU now and we get rid of any errors/warnings. :)
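Because of the caution quoted above, it can help to check the installed TensorFlow version together with the GPU list. The following is a small sketch under that assumption: the function name supports_native_windows_gpu is my own, and it only tests the upper bound (2.10) named in the caution, nothing else about your setup.

```python
def supports_native_windows_gpu(version):
    """Return True if the version string is TensorFlow 2.10 or older.

    Only the major and minor numbers are compared, so release candidates
    such as '2.11.0rc1' are handled as well. (Hypothetical helper.)
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) <= (2, 10)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print("TensorFlow version:", tf.__version__)
        print("GPU on native Windows possible:",
              supports_native_windows_gpu(tf.__version__))
        print("GPUs found:", tf.config.list_physical_devices("GPU"))
```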

B. Test TensorFlowMNistTest.py

Running our own test script four times again (in eager and in non-eager mode, each once with the GPU made invisible and once with the GPU made visible) gave these fitting times:

WinQuadroT1000EagerCpuNoGpu.log:    0:11:05.841665
WinQuadroT1000NotEagerCpuNoGpu.log: 0:07:03.279670
Wini7QuadroT1000EagerCpuGpu.log: 0:02:09.316872
Wini7QuadroT1000NotEagerCpuGpu.log: 0:01:22.671263
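The four configurations can be set up roughly as follows. This is a sketch of one common way to do it, not necessarily what TensorFlowMNistTest.py does: hiding the GPU through the CUDA_VISIBLE_DEVICES environment variable and switching off eager mode with tf.compat.v1.disable_eager_execution() are my assumptions here, and the two helper names are made up for illustration.

```python
import os


def set_gpu_visibility(use_gpu):
    """Hide or expose CUDA GPUs to libraries that are imported afterwards.

    Setting CUDA_VISIBLE_DEVICES to "-1" makes the CUDA runtime report no
    devices; removing the variable restores the default (all GPUs visible).
    Call this before importing tensorflow.
    """
    if use_gpu:
        os.environ.pop("CUDA_VISIBLE_DEVICES", None)
    else:
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def set_eager_mode(tf, eager):
    """Leave TensorFlow 2.x in eager mode or switch it to graph mode."""
    if not eager:
        # Has to happen at program start, before any ops are created.
        tf.compat.v1.disable_eager_execution()


if __name__ == "__main__":
    # Example configuration: non-eager mode, GPU hidden.
    set_gpu_visibility(use_gpu=False)
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        set_eager_mode(tf, eager=False)
        print("Eager execution:", tf.executing_eagerly())
        print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```

Since both settings only take effect at startup, each of the four combinations needs its own run of the script.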

In eager mode, switching on the GPU makes the fitting time go down from 11:05 to 2:09. In non-eager mode, switching on the GPU makes it go down from 7:03 to 1:22. So the GPU makes the execution time go down by about a factor of 5 (5.15 in eager mode and 5.12 in non-eager mode).
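The speedup factors follow directly from the logged times. Here is a short sketch of that arithmetic; the function names and the dictionary layout are my own choices for illustration, and the four time strings are copied from the log files listed above.

```python
from datetime import timedelta


def parse_duration(text):
    """Convert 'H:MM:SS.ffffff' (the format of str(timedelta)) to a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


# Fitting times copied from the four log files.
timings = {
    ("eager", "cpu"): "0:11:05.841665",
    ("eager", "gpu"): "0:02:09.316872",
    ("non-eager", "cpu"): "0:07:03.279670",
    ("non-eager", "gpu"): "0:01:22.671263",
}


def speedup(mode):
    """CPU-only time divided by CPU+GPU time for the given mode."""
    cpu = parse_duration(timings[(mode, "cpu")])
    gpu = parse_duration(timings[(mode, "gpu")])
    return cpu / gpu  # dividing two timedeltas gives a float


for mode in ("eager", "non-eager"):
    print(f"{mode}: GPU speedup {speedup(mode):.2f}x")
# prints:
# eager: GPU speedup 5.15x
# non-eager: GPU speedup 5.12x
```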

C. Extra Comparison

We performed two extra installations and comparison benchmarks: one on an Apple M1 chip and one on an older Apple Intel iMac from 2015. Runtimes for the 3 machines, for (eager, non-eager) * (GPU, no GPU), were:

Making bar graphs of it gives

For the two most recent systems (the Mac M1 from 2021 and the Wintel machine with the Quadro T1000 from 2019), using the GPU improves speed by about a factor 4. For the older system (the Intel-based iMac with AMD GPU from 2015), the GPU helps by a factor 3 in non-eager mode and 5 in eager mode, which is, on average, quite similar to the newer machines (running OS X or Windows).

We find that the M1 Max chip (with 8 performance and 2 efficiency CPU cores, and an embedded GPU with 32 cores) is about twice as fast in all modes as the i7 with the Quadro T1000 GPU and its 896 CUDA cores.

Written by Peter Sels on Aug 28th 2022.
