Why is torch inference slower than ONNX and TensorRT?
Have you ever wondered:
- Why don't you need a model class to run ONNX or TensorRT inference, but you do need one to run PyTorch inference?
- Why is a PyTorch model slow compared to its exported ONNX and TensorRT versions?
In order to answer this, the terminologies I would use are:
- compiler
- computation graph (static and dynamic)
Note
A compiler, in a basic sense, does the job of understanding the flow of code/computation beforehand, so that when we run the code, we save time.
PyTorch compiles the computation graph at runtime, so PyTorch weight files store only the weights, not the details of how the weights and inputs will flow during inference. In contrast, .onnx and .trt files store the weights along with that flow of weights and inputs. Hence, for ONNX and TensorRT inference we just need the .onnx or .trt file.
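As a minimal sketch of this difference (the MyModel class and the file names model.pth / model.onnx are assumptions for illustration): the PyTorch side needs the Python class to rebuild the graph, while ONNX Runtime only needs the exported file.

```python
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# Hypothetical model class, only needed on the PyTorch side.
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

# PyTorch: the .pth file holds weights only, so the class must be
# instantiated first to rebuild the computation graph in Python.
model = MyModel()
model.load_state_dict(torch.load("model.pth"))  # assumes model.pth exists
model.eval()
with torch.no_grad():
    out_torch = model(torch.randn(1, 10))

# ONNX Runtime: the .onnx file stores weights *and* the graph,
# so no Python model class is needed at all.
session = ort.InferenceSession("model.onnx")     # assumes model.onnx exists
input_name = session.get_inputs()[0].name
out_onnx = session.run(None, {input_name: np.random.rand(1, 10).astype(np.float32)})
```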
However, PyTorch 2.0 also introduced a compiler, which you can use by calling torch.compile(). It captures the graph and reuses it for computation again and again. This support is added not only for training but for inference too. This is going to be huge if they can support a lot of models through it. Still, I think there will remain a requirement for static graphs, which is reasonable.
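A quick sketch of how torch.compile() is used (the tiny stand-in model below is an assumption; actual speedups depend on the model and backend):

```python
import torch
import torch.nn as nn

# A tiny stand-in model for illustration.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2)).eval()
compiled_model = torch.compile(model)    # wraps the model with graph capture

x = torch.randn(1, 10)
with torch.no_grad():
    _ = compiled_model(x)    # first call: graph is captured and compiled (slow)
    out = compiled_model(x)  # later calls: the compiled graph is reused (fast)
```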
NOTE: One can even parse an ONNX file and convert it back into PyTorch. I have done that. You can approach me in case you need to do that.
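The starting point of such a conversion is simply reading the graph stored in the .onnx file. A minimal sketch of that parsing step (assuming a model.onnx file exists; this stops well short of a full converter):

```python
import onnx

# Load the protobuf-based ONNX model (assumes model.onnx exists).
model = onnx.load("model.onnx")

# The graph explicitly lists every operator and the stored weights --
# exactly the information a PyTorch state_dict alone does not carry.
for node in model.graph.node:
    print(node.op_type, list(node.input), "->", list(node.output))

for initializer in model.graph.initializer:
    print("weight:", initializer.name, "shape:", list(initializer.dims))
```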
The end.