500 GB/s is going to limit it to at best 1/4 the DL performance of an Nvidia GPU. I'm not sure what the floating-point perf of these FPGAs is, but I imagine that also might set a fundamental performance ceiling at a small fraction of a GPU's.
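A quick back-of-the-envelope check of that bandwidth argument (the GPU figure is an assumption, not from the comment): for a memory-bound workload, achievable throughput scales with memory bandwidth, so the ratio of bandwidths bounds the ratio of performance.

  # Bandwidth-bound throughput ceiling, rough numbers only.
  fpga_bw_gbs = 500    # HBM bandwidth quoted above, GB/s
  gpu_bw_gbs = 2000    # assumed figure for a current Nvidia datacenter GPU (A100-class), GB/s

  # For a bandwidth-bound kernel, throughput scales linearly with bandwidth.
  print(f"FPGA ceiling vs GPU: {fpga_bw_gbs / gpu_bw_gbs:.2f}x")  # -> 0.25x, i.e. ~1/4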



Well, I keep seeing models quantized everywhere, and for 2-bit, 4-bit, and 1-bit quantization I got very good inference performance (either throughput or latency) on CNNs and some RNNs on Alveo boards using FINN (so mostly high-level synthesis and very little actual FPGA wrangling). No idea about the current status of all this, will read the paper though :-)
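For anyone curious what that workflow looks like: a minimal sketch of a 4-bit quantized conv block defined with Brevitas, the quantization-aware-training front end usually paired with FINN. This is an illustration only; layer and parameter names assume a recent Brevitas release and may differ across versions.

  # Sketch only: 4-bit weights/activations, the kind of block FINN can turn
  # into a dataflow accelerator for an Alveo card.
  import torch.nn as nn
  from brevitas.nn import QuantConv2d, QuantReLU, QuantIdentity

  class QuantBlock(nn.Module):
      def __init__(self, in_ch, out_ch, bit_width=4):
          super().__init__()
          # Quantize the activations entering the block.
          self.inp = QuantIdentity(bit_width=bit_width, return_quant_tensor=True)
          # 4-bit weights; bias omitted to keep the exported graph simple.
          self.conv = QuantConv2d(in_ch, out_ch, kernel_size=3, padding=1,
                                  weight_bit_width=bit_width, bias=False)
          self.bn = nn.BatchNorm2d(out_ch)
          # 4-bit activations after the nonlinearity.
          self.act = QuantReLU(bit_width=bit_width, return_quant_tensor=True)

      def forward(self, x):
          return self.act(self.bn(self.conv(self.inp(x))))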



