We present SmartExchange, a hardware-algorithm co-design framework that trades higher-cost memory storage/access for lower-cost computation, enabling energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm that enforces a specially favorable DNN weight structure, in which each layer's weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all powers of 2. The resulting sparse and readily quantized DNN thus enjoys greatly reduced energy consumption for both data movement and weight storage. To fully exploit the potential of SmartExchange, we further design a dedicated accelerator that leverages the SmartExchange-enforced weights to improve both energy efficiency and latency. Extensive experiments on four DNN models show that the proposed accelerator achieves up to 4.9× and 9.6× improvements in energy efficiency and energy-delay product, respectively, compared to state-of-the-art DNN accelerators.
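For intuition, the sketch below (our illustration, not the authors' code; the matrix shapes, sparsity level, and exponent range are all assumptions) shows how such a layer weight could be rebuilt from a small dense basis matrix and a large sparse coefficient matrix whose non-zeros are signed powers of 2, so that multiplication by the coefficients reduces to shifts and adds in hardware.

```python
import numpy as np

# Minimal sketch of a SmartExchange-style weight decomposition.
# All names, shapes, and the ~25% density are illustrative assumptions.
rng = np.random.default_rng(0)

k, m, n = 4, 4, 64                       # small basis dim k; n >> k

# Small dense basis matrix B (k x m).
B = rng.standard_normal((k, m)).astype(np.float32)

# Large sparse coefficient matrix Ce (n x k): mostly zero, and every
# non-zero entry is a signed power of 2, so multiplying by it needs
# only bit shifts and additions rather than full multipliers.
Ce = np.zeros((n, k), dtype=np.float32)
mask = rng.random((n, k)) < 0.25          # ~25% non-zeros (assumed)
exponents = rng.integers(-3, 2, size=(n, k))   # 2^-3 .. 2^1 (assumed)
signs = rng.choice([-1.0, 1.0], size=(n, k))
Ce[mask] = (signs * np.exp2(exponents))[mask]

# Rebuild the full (n x m) layer weight on the fly: W ~= Ce @ B.
W = Ce @ B

# Only Ce's non-zeros (a sign plus a small exponent each) and the tiny
# basis B need to be stored/moved, instead of all n*m full weights.
print(W.shape, f"non-zeros in Ce: {int(mask.sum())} / {n * k}")
```

Because each non-zero coefficient is encoded by just a sign and a small exponent, storage and data movement shrink to the basis matrix plus the sparse coefficients, which is the storage/computation trade the abstract describes.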