FAµST 3.35.15 — LazyLinearOp

A new class named LazyLinearOp is now available in matfaust and pyfaust.

Starting from a numpy array, a scipy matrix, a Faust object, or potentially many other compatible linear operators with efficient implementations, this class follows the lazy evaluation paradigm. In short, one can aggregate low-level LazyLinearOp objects into higher-level ones using many classical operations (addition, concatenation, adjoint, real part, etc.) without actually building arrays. The actual effect of these operations is delayed until the resulting linear operator is applied to a vector (or to a collection of vectors, seen as a matrix).

The main interest of this paradigm is to enable the construction of processing pipelines that use efficient implementations of “low-level” linear operators as building blocks.
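To illustrate the paradigm itself (this is a minimal self-contained sketch in plain numpy, not the pyfaust/matfaust LazyLinearOp API): operations combine closures instead of building arrays, and nothing is computed until a vector is applied.

```python
import numpy as np

# Minimal sketch of lazy evaluation for linear operators
# (NOT the pyfaust LazyLinearOp API): an operator is stored as a
# closure x -> A @ x, and high-level operations combine closures.
class TinyLazyOp:
    def __init__(self, shape, matvec):
        self.shape = shape        # (nrows, ncols)
        self._matvec = matvec     # function x -> A @ x

    def __matmul__(self, x):
        if isinstance(x, TinyLazyOp):   # lazy composition: no product built
            return TinyLazyOp((self.shape[0], x.shape[1]),
                              lambda v: self._matvec(x._matvec(v)))
        return self._matvec(x)          # evaluation happens only here

    def __add__(self, other):           # lazy addition: no array is built
        return TinyLazyOp(self.shape,
                          lambda v: self._matvec(v) + other._matvec(v))

A = np.random.rand(50, 50)
B = np.random.rand(50, 50)
lA = TinyLazyOp(A.shape, lambda v: A @ v)
lB = TinyLazyOp(B.shape, lambda v: B @ v)
C = (lA + lB) @ lA            # still no dense 50x50 product computed
x = np.random.rand(50)
y = C @ x                     # the whole pipeline is evaluated here
```

The real LazyLinearOp class supports many more operations (concatenation, adjoint, real part, etc.), but the deferred-evaluation mechanism is the same in spirit.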

LazyLinearOp objects are complementary to other “lazy” objects such as the LazyTensors of KeOps, or those of the lazyarray and WeldNumpy libraries, which, to the best of our knowledge, primarily provide compact descriptions of arrays whose entries can be evaluated efficiently on the fly.

For more information and a demo of the LazyLinearOp class, a Jupyter notebook is available here (the notebook archive is also provided as .ipynb here).

FAµST 3.33.5 — DFT / Butterfly Faust optimization

We optimized the DFT Faust product (pyfaust.dft / matfaust.dft).

The principle of this optimization is to simplify the butterfly matrix product to sums of diagonal matrix multiplications (and likewise for the bit-reversal permutation).

This optimization is disabled by default; enable it with the diag_opt argument.
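To give an idea of the principle (a numpy sketch of the underlying idea, not the FAµST implementation): applying a DFT butterfly factor [[I, D], [I, -D]] amounts to two diagonally scaled sub-vector additions, as in the classical radix-2 FFT recursion.

```python
import numpy as np

def fft_butterfly(x):
    """Radix-2 DFT where each butterfly stage reduces to additions of
    diagonally scaled sub-vectors -- a sketch of the idea behind
    diag_opt, not the FAµST implementation itself."""
    n = len(x)
    if n == 1:
        return x.astype(complex)
    even = fft_butterfly(x[0::2])
    odd = fft_butterfly(x[1::2])
    # Diagonal of twiddle factors: the butterfly factor [[I, D], [I, -D]]
    # is applied via diagonal multiplications instead of a matrix product.
    d = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + d * odd, even - d * odd])

x = np.random.rand(8)
```

Each stage costs O(n) instead of the O(n^2) of a generic matrix-vector product, which is where the speedup comes from.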

In the figure below we show how it speeds up the Faust-vector and Faust-matrix products.

To reproduce the figure you can use these scripts: benchmark script, plot script.

We made this optimization available to any Faust composed of butterfly factors (use the opt_butterfly_faust function in pyfaust or matfaust).

Special thanks to Simon Delamare and Rémi Gribonval who are at the source of this optimization.

 

FAµST 3.32.4 — DYNPROG product method GPU support

This version extends the DYNPROG product method to GPU Fausts. This dynamic programming method computes a Faust product (or a Faust-matrix product) by choosing the order of the matrix products composing the whole Faust product (a.k.a. Faust.toarray) according to the cost of each product, in order to speed up the whole computation. More information about this method is available here.
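The underlying idea is the classical matrix-chain ordering dynamic program. Here is a self-contained sketch of that idea (an illustration, not the FAµST implementation), together with the cost of a plain left-to-right evaluation for comparison:

```python
def chain_order_cost(dims):
    """Classical matrix-chain dynamic program: factor i has shape
    dims[i] x dims[i+1]. Returns the minimal number of scalar
    multiplications over all parenthesizations (a sketch of the idea
    behind DYNPROG, not the FAµST implementation)."""
    n = len(dims) - 1                       # number of factors
    cost = [[0] * n for _ in range(n)]
    for span in range(1, n):                # chain length minus one
        for i in range(n - span):
            j = i + span
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j))
    return cost[0][n - 1]

def l2r_cost(dims):
    """Cost of a plain left-to-right evaluation (DEFAULT_L2R-like)."""
    rows, total = dims[0], 0
    for k in range(1, len(dims) - 1):
        total += rows * dims[k] * dims[k + 1]
    return total
```

For instance, with factors of shapes 100x5, 5x100 and 100x5, left-to-right evaluation costs 100000 scalar multiplications, while the optimal order costs only 5000, which is the kind of gap DYNPROG exploits.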

The table below compares the computation times obtained with the DYNPROG and DEFAULT_L2R methods (with the latter, the product is basically computed from left to right). Four Fausts are tested; they are composed of dense, sparse (CSR, BSR), or mixed (dense + sparse) matrices.

The data for the table below was generated using this script (note that the script is randomized, so the results will not be exactly the same from one run to another).

| Faust name | Product method | Device | Faust matrix type | Comp. time (100 products) | Speedup vs DEFAULT_L2R | Number of factors |
|---|---|---|---|---|---|---|
| F1 | DEFAULT_L2R | GPU GTX980 | dense | 15.7 | 1 | 32 |
| F1 | DYNPROG | GPU GTX980 | dense | 4.8 | 3.3 | 32 |
| F2 | DEFAULT_L2R | GPU GTX980 | sparse (CSR) | 43.4 | 1 | 32 |
| F2 | DYNPROG | GPU GTX980 | sparse (CSR) | 17.7 | 2.45 | 32 |
| F3 | DEFAULT_L2R | GPU GTX980 | sparse (BSR) | 2.64 | 1 | 5 |
| F3 | DYNPROG | GPU GTX980 | sparse (BSR) | 1.75 | 1.5 | 5 |
| F4 | DEFAULT_L2R | GPU GTX980 | mixed (BSR + dense) | 6.26 | 1 | 14 |
| F4 | DYNPROG | GPU GTX980 | mixed (BSR + dense) | 2.77 | 2.26 | 14 |

FAµST 3.31.5 — Butterfly factorization with permutations

The butterfly factorization function has been updated to allow using one or several permutations. This update makes the function fully match the reference paper [1] and is available for both pyfaust and matfaust.

In addition to the permutation functionality of the butterfly factorization, a new function named rand_butterfly has also been integrated in the wrappers (pyfaust, matfaust). This function randomly generates a Faust according to butterfly supports. It provides a simple way to test the butterfly factorization using the full arrays of these random butterfly Fausts.
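To make “butterfly supports” concrete, here is a hypothetical numpy sketch (not the actual rand_butterfly implementation): for a 2**k x 2**k product, factor ell has the support of I_{2**ell} ⊗ [[1,1],[1,1]] ⊗ I_{2**(k-ell-1)}, filled with random values.

```python
import numpy as np

def rand_butterfly_dense(k, rng=np.random.default_rng(0)):
    """Hypothetical sketch: dense array of a random 2**k x 2**k butterfly
    product. Factor ell has the butterfly support
    I_{2**ell} (x) [[1,1],[1,1]] (x) I_{2**(k-ell-1)}, filled with random
    positive values (not the pyfaust.rand_butterfly implementation)."""
    n = 2 ** k
    factors = []
    for ell in range(k):
        support = np.kron(np.kron(np.eye(2 ** ell), np.ones((2, 2))),
                          np.eye(2 ** (k - ell - 1)))
        factors.append(support * rng.uniform(0.1, 1.0, (n, n)))
    out = factors[0]
    for f in factors[1:]:
        out = out @ f
    return out

M = rand_butterfly_dense(3)   # 8x8 array, product of 3 butterfly factors
```

Each factor has only 2n nonzeros, yet the product is generically dense, which is exactly what makes such matrices a good test bed for the factorization.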

 

[1] Quoc-Tung Le, Léon Zheng, Elisa Riccietti, Rémi Gribonval. Fast learning of fast transforms, with guarantees. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022, Singapore, Singapore. (hal-03438881)

FAµST 3.30.0 — GPU Faust indexing optimization

FAµST version 3.30.0 follows up on the previous news item, which presented the GPU Faust slicing optimization. This time it is GPU Faust indexing that has been optimized. The figure below demonstrates how the optimization speeds up the related operations since the previous release (3.29.1).

The code to reproduce the figure is available here.

The figure was produced using an NVIDIA GTX 980 GPU configured with CUDA 11.

FAµST 3.29.1 — GPU Faust slicing optimization

Faust slicing on GPU has been optimized. The following benchmark compares computation times of Faust slicing operations using pyfaust-3.27 and pyfaust-3.29 on GPU.

The optimization method is similar to the one previously implemented on CPU (see the related news).


The GPU used to produce this figure was an NVIDIA GTX 980. The scripts to reproduce it are available here.

FAµST 3.28.4 — New Faust constructors DST and DCT (type II)

pyfaust and matfaust have been extended with Faust constructors implementing the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST) of type II. The figure below gives an insight into the computation speedup brought by these new functions compared to their dense matrix counterparts. The benchmark measures multiplication by a vector (i.e., applying the transform to a discrete signal).
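The reason a factorized DCT-II can beat its dense matrix is that the transform reduces to an FFT of a reordered signal. Here is a self-contained numpy sketch of that classical reduction (an illustration of the principle, not the FAµST Faust factorization):

```python
import numpy as np

def dct2_naive(x):
    """DCT-II from its definition:
    y_k = 2 * sum_n x_n * cos(pi * k * (2n + 1) / (2N))  -- O(N^2)."""
    N = len(x)
    k = np.arange(N)
    C = 2 * np.cos(np.pi * np.outer(k, 2 * np.arange(N) + 1) / (2 * N))
    return C @ x

def dct2_fft(x):
    """Fast DCT-II via one length-N FFT of the even/odd reordered signal
    (the classical reordering trick) -- O(N log N)."""
    N = len(x)
    v = np.concatenate([x[0::2], x[1::2][::-1]])
    V = np.fft.fft(v)
    return 2 * np.real(np.exp(-1j * np.pi * np.arange(N) / (2 * N)) * V)

x = np.random.rand(8)
```

Since the FFT itself factorizes into sparse butterfly factors, the DCT inherits a sparse Faust structure, hence the speedups shown in the figure.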

The figure was produced on a 1.6 GHz 4-core CPU.

To reproduce the figure, please use the scripts hosted on gitlab.

FAµST 3.28.4 — New Faust constructors circ, anticirc and toeplitz

In this version you’ll find new Faust “constructors” in matfaust and pyfaust: toeplitz, circ, anticirc. Implemented as sparse Fausts, these operators, widely used in signal processing, can save memory and speed up many use cases.

In the figure below we show the speedup achieved by the Faust versions of these functions compared to the corresponding dense matrices built with scipy. The operations evaluated are the Faust-vector and matrix-vector products.
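The gains rest on the classical fact that circulant matrices are diagonalized by the DFT, so a circulant matrix-vector product is just a circular convolution computed with FFTs. A minimal numpy sketch of that principle (an illustration, not the FAµST internals):

```python
import numpy as np

def circ_matvec(c, x):
    """Multiply the circulant matrix with first column c by x using FFTs,
    in O(n log n) instead of O(n^2). Relies on the diagonalization of
    circulant matrices by the DFT (a sketch, not the FAµST internals)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circ_dense(c):
    """Dense circulant matrix with first column c (for checking):
    column j is c circularly shifted down by j."""
    n = len(c)
    return np.column_stack([np.roll(c, j) for j in range(n)])

c = np.random.rand(8)
x = np.random.rand(8)
```

Toeplitz (and anticirculant) products reduce to circulant products by embedding, which is why all three constructors benefit from the same FFT-based Faust structure.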

The figure was produced on a 1.6 GHz 4-core CPU.

To reproduce the figure, please use the scripts hosted on gitlab.

FAµST 3.27.0 — Faust slicing and indexing optimizations

This version of FAµST is a revamp of the Faust slicing and indexing implementations using a mechanism we call lazy slicing/indexing. It mainly delays the Faust slicing and indexing operations, performing them virtually so as to avoid touching the Faust data until it is really necessary. Besides, some product-related operations (such as F[:, :N]@v) have been optimized based on this mechanism.
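The gain on an expression like F[:, :N]@v is easy to see on a factored operator: column-slicing a product of factors only requires slicing the last factor, so the sliced dense array never needs to be built. A plain numpy sketch of the principle (an illustration, not the FAµST implementation):

```python
import numpy as np

# Sketch: F = A @ B plays the role of a 2-factor Faust.
# Slicing the columns of the product only touches the LAST factor,
# so F[:, :50] @ v can avoid materializing the dense product.
A = np.random.rand(200, 200)
B = np.random.rand(200, 200)
v = np.random.rand(50)

dense_way = (A @ B)[:, :50] @ v   # materializes the 200x200 product first
lazy_way = A @ (B[:, :50] @ v)    # slices one factor, then two small matvecs
```

The lazy path replaces a 200x200x200 matrix product with two matrix-vector products, which is where the measured speedups come from.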

The figures below show the performance improvements of these operations since the previous FAµST release, and also compare with scipy, which is outperformed in many cases. Scipy is used to implement a basic alternative Faust as a Python list of scipy csr_matrix objects; for more details, please see the benchmark script.
The code to reproduce these figures is available on gitlab.
The figures were produced on a 1.6 GHz 4-core CPU.