# Dimensionality reduction

For visualizing high-dimensional data, e.g. in two dimensions, potentially getting more insights into your data, you can
reduce the dimensionality of the measurements, using this algorithm:

* UMAP
  * [Uniform Manifold Approximation Projection (UMAP)](https://arxiv.org/abs/1802.03426)
  * [UMAP Python implementation](https://umap-learn.readthedocs.io/en/latest/)

## Usage

* Menu Location: `Plugins > Compute Feature > Dimensionality reduction`

Select the graph type whose features should be dimensionality reduced, either the Model Graph with Features for Spots
and Links or the Branch Graph with Features on BranchSpots and BranchLinks.
Next, select the feature + feature projections that should be dimensionality reduced. Prefer to select features, which
describe the phenotype (e.g. size, shape, velocity, number of neighbors, etc.).
Only select positional features (e.g. centroid, coordinates, timeframe, etc.), if the position of cells within
the image are descriptive for the phenotype. If you are unsure, you can select all features and then remove the
positional features later.

## Description

The available algorithms reduce the dimensionality of the selected features and adds the results as a new feature to the
table. In order to do so, the selected algorithm uses the data matrix from the spot or branch spot table, where each row
represents a spot or branch spot and each column represents a feature. The link and branch link features can be included
in the algorithm.

If they are selected, the algorithm will use the link feature value of its incoming edge or the average of all values of
all incoming edges, if there is more than one incoming edge.

The dialog will look like this:
![umap_dialog.png](dimensionalityreduction/umap_dialog.png)

By default, all measurements are selected in the box.

## Parameters

### Common Parameters

* Standardize: Whether to standardize the data before reducing the dimensionality. Standardization is recommended when
  the data has different scales / units.
  Further
  reading: [Standardization](https://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling).
* Number of dimensions: The number of reduced dimensions to use. The default is 2, but 3 is also common.
  Further reading: [Number of Dimensions](https://umap-learn.readthedocs.io/en/latest/parameters.html#n-components).

### UMAP Parameters

* Number of neighbors: The size of the local neighborhood (in terms of number of neighboring sample points) used for
  manifold approximation.
  Larger values result in more global views of the manifold, while smaller values result in more local data being
  preserved.
  In general, it should be in the range 2 to 100.
  Further reading: [Number of Neighbors](https://umap-learn.readthedocs.io/en/latest/parameters.html#n-neighbors).
* Minimum distance: The minimum distance that points are allowed to be apart from each other in the low dimensional
  representation. This parameter controls how tightly UMAP is allowed to pack points together.
  Further reading: [Minimum Distance](https://umap-learn.readthedocs.io/en/latest/parameters.html#min-dist).

When you are done with the selection, click on `Compute`.
The resulting values will be added as additional columns to the selected table.

![umap_table.png](dimensionalityreduction/umap_table.png)

You can visualize the results using the `Grapher` View of Mastodon and selecting the newly added columns.

![umap_grapher.gif](dimensionalityreduction/umap_grapher.gif)

Visualization with the [Mastodon Blender View](https://github.com/mastodon-sc/mastodon-blender-view) is also possible.

![umap_blender.gif](dimensionalityreduction/umap_blender.gif)

## Example

The example above has been generated using the [
tgmm-mini](https://github.com/mastodon-sc/mastodon-example-data/tree/master/tgmm-mini) dataset, which is included in
the Mastodon repository.