Unsupervised Visualization Of Images Using t-SimCNE

Ifeoma Veronica Nwabufo
4 min read · Jun 23, 2023

It is said that a picture is worth a thousand words. A picture conveys information quickly and in a way that is easy to retain. However, a major obstacle in data science is the high dimensionality of data, which makes it difficult to visualize and therefore impedes a meaningful understanding of its structure.

Several methods exist for reducing the dimensionality of data so that its structure can be understood. The classic one is probably Principal Component Analysis (I have written a blog post about it, which can be found here). There are also methods such as Uniform Manifold Approximation and Projection (UMAP) and t-Distributed Stochastic Neighbor Embedding (t-SNE). Today, I'll write about t-SimCNE, a method I have been experimenting with for a couple of months. It was introduced by Böhm et al. (2023), and the paper can be found here.

t-SimCNE

t-SimCNE is a visualization method based on contrastive learning and t-SNE. The "t" refers to the Student's t-distribution, "Sim" comes from SimCLR, and "CNE" stands for Contrastive Neighbour Embedding.

Briefly, contrastive learning is a self-supervised learning technique. The main idea is to train a neural network to distinguish images without using labels, by maximizing the similarity between the positive pairs of an object (obtained through augmentations) while pushing negative examples apart. If you want to read more about contrastive learning, check out this blog post.
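To make "positive pairs through augmentations" concrete, here is a minimal sketch: two independent augmentations of the same image form a positive pair. The transform values below are typical SimCLR-style choices and an assumption for illustration, not the exact pipeline from the t-SimCNE paper.

```python
import torchvision.transforms as T
from PIL import Image

# A typical SimCLR-style augmentation pipeline (illustrative values,
# not the exact settings used in the t-SimCNE paper).
augment = T.Compose([
    T.RandomResizedCrop(32, scale=(0.2, 1.0)),
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def make_positive_pair(img: Image.Image):
    """Two independent augmentations of the same image form a positive pair."""
    return augment(img), augment(img)
```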

Now, how does t-SimCNE bring all these methods together, and is it better than them?

TRAINING

t-SimCNE works with two embedding spaces, a 128-dimensional representation space used during pre-training and the final 2D space used for visualization, and training proceeds in three chambers. In the first training chamber, a ResNet backbone (or any suitable backbone) produces a 512-dimensional representation of a given input. This is followed by a multilayer perceptron (MLP), a fully-connected projection head with a single hidden layer of 1024 neurons, a ReLU activation, and an output layer of 128 units. Together, they make up the first chamber of training, referred to as pre-training (this stage of obtaining representations through self-supervision is usually called pre-training in self-supervised learning). In the original paper, this chamber is trained for 1000 epochs.
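Here is a minimal PyTorch sketch of that architecture: a ResNet backbone whose 512-dimensional features feed a projection head with one 1024-unit hidden layer, a ReLU, and a 128-dimensional output. The layer sizes follow the numbers above; the ResNet variant and other details are assumptions for illustration.

```python
import torch.nn as nn
import torchvision

class SimCNEBackbone(nn.Module):
    """ResNet backbone + fully-connected projection head (sketch)."""
    def __init__(self, feat_dim=512, hidden_dim=1024, out_dim=128):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)  # outputs 512-dim features
        resnet.fc = nn.Identity()                            # drop the classification head
        self.backbone = resnet
        self.projection = nn.Sequential(                     # MLP projection head
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.projection(self.backbone(x))
```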

t-SimCNE aims to reduce the 128D output to 2D. This is done by randomly initializing a new 2D output layer, with weights drawn from a normal distribution, and swapping the 128D output layer from the first training chamber for it. Next, the entire network except the new 2D output layer is frozen, and only that 2D layer is fine-tuned for 50 epochs so that it aligns with the rest of the network. That forms the second training chamber.

Lastly, the entire network is unfrozen and fine-tuned for 450 epochs, forming the third training chamber. Thus, the total number of training epochs is 1500.
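Sketched in code, the chamber transitions look roughly like this, assuming the `SimCNEBackbone` from above and a hypothetical `train()` loop (the real implementation lives in the authors' repository):

```python
import torch.nn as nn

model = SimCNEBackbone()

# Chamber 1: pre-train the whole network with the 128D output layer.
# train(model, epochs=1000)   # `train` is a hypothetical training loop

# Chamber 2: swap the 128D output layer for a fresh 2D layer ...
new_head = nn.Linear(1024, 2)
nn.init.normal_(new_head.weight)   # the 2D layer is initialized from a normal distribution
nn.init.zeros_(new_head.bias)      # bias initialization is an assumption here
model.projection[-1] = new_head
# ... then freeze everything except the new layer and fine-tune it.
for p in model.parameters():
    p.requires_grad = False
for p in new_head.parameters():
    p.requires_grad = True
# train(model, epochs=50)

# Chamber 3: unfreeze the whole network and fine-tune end to end.
for p in model.parameters():
    p.requires_grad = True
# train(model, epochs=450)
```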

Wait! We know that while training a network, the goal is to optimize some loss function. So next, let's look at the loss function t-SimCNE uses.

LOSS FUNCTION

Remember that t-SimCNE is an extension of SimCLR. In SimCLR, the authors use the NT-Xent loss (also referred to as the InfoNCE loss). In t-SimCNE, the authors replace the NT-Xent loss with the t-SimCNE loss, which uses the Euclidean metric to compute the similarity between points instead of the cosine metric used in SimCLR. Then, just like t-SNE, they use the Cauchy (t-distribution) kernel to turn the Euclidean distance into a similarity. Hence, given two points zᵢ and zⱼ, the Euclidean distance dᵢⱼ between them is dᵢⱼ = ‖zᵢ − zⱼ‖, with the associated Cauchy similarity 1/(1 + d²ᵢⱼ). The t-SimCNE loss (also referred to as the Euclidean loss in their paper) is then defined as:

t-SimCNE Loss
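As a rough illustration, the loss can be written as an InfoNCE-style cross-entropy in which SimCLR's exponentiated cosine similarity is replaced by the Cauchy similarity 1/(1 + d²). The batching, pairing, and normalization details below are a simplified assumption, not the authors' exact implementation:

```python
import torch

def cauchy_contrastive_loss(z_a, z_b):
    """Simplified sketch of the Euclidean (Cauchy) contrastive loss.

    z_a, z_b: (batch, dim) embeddings of the two augmented views;
    row i of z_a and row i of z_b form a positive pair.
    """
    z = torch.cat([z_a, z_b], dim=0)            # (2b, dim)
    d2 = torch.cdist(z, z).pow(2)               # squared Euclidean distances d²ᵢⱼ
    sim = 1.0 / (1.0 + d2)                      # Cauchy similarity
    sim.fill_diagonal_(0.0)                     # a point is not its own neighbour

    b = z_a.shape[0]
    pos = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])  # index of each point's positive
    # cross-entropy form: -log( similarity to positive / sum of similarities to the rest )
    loss = -torch.log(sim[torch.arange(2 * b), pos] / sim.sum(dim=1))
    return loss.mean()
```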

In their experiments, they found that using this Euclidean loss in the first two training chambers gave lower loss values and more visually appealing clusters than using the SimCLR loss in the first training chamber followed by the Euclidean loss.

RESULTS

t-SimCNE produces impressive results, with better clusters in the 2D visualization than SimCLR. For self-supervised methods like SimCLR and t-SimCNE, the quality of the 2D embedding is assessed via classification accuracy. It is also helpful to visualize the embeddings, and t-SimCNE gives more visually appealing embeddings than t-SNE and its counterparts.
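A common way to turn "quality of the 2D embedding" into a number is k-nearest-neighbour classification accuracy computed directly in the 2D space. Here is a small sklearn sketch; the choice of k and the cross-validation setup are assumptions for illustration rather than the paper's exact evaluation protocol:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_accuracy(embedding_2d, labels, k=15):
    """kNN classification accuracy in the 2D embedding (k and CV split are assumptions)."""
    knn = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(knn, embedding_2d, labels, cv=5).mean()
```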

Below are the embeddings produced for the CIFAR-10 and CIFAR-100 datasets. Interestingly, t-SimCNE was able to discover cluster subgroups in CIFAR-100 (you can look at the paper for the classification results).

t-SimCNE Visualization for CIFAR-10, from Böhm et al. (2023)
t-SimCNE Visualization for CIFAR-100, from Böhm et al. (2023)

Have any questions? Don’t hesitate to ask!
