Curious Case of Hyperbolic Embeddings (Part I)

Nishant Kumar
8 min read · Jan 2, 2021


LEFT: Euclidean Embedding || RIGHT: Hyperbolic Embedding (Poincaré Embedding)

“A book, even if it is written with complete honesty, is always worthless from one standpoint”

Discussing a recently popularized standpoint: the foundations of hyperbolic geometry and a few of its ML applications, especially in Natural Language Processing.

Hyperbolic Embeddings are now catching on, and they're kind of the buzz.

Let’s dive into it.

Below are a few questions that I believe are essential to understanding the perspective of Hyperbolic Embeddings.

P.S. : Please let me know if anything else needs to be added on for making it more understandable and relatable. 😃

  1. What is an embedding?
  2. What are the possible metric spaces where embedding is possible?
  3. What are the constraints of Euclidean spaces?
  4. Why should you even consider using Hyperbolic Embeddings?
  5. Real scenarios where Hyperbolic Embeddings prove useful in comparison with Euclidean embeddings.
  6. What are its limitations?
  7. What would be the next step for Hyperbolic Embeddings?
  8. Multi-dimensional implementation of Hyperbolic Embeddings using Gensim and Plotly

1st let’s start with a more general question .

What is embedding ?

Word Embedding is a set of language modeling and feature learning techniques where words or phrases are mapped to vectors of real numbers.

Word Embeddings
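A tiny sketch of this idea (the corpus and parameters are made up for illustration; this assumes gensim 4.x's Word2Vec API):

from gensim.models import Word2Vec

sentences = [["hyperbolic", "embeddings", "preserve", "hierarchy"],
             ["euclidean", "embeddings", "are", "flat"]]
model = Word2Vec(sentences, vector_size=8, min_count=1, seed=42)
print(model.wv["embeddings"])  # one word -> one 8-dimensional real vector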

What are the possible metric spaces where embedding is possible?

In ML, data is classically represented in particular metric spaces, where various traditional geometric learning techniques can be applied to learn useful models.

Classic examples of widely used metric spaces:

  1. Euclidean Space
  2. Riemannian Manifolds
  3. Graph Metrics

We are seeing some new terms here. Let's try to understand what a manifold is.

What's a Manifold?

In differential geometry, the spaces that people study are called Manifolds, a sort of high-dimensional generalization of curved surfaces.

That’s a Manifold

Each point of a Manifold can be assigned a curvature.

When the curvature is constant everywhere, it is either:

  1. Positive
  2. Zero
  3. Negative

Constant Curvature Spaces

What are Isometric Embeddings?

An isometric embedding is one that (approximately) preserves the graph metric, namely the length of the shortest path connecting any two nodes.

An isometric embedding retains the topology and geometric structure of the data, which makes it useful for learning complex data representations.
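To make "graph metric" concrete, here is a tiny sketch (the graph and node names are made up for illustration) that computes shortest-path distances on a small tree with NetworkX:

import networkx as nx

# a tiny tree: root with children a and b; a has leaves a1 and a2
T = nx.Graph([("root", "a"), ("root", "b"), ("a", "a1"), ("a", "a2")])

# the graph metric: shortest-path length between every pair of nodes
d = dict(nx.all_pairs_shortest_path_length(T))
print(d["a1"]["b"])  # 3, via the path a1 -> a -> root -> b

An isometric embedding of T would place its nodes so that embedding distances match these path lengths.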

The Most Awaited Question:

What are the constraints of the Euclidean metric space?

PSD Gram Matrices (Positive Semi-Definite Gram Matrices)

Gram matrices are also called pairwise distance/similarity matrices.

Pairwise distance and similarity computation is an essential operation in various text mining tasks, where it judges the similarity between two texts depending on their content.

A similarity (Gram) matrix is Euclidean only if it is PSD, i.e., it has no negative eigenvalues. The analogous condition holds for a distance matrix after double-centering.

This shows that a Euclidean embedding cannot perfectly reconstruct the data if the underlying Gram matrix contains even a single negative eigenvalue.

How often is the Gram matrix non-PSD?

In practice, this happens virtually always for real data.
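A quick numpy sketch of the check described above (the 4-point metric is a made-up star-graph example): double-center the squared distance matrix and inspect the eigenvalues.

import numpy as np

# shortest-path distances on a star graph: one center plus three leaves
D = np.array([[0, 1, 1, 1],
              [1, 0, 2, 2],
              [1, 2, 0, 2],
              [1, 2, 2, 0]], dtype=float)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
G = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
print(np.linalg.eigvalsh(G))          # approx. [-0.25, 0., 2., 2.]

The negative eigenvalue (-0.25) means even this 4-point tree metric cannot be embedded exactly in Euclidean space of any dimension.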

Distortion of Euclidean Graph Embeddings:

If the input data has a graph structure, the result is a graph metric.

Arbitrary tree structures cannot be embedded with arbitrarily low distortion in Euclidean space, no matter how many dimensions are used.

This task becomes possible in hyperbolic space with only 2 dimensions, where the exponential volume growth of the space matches the exponential growth in the number of nodes with tree depth.

Short Diagonals Lemma:

For any four points x1, x2, x3, x4 in Euclidean space, the squared diagonals are bounded by the squared sides: ‖x1 − x3‖² + ‖x2 − x4‖² ≤ ‖x1 − x2‖² + ‖x2 − x3‖² + ‖x3 − x4‖² + ‖x4 − x1‖². The unit-edge 4-cycle graph metric violates this inequality (8 > 4), which is one concrete source of unavoidable Euclidean distortion.

Distortions of Arbitrary Metric Embeddings in Euclidean Space. Bourgain's Embedding Theorem:

Any set of N points from a metric space (X, d_X) can be embedded in Euclidean space with (worst-case) distortion O(log N).

An embedding space that achieves the above distortion will have dimension O(log² N).

This distortion bound is tight.

E.g., constant-degree expander graph metrics cannot be embedded with lower distortion.

Non-Flat Manifold Data:

Non-Euclidean data live on intrinsically curved, non-flat manifolds.

If data points are drawn from a Riemannian manifold, they cannot, in general, be embedded in Euclidean space without (potentially large) distortion.

Popular embedding techniques and non-linear dimensionality reduction methods exhibit embedding distortion and error when the data comes from an intrinsically curved manifold.

Trees and their associated metrics require a space of constant negative curvature to accommodate their exponential volume growth.

Curse of Dimensionality:

Euclidean spaces suffer from the curse of dimensionality in high dimensions.

The minimum and maximum Euclidean distances between N points become nearly indistinguishable as the space dimension goes to infinity, which is a problem for methods such as nearest-neighbor search, k-nearest-neighbor classification, or clustering.

For this reason, as well as for better generalization, it is desirable to work in low-dimensional embedding spaces with better clustering behavior, e.g., trees embedded in small hyperbolic spaces.
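This distance concentration is easy to see numerically. A small sketch (random uniform points; sample sizes are made up for illustration):

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((100, d))                       # 100 random points in [0, 1]^d
    dists = pdist(X)                               # all pairwise Euclidean distances
    print(d, round(dists.min() / dists.max(), 3))  # ratio creeps toward 1 as d grows

As the dimension grows, the ratio of the smallest to the largest pairwise distance approaches 1, so "nearest" neighbors stop being meaningfully nearer.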

Why should you even consider using Hyperbolic Embeddings?

Hyperbolic Space is different from Euclidean Space.

It has more capacity.

In hyperbolic space, the volume of a ball grows exponentially with its radius, whereas in Euclidean space it grows only polynomially.
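A one-line check of that claim (an illustrative sketch): the circumference of a circle of radius r is 2πr in the Euclidean plane but 2π·sinh(r) in the hyperbolic plane.

import numpy as np

for r in [1, 2, 5, 10]:
    # Euclidean circumference vs. hyperbolic circumference
    print(r, round(2 * np.pi * r, 1), round(2 * np.pi * np.sinh(r), 1))

By r = 10 the hyperbolic circle is already over a thousand times longer, which is exactly the kind of room an exponentially branching tree needs.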

Hyperbolic geometry is better suited to embed data with an underlying hierarchical / heterogeneous structure.

Below are a few of the many possible domains where non-Euclidean spaces are more useful:

  1. Computer Graphics (Meshed Surfaces) [definitely needs no introduction!]
  2. Genomics (Human Protein Interactions)
  3. Bioinformatics (Cancer Networks)
  4. Recommender Systems
  5. Network Science (Internet Topology)
  6. Preserving Privacy

Some of the notable impacts of Hyperbolic Embeddings are:

  • Drastic reduction in the number of parameters and embedding dimensionality, resulting in:

      - Better generalization
      - Less over-fitting
      - Better quality with training data
      - Improved computational complexity
      - Better clustering behavior

  • Low embedding distortion: better preservation of local and global geometric information, avoiding discarding essential information
  • Better disentanglement in the embedding space: improved clustering, classification, and interpretability
  • Better model understanding and interpretation
  • Shape:

Euclidean distance completely ignores the shape of the data when finding a path from start to end.

Geodesic distance is constrained to lie within the given shape.

In other words, if you care about the "physical" distance between two points along the underlying shape, you need the geodesic distance; if you only need distance as a rough signal, Euclidean distance is the simpler choice.

  • Analogous to Trees:

For trees, the shortest path between two nodes is the path through their parent.

Here, we can see that the shortest path between two points x and y is the geodesic bent towards the origin.

  • Exponential Expansion [last but not least, and definitely most important]

An example I found intriguing:

Below are two examples of the mammals subtree of WordNet being modeled in hyperbolic space and in Euclidean space.

Representation of the WordNet mammals subtree in Hyperbolic Space

Representation of the WordNet mammals subtree in Euclidean Space

What are it’s limitations ?

  1. Extra time and computation required for inference as compared to Euclidean Embeddings
  2. Basic operations such as adding 2 elements or re-scaling 1 by a real number are not available in Hyperbolic Embeddings

Möbius operations are used instead (a minimal sketch follows this list):

  • Möbius scalar multiplication
  • Möbius addition
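Here is a minimal numpy sketch of these two operations on the Poincaré ball (assuming curvature −1; this is an illustration, not gensim's internal implementation):

import numpy as np

def mobius_add(u, v):
    # Möbius addition: the hyperbolic analogue of vector addition
    uv = np.dot(u, v)
    nu2, nv2 = np.dot(u, u), np.dot(v, v)
    num = (1 + 2 * uv + nv2) * u + (1 - nu2) * v
    return num / (1 + 2 * uv + nu2 * nv2)

def mobius_scalar_mul(r, x):
    # Möbius scalar multiplication: r ⊗ x = tanh(r * artanh(||x||)) * x / ||x||
    norm = np.linalg.norm(x)
    if norm == 0:
        return x
    return np.tanh(r * np.arctanh(norm)) * x / norm

Unlike their Euclidean counterparts, these operations are not commutative in general, and they keep every result inside the unit ball.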

REASON:

In Euclidean space, the shortest path between two points is the straight line between them, with length d(x, y) = ‖x − y‖.

In hyperbolic space (on the Poincaré ball), the geodesic between two points bends towards the origin, with length d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))).
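A small sketch comparing the two distances for the same pair of points (the points are arbitrary picks inside the unit ball):

import numpy as np

def poincare_distance(u, v):
    # geodesic distance on the Poincaré ball (curvature -1)
    sq = np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - np.sum(u**2)) * (1 - np.sum(v**2))))

u, v = np.array([0.1, 0.2]), np.array([0.5, -0.3])
print(np.linalg.norm(u - v))    # Euclidean: ~0.64
print(poincare_distance(u, v))  # hyperbolic: ~1.48, and it explodes near the boundary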

  • So , what’s left to be accomplished in Hyperbolic Embeddings ?

We are yet to explore the products of Constant Curvature Spaces ( Manifold ) and be able to perform Mathematical Operations on them with relative ease

As per Fredric Sala , from Berkeley Institute of Data Science , we can take products of Spherical , Euclidean and Hyperbolic Embedding that gives a powerful expressive measure.
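A sketch of the idea (my own illustration of a product metric, assuming the usual l2 combination of component distances; this is not code from the cited work):

import numpy as np

def poincare_distance(u, v):
    # same formula as in the earlier sketch
    sq = np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - np.sum(u**2)) * (1 - np.sum(v**2))))

def product_distance(xh, yh, xe, ye):
    # distance on a product manifold: combine the per-component
    # distances (here hyperbolic x Euclidean) like an l2 norm
    dh = poincare_distance(xh, yh)
    de = np.linalg.norm(xe - ye)
    return np.sqrt(dh**2 + de**2)

Hierarchical structure can then live in the hyperbolic component while flat, grid-like structure lives in the Euclidean one.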

IMPLEMENTATION

I have created a knowledge graph of hyperbolic word embeddings using the Gensim library.

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# entity_pairs (subject, object tuples) and relations come from the
# earlier entity/relation extraction step of the notebook
# extract subjects
source = [i[0] for i in entity_pairs]
# extract objects
target = [i[1] for i in entity_pairs]
kg_df = pd.DataFrame({'source': source, 'target': target, 'edge': relations})

# create a directed graph from the dataframe using the NetworkX library
G = nx.from_pandas_edgelist(kg_df, "source", "target",
                            edge_attr=True,
                            create_using=nx.MultiDiGraph())

plt.figure(figsize=(12, 12))
pos = nx.spring_layout(G)
nx.draw(G, with_labels=True, node_color='skyblue', edge_cmap=plt.cm.Blues, pos=pos)
plt.show()

If you look closely, the inherent structure appears hyperbolic.

It's a bit messy. Let's extract individual relations from the graph.

Extracting the edge "thought" from the knowledge graph:

G = nx.from_pandas_edgelist(kg_df[kg_df['edge'] == "thought"], "source", "target",
                            edge_attr=True, create_using=nx.MultiDiGraph())
plt.figure(figsize=(12, 12))
pos = nx.spring_layout(G, k=0.5)  # k regulates the distance between nodes
nx.draw(G, with_labels=True, node_color='skyblue', node_size=1500,
        edge_cmap=plt.cm.Blues, pos=pos)
plt.show()

Creating a Poincare Model:

from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization

new_model = PoincareModel(train_data=entity_pairs,
                          size=2,
                          burn_in=0,
                          workers=1,
                          negative=10)
new_model.train(epochs=50)  # train before visualizing

fig = poincare_2d_visualization(new_model,
                                entity_pairs,
                                "Poincare Hierarchy",
                                show_node_labels=[n for pair in entity_pairs for n in pair])

Visualizing this model in 3D using matplotlib's 3D toolkit:

It still seems a bit incomplete.

Let's visualize it in 3D using t-SNE in Plotly.

I have tried extracting the nodes most similar to 'Alice':

x = new_model_without_2d.kv.most_similar('Alice', topn=10)

REFERENCES:

  1. Hyperbolic Deep Learning (http://hyperbolicdeeplearning.com/)
  2. Dissertation of Octavian Ganea (https://ei.is.tuebingen.mpg.de/publications/tifgecgan19)
  3. Poincare (Hyperbolic) implementation in the Gensim library (https://radimrehurek.com/gensim/models/poincare.html)
  4. My Implementation using Poincare models of Gensim and Plotly (https://github.com/123nishant/Hyperbolic-Embedding/blob/main/Hyperbolic%20Embeddings%20in%203D.ipynb)
