BSTLearner - An interactive visualization of binary search trees

A binary search tree (BST) is a data structure used for storing, retrieving and sorting data in an efficient way by using a binary tree structure with the property that the keys in a node’s left subtree are less and the keys in a node's right subtree are greater than the key of the node itself, and then making it balanced.

This tutorial explains the usage and implementation of an interactive binary search visualization in Python using Graphviz and Jupyter Notebook Widgets. Jupyter Notebook Version 1.0 BSTLearner 1.0.ipynb contains intermediate steps and test code. In Jupyter Notebook Version 1.2 BSTLearner 1.2.ipynb the classes have been moved to a Python module algolibs, so the notebook now is uncluttered and contains only the GUI / widgets code. This is work in progress! To be finished as project work by students.

Start the BSTLearner v1.0 


 Motivation

Binary search trees are best understood using interactive visualizations that show how to insert / search / delete values in a tree, how to create a tree from random numbers, how to balance the tree by performing left and right rotations, traverse the tree etc. Jupyter Notebook visualizations are useful because they can be easily shared with students and combine documentation and code blocks in a natural way. With Jupyter Notebook you can use the Python interface for the Graphviz graph-drawing software and create trees by using the Digraph class. With Jupyter Notebook Widgets you can add interactions and create a GUI in which the binary search tree operations can be explored step-by-step.



1 BSTLearner Usage

    Top

The BSTLearner app / Jupyter Notebook visualization has three tabs, the first one for binary search trees, the second one for AVL trees (self-balancing trees constructed by using a balancing factor and rotating the tree as needed to restore the balance), the third tab for B-Trees. You can create a new tree either step by step, by entering integer values in the Enter key field and then clicking "Insert", or by creating an entire tree from random numbers using the button Random. Single nodes are deleted using the Delete button, the entire tree is reset by using the Clear button.

  • Insert: Enter integer number, then click Insert. If not already in the tree, the node is inserted.
  • Search: Enter integer number, then click Search. If found, the node is colored green. Else a message is displayed.
  • Delete: Enter integer number, then click Delete. If found, the node is deleted. Else a message is displayed.
  • Random: Creates a tree from random numers in the range 1 .. 50.
  • Clear: Deletes the data structure and clear the output.

BSTLearner: usage
How to run the Notebook

The GUI is shown in an output cell at the end of the Jupyter Notebook. In order to run it in Google Colab, save a copy of the existing Notebook BSTLearne1.0.ipynb in your own Google Colab and run it there.

BSTLearner usage: Video
How to create, modify and search trees


2 BSTLearner Implementation

    Top

The BSTLearner 1.2 implementation consists of five classes stored in two Python files in the folder algolibs, bst.py and avl.py, as depicted in the UML diagram, plus a Jupyter Notebook that will contain the visualization with Jupyter Notebook widgets. Since the classes from the folder algolibs will be imported in other Python files and Jupyter Notebooks, we add an additional empty file __init__.py, which basically declares the folder to be a module.

In the first version BSTLearner 1.0 all classes were defined and tested within the Jupyter Notebook. This makes for a self-contained Notebook, but the increasing complexity of the code requires to move some of the classes in an user-defined Python module "algolibs" and finish the implementation in an integrated development environment such as Spyder or VSCode. This was done for BSTLearner 1.2. For deploying and using a self-defined Python module / package in Google Colab, the following steps must be obeserved:

  • 1. Create a folder algolibs in your Google Drive and upload the files bst.py, avl.py and __init__.py there.
  • 2. In the Jupyter Notebook BSTLearner1.2.ipynb, mount the Google Drive and add your Drive to the system path by inserting a cell with the following code lines:
    from google.colab import drive
    drive.mount('/content/drive')
    import sys
    sys.path.append('/content/drive/MyDrive')
    
  • 3. Import the required classes as follows:
    from algolibs.bst import BST, BSTViz
    from algolibs.avl import AVLTree, AVLViz

Now you are ready to use the classes from the module algolibs!

BSTLearner implementation
Class descriptions
  • TreeNode: The nodes of a binary search tree are created using the class TreeNode. This class has attributes for handling the data structure (pointer to parent node, left and right child) and also some attributes required for the visualization.
  • BST: The class BST contains the methods for creating, modifying and traversing binary search trees. This class has an attribute root that stores the root of the binary search tree. When inserting a new node, the pointers to left or right child and parent node are set accordingly. The attribute list of the class stores list representations of the tree that are obtained from the tree traversals.
  • BSTViz: The Graphviz visualization of a binary search tree bst is created using the method visualize() from the class BSTViz.
  • AVLTree: The class AVLTree is a subclass of BST and inherits its attributes and methods. The methods implementing the tree operations (insert, search, delete) are overriden by methods with the same name that use the super-class methods and add the required code for balancing a tree.
  • AVLViz: The Graphviz visualization of an AVL tree avl is created using the method visualize() from the class AVLViz.
UML Diagram
Package algolibs
BSTLearner UML Diagram

The class usage is tested in the Python file BSTLearnerTest.py. For example, the required code for inserting the keys 1, 2, 3, 4, 5 in an AVL tree and visualize the tree, is:

from algolibs.avl import AVLTree, AVLViz
avl = AVLTree()
print("Insert 1, 2, 3, 4, 5")
avl.insert_keys([1, 2, 3, 4, 5])
AVLViz(avl).visualize()


Before you start, the following tools must be installed on your computer: Python, Anaconda (with Jupyter Notebook), Graphviz. Graphviz is open source graph visualization software, that describes graphs in a simple text language ("DOT"), and creates graph and tree visualizations in a number of image formats and as PDF.

2-1 Required packages

For creating an interactive GUI in Jupyter Notebook we need the classes and methods of the ipywidgets-package, as well as graphviz and pydot for the tree visualization. The packages can be installed as usually using pip or conda commands, for example pip install graphviz or conda install graphviz.

import numpy as np
from numpy.random import seed, randint
import ipywidgets as widgets 
from ipywidgets import interact, interactive_output, HBox, VBox, Layout
from IPython.display import display, clear_output, SVG, HTML
import graphviz as gv # for visualizing a tree using Digraph
from graphviz import Digraph
import pydot


2-2 Class TreeNode

    Top

First, the implementation requires a class TreeNode that implements a node of the tree and has four attributes:

  • key: the key for the node, which in our case is an integer number
  • parent: a pointer to the parent of a node
  • left: a pointer to the left child of a node
  • right: a pointer to the right child of a node

Additional class attributes (pos, col and h) will be used for the Graphviz visualization with the Neato engine.

A binary search tree is created by linking nodes through the left and right pointers either to a left or to a right child. A node carries data, these are stored in the key attribute. The parent pointer is useful when deleting nodes. The method __init__ initializes the class on construction, the methods to_string and print are used for testing.


TreeNode Class
bst.py
class TreeNode:
  def __init__(self, key):
    self.key = key
    self.parent = None
    self.left = None
    self.right = None
    self.pos = [0, 0] # Position
    self.col = "black" # Color
    self.h = 0 # Height
  def spos(self):
    ''' Position of a node for neato '''
    return str(self.pos[0]) + "," + str(self.pos[1]) + "!"
  def to_string(self):
    return str(self.key)
  def print(self):        
    print("TreeNode: %d" % (self.key))
TreeNode Test
BSTLearnerTest.py

In this test we create 3 nodes n10, n5 and n14 for the keys 10, 5 and 14 and set n10 to be the root, n5 its left child and n14 its right child.

from algolibs.bst import TreeNode
n10 = TreeNode(10)
n5 = TreeNode(5)
n14 = TreeNode(14)
# Create tree by linking the tree nodes
n5.parent = n10; n14.parent = n10
n10.left = n5 # left child
n10.right = n14 # right child
print([n10.to_string(), n10.left.to_string(), n10.right.to_string()])


2-3 Class BST

    Top

The class BST implements the methods to create, modify and traverse a binary search tree as data structure. We maintain two member variables: the root of the tree and a list containing the list representation of its elements.

  • insert(k) inserts a node with key k in the tree.
  • search(x, k) searches for the node with key k in the subtree with root x.
  • find(k) searches for the node with key k in the entire tree.
  • delete(k) deletes the node with key k in the tree.
  • transplant(self, u, v) replaces subtree u with subtree v and is needed as helper method when deleting nodes.

The implementation of the methods uses the pseudocode from Cormens "Introduction to Algorithms" [1]. Some of the methods are implemented recursively, for example, the search for a key or the tree traversals.

BST (Class Definition)
bst.py
class BST():
    def __init__(self):
        self.root = None # Root of the tree
        self.list = [] 
    def insert(self, k):
        ''' Insert a key '''
    
    def insert_keys(self, keys): 
        ''' Insert a set of keys'''
        for k in keys:
            self.insert(k)
       
    def search(self, x, k):
        ''' Search for key k in subtree x '''
      
    def find(self, k):  
        return self.search(self.root, k)   
    
    def transplant(self, u, v):
        ''' Replace subtree u with subtree v '''
             
    def delete(self, k):
        '''Delete node with key k from  tree '''     

    def height(self, x):
        ''' Height of the subtree with root x '''
BST (Usage test)
BSTLearnerTest.py
from algolibs.bst import BST, BSTViz

bst = BST()
print("Insert 10, 3, 5, 2, 16")
bst.insert_keys([10, 3, 5, 2, 16])

n3 = bst.find(3)
print("Found: " + n3.to_string())
BSTViz(bst).visualize()
n16 = bst.find(16)
print("Found: " + n16.to_string())
BSTViz(bst).visualize()
BST Insert (Test Output)
BST insert usage


BST: Tree Search
As illustration, we show the implementation of the TREE-SEARCH pseudocode given in Cormen [1] in Python. A recursive search for key k starting at a node x can be implemented using the following pseudocode: If the starting node x is null, we have reached a leaf and are done. If the search k equals the key of node x, we have found the node we searched for, and again, done. If the search key k is less than the key of node x, we continue recursively in the left subtree of x, else continue recursively in the right subtree of x.

TREE-SEARCH (Pseudocode)
Finding a key recursively

TREE-SEARCH(x, k)
if x == NULL or k == x.key
    return x
if k < x.key 
    return TREE-SEARCH(x.left, k) 
else 
    return TREE-SEARCH(x.right, k)
Tree Search (Python)
bst.py
def search(self, x, k):
    '''Search for a key k starting at node x'''
    if (x == None or x.key == k):
        return x
    if (k < x.key):
        return self.search(x.left, k)
    else: 
       return self.search(x.right, k)
def find(self, k):  
    return self.search(self.root, k)  


BST: Tree Rotations
Tree rotations are local operations in a binary tree that are used to balance a tree. For example, when inserting the keys 10, 30, 20 in a binary search tree, the resulting tree is unbalanced: the left subtree has height 0, the right subtree height 2. This imbalance can be corrected by first performing a right rotation on node 30 and then a left rotation on node 10.

Tree rotation

Rotate_left (Function definition)
bst.py
def rotate_left(self, x):
  ''' Rotate left on node x'''
  y = x.right # set y to be right child
  # turn y's left subtree to x's right subtree
  x.right = y.left
  if y.left != None:
      y.left.parent = x
  y.parent = x.parent # link x's parent to y
  if x.parent == None: # Case 1: x is root
      self.root = y
  elif x == x.parent.left: # Case 2: x is left child
      x.parent.left = y
  else: # Case 3: x is right child
      x.parent.right = y
  y.left = x
  x.parent = y
Rotate_left (Test)
BSTLearnerTest.py
from algolibs.bst import BST, BSTViz
bst = BST()
print("Insert 10, 30, 20")
bst.insert_keys([10, 30, 20])
BSTViz(bst).visualize()
print("Rotate right on 30")
bst.rotate_right(bst.find(30))
BSTViz(bst).visualize()
print("Rotate left on 10")
bst.rotate_left(bst.find(10))
BSTViz(bst).visualize()   


2-4 Class BSTViz: Visualization with Graphviz

The class BSTViz extends BST with a Graphviz visualisation. Its only method visualize() creates a visualization of the tree recursively, using the Digraph class from the graphviz package [3]. We use the neato engine of Graphviz so as to be able to position the nodes explicitely to the left and right of their respective parent node. The position of a node has been defined as an attribute of the TreeNode class and is calculated from the position of its parent. In order to place a node to the left of its parent we subtract - h/4, where h is the height of the node. In order to place a node to the right of its parent, we add h/4, where h is the height of the node.

class BSTViz: 
  def __init__(self, bst):
      self.bst = bst
  def visualize(self):   
      ''' Visualize the tree using graphviz '''
      tree = self.bst.root # start with root of the tree 
      dot = Digraph()
      dot.engine = 'neato'
      # Place root node at position tree.pos 
      h = self.bst.height(tree)
      tree.pos = [0, h]          
      dot.node(name=str(tree), label=str(tree.key), color = tree.col, shape="circle", 
               fixedsize="True", width="0.4", pos=tree.spos())
      # Recursively place the other nodes and edges              
      def add_nodes(tree, dot):            
          if tree.left: # if left subtree: position node to left of parent            
              tree.left.pos[0] = tree.pos[0] - h/4 # x
              tree.left.pos[1] = tree.pos[1] - 0.6 # y       
              dot.node(name=str(tree.left), label=str(tree.left.key), color = tree.left.col, 
                       shape="circle", fixedsize="True", width="0.4", pos=tree.left.spos())
              dot.edge(str(tree), str(tree.left))
              dot = add_nodes(tree.left, dot=dot) 
          if tree.right: # if right subtree: position node to right of parent                    
              tree.right.pos[0] = tree.pos[0] + h/4 # x
              tree.right.pos[1] = tree.pos[1] - 0.6 # y                
              dot.node(name=str(tree.right), label=str(tree.right.key), color = tree.right.col, 
                       shape="circle", fixedsize="True", width="0.4", pos=tree.right.spos())
              dot.edge(str(tree), str(tree.right))
              dot = add_nodes(tree.right, dot=dot)                     
          return dot 
      display(add_nodes(tree, dot))  


2-5 Class AVLTree

    Top

Binary search trees are most useful when balanced, because only for a balanced tree the operations (search, insert, delete) have a time complexity of O(log n). The first type of self-balancing binary search trees is the AVL tree, named after its inventors Adelson-Velsky and Landis. An AVL tree uses a figure called balance = height (tree.right) - height (tree.left) to make sure that the heights of two child subtrees of any node differ by at most one. Insertion and deletions are first performed using the usual binary search tree methods. Then, the balance is calculated for each node, and if the balance is less than -1 or greater than 1, the tree is rebalanced using tree rotations.

Class AVLTree is defined as a subclass of class BST, thus inherits the variables and methods of the base class and extends them.

  • insert(k) inserts a node with key k in the tree in a way that the balance of the tree is kept.
  • search() searches a node in the tree. The search is the same as in a regular binary search tree, thus the method of the base class can be reused.
  • delete(k) deletes the node with key k in the tree in a way that the balance of the tree is kept.

AVLTree (Class Definition)
avl.py

First a key is inserted as in a regular binary search tree. Then, the balance indicator b of the node is calculated as difference between the height of the right subtree and the height of the right subtree and according to the value of b the balance of the tree is restored by performing a left rotation, a right rotation, a right-left-rotation or a left-right rotation.

class AVLTree(BST): 
  def __init__(self):
    super().__init__()
  
  def insert(self, k):
    ''' Insert a key '''
    super().insert(k)
    p = super().find(k) 
    prevBalance = 0
    while (p != None):
        b = super().height(p.right) - super().height(p.left)
        if (b > 1 and prevBalance > 0): # Case 1
            super().rotate_left(p)
        elif (b < 1 and prevBalance < 0): # Case 2
            super().rotate_right(p)
        elif (b > 1 and prevBalance < 0): #  Case 3
            super().rotate_right(p.right)
            super().rotate_left(p)
        elif (b < -1 and prevBalance > 0): # Case 4             
            super().rotate_left(p.left)
            super().rotate_right(p)
        prevBalance = b
        p = p.parent
AVLTree (Test)
BSTLearnerTest.py

We create a new AVL tree by inserting a list of keys, visualize it, then delete key 1 and visualize it again. The tree has been balanced nicely.

avl = AVLTree()
print("Insert 1, 2, 3, 4, 5")
avl.insert_keys([1, 2, 3, 4, 5])
AVLViz(avl).visualize()

print("Delete 1")
avl.delete(1)
AVLViz(avl).visualize()
AVLTree (Test Output)
AVL tree


3 Testing the implementation

    Top

The implementation is followed by tests, in which we create different binary search trees and perform the implemented operations on it. We first test the AVLTree class without using the interactive visualization with widgets, by creating a new tree from random numbers, inserting /deleting random nodes and visualizing the tree after each step.

AVL Insert, find, delete
BSTLearnerTest.py
    
from numpy.random import randint
from algolibs.avl import AVLTree, AVLViz
avl = AVLTree()
# Insert 7 random keys
keys = randint(1, 50, 10)
avl.insert_keys(keys)
# Visualize tree
AVLViz(avl).visualize()
print("Enter key:")  
x = int(input()) 
# Find key
nx = avl.find(x)
if nx != None:
    print("Found: " + nx.to_string())
    avl.delete(x) # Delete key
    AVLViz(avl).visualize()
else:
    print("Not found: %d" % (x))
AVL Insert, find, delete (Test output)
AVL insert usage


4 Interactive GUI with Jupyter Widgets

    Top

The final step is to create an interactive GUI with Jupyter Widgets. Each type of tree (BST, AVL, B-Trees) will be placed in its own tab. Each tab will have an edit box where the keys can be entered manually, a display area for the tree and buttons for the different operations (insert key, generate tree from random numbers, delete key, reset).

(1) Create container widgets
Firstly, create the tabs and output widgets that will contain the other controls: input fields, buttons and tree visualization. You can display the empty GUI structure by using display(tab), just to check that everyting looks fine.

import ipywidgets as widgets
from ipywidgets import Tab, HBox, VBox, Output
from ipywidgets import BoundedIntText, Button, HTML

# Create tabs and output objects
out, out1, out2, out3 = Output(), Output(), Output(), Output() 
tab = Tab(children = [out1, out2, out3], 
                  layout=Layout(width='100%', height='auto'))
tab.set_title(0, 'Binary search tree')
tab.set_title(1, 'AVL Tree')
tab.set_title(2, 'B-Tree')
with out1: 
  htm = HTML("TODO: Insert your BST code here")
  display(htm)  
with out2:
  htm = HTML("TODO: Insert your AVL code here")
  display(htm)  
with out3:
  htm = HTML("TODO: Insert your B-Tree code here")
  display(htm)  
display(tab)

The graphical user interface created with these containers should look something similar as in the screenshot. In the space indicated by the placeholder "TODO" we will position the widgets for each type of tree, for the first tab, that will be a Text widget for entering keys, some buttons for performing actions as needed, and the output containing the Graphviz tree.



(2) Create buttons and event handler
For each button we create an eventhandler for the click-event that implements the actions to be performed after clicking that button. At this point, we need an input field where we can enter the keys to be inserted in the tree or deleted from the tree. The input field ui_key is created using the BoundedIntText-widget. We also need to create a new tree instance using the BST-class, this is our underlying data structure that we access and modify through the insert / delete / find etc. methods.

The eventhandler on_button_insert_clicked first clears the output, then calls the insert-method from the BST-class with the argument ui_key.value from the input field, then creates a graphviz dot file using the visualize-method from the BST-class and finally display the tree.

# Create new tree
tree = BST()
# Input field for keys
ui_key = BoundedIntText(value=20, min=0, max=100, 
                        step=2, description='Enter Key:')
# Create button
btn_insert = Button(description='Insert', button_style='success')
def on_button_insert_clicked(b):
  with out:
    clear_output()
    tree.insert(ui_key.value)
    BSTViz(tree).visualize()      
btn_insert.on_click(on_button_insert_clicked)


(3) Place widgets on first tab
Finally, we put everything together: the output out1 for the first tab will be a VBox containing four other widgets: the HBox displ containing the tree visualization out, a HTML-widget msgbox for instructions and error messages and finally a HBox-widget ctrl grouping the buttons.

# Layout for display and controls
layout_displ=Layout(height='350px', border='1px dotted blue', overflow ='auto')
layout_ctrl=Layout(height='50px')
with out1: 
    msgbox = HTML(" ")
    displ = HBox([out], layout=layout_displ)
    ctrl = HBox([ui_key, btn_insert, btn_delete, btn_search], layout=layout_ctrl)
    display(VBox([displ, msgbox, ctrl]))
with out2:
    msgbox = HTML(" ")
    displ = HBox([out], layout=layout_displ)
    ctrl = HBox([ui_key, btn_insert_avl, btn_delete_avl], layout=layout_ctrl)
    display(VBox([displ, msgbox, ctrl]))

After adding more buttons for the different tree operations, the first tab looks as displayed and the onclick-events for the buttons should work.

A minor inconvenience with a more elaborated Jupyter Widgets GUI is that it is shown at the end of the Jupyter Notebook, after rather lengthy class implementations and widgets definitions. To this, there are two solutions: 1. Create a package, put the helper classes there, and import them as you would do with other Python packages. 2. Hide the code cells (this can be done easily in Google Colab), so that only the output of the widget code cell is shown.



Tools & References

    Top

For re-creating this visualization, the following tools must be installed on your computer: Python, Anaconda (with Jupyter Notebook), Graphviz. Graphviz is open source graph visualization software, that describes graphs in a simple text language ("DOT"), and creates graph and tree visualizations in a number of image formats and as PDF.

Or, you can skip the local installations and simply develop and run the Juypter Notebook as Google Colab script in the Cloud.