Suffix trees are a data structure for storing and searching strings. Once a suffix tree has been built for a text, it can answer many substring and pattern queries quickly. In this article, we will discuss what a suffix tree is, why it is important, and how we can implement a simplified version in C++. We will also talk about the time and space complexity of suffix tree operations and work through some coding exercises to help you better understand the concept.
What is a Suffix Tree?
A suffix tree is a compressed trie that stores all the suffixes of a given string. Each edge is labeled with a substring of the text, and when the text ends with a unique terminator such as $, each leaf corresponds to exactly one suffix. Because every substring of the text is a prefix of some suffix, suffix trees support a variety of operations, such as finding the longest common substring of two strings, finding all occurrences of a pattern, and finding the shortest string that is not a substring of the text.
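The idea that a substring is always a prefix of some suffix can be checked by brute force. The following is a minimal sketch, not part of any suffix tree library; all_suffixes and is_prefix_of_some_suffix are hypothetical helper names chosen for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns all suffixes of `text`, longest first.
std::vector<std::string> all_suffixes(const std::string& text) {
    std::vector<std::string> suffixes;
    for (std::size_t i = 0; i < text.size(); i++) {
        suffixes.push_back(text.substr(i));
    }
    return suffixes;
}

// Returns true if `pattern` is a prefix of at least one suffix of `text`,
// which is the same as saying that `pattern` occurs somewhere in `text`.
bool is_prefix_of_some_suffix(const std::string& text, const std::string& pattern) {
    for (const std::string& suffix : all_suffixes(text)) {
        if (suffix.compare(0, pattern.size(), pattern) == 0) {
            return true;
        }
    }
    return false;
}
```

For "banana", all_suffixes returns banana, anana, nana, ana, na, and a. The pattern "nan" is a prefix of the suffix nana, so it is a substring; "nab" is a prefix of no suffix, so it is not. A suffix tree answers the same question without scanning every suffix.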
A closely related structure is the suffix array, which lists the starting positions of all the suffixes of a string in lexicographical order. A suffix tree can be constructed from a suffix array together with its longest-common-prefix (LCP) array, but it can also be built directly, for example with Ukkonen’s algorithm. The two structures hold much the same information: the suffix array is more compact, while the suffix tree makes the shared prefixes explicit as tree nodes.
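To make the suffix array concrete, here is a minimal sketch that builds one by sorting the suffix start positions. This is a deliberately naive method (the sort can compare long suffixes character by character), not one of the specialized construction algorithms; build_suffix_array is a hypothetical name used only for this example.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Returns the start positions of the suffixes of `text`
// in lexicographical order of the suffixes themselves.
std::vector<int> build_suffix_array(const std::string& text) {
    std::vector<int> starts(text.size());
    std::iota(starts.begin(), starts.end(), 0);  // 0, 1, 2, ...
    std::sort(starts.begin(), starts.end(), [&text](int a, int b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return starts;
}
```

For "banana" the result is 5, 3, 1, 0, 4, 2, which corresponds to the sorted suffixes a, ana, anana, banana, na, nana.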
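The main advantage of using a suffix tree is that, once it is built, checking whether a pattern of length m occurs in the text takes O(m) time for a fixed alphabet, regardless of how long the text is. This makes it an ideal data structure for string searching and pattern matching when many queries run against the same text.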
Time Complexity of Suffix Tree Operations
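The time complexity of suffix tree operations depends on the length n of the text and the length m of the pattern. Building the tree takes O(n) time with Ukkonen’s algorithm for a fixed alphabet, while a naive construction that inserts the suffixes one at a time takes O(n^2) time in the worst case. Once the tree exists, checking whether a pattern occurs takes O(m) time, and reporting all k occurrences takes O(m + k) time.
These bounds assume that a node can find the child for a given character in constant time, for example with an array or hash map indexed by character. If the children are kept in an unsorted vector and scanned linearly, each step can cost up to the size of the alphabet.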
Space Complexity of Suffix Tree
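The space complexity of a suffix tree is determined by the length n of the string. A compressed suffix tree has O(n) nodes, because it has one leaf per suffix and every internal node other than the root has at least two children. Each edge label can be stored as a pair of indices into the string, so the total space is O(n). An uncompressed suffix trie, which stores one character per node, can need O(n^2) nodes in the worst case.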
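One way to see where a quadratic figure can come from is to count the nodes of an uncompressed suffix trie, which has the root plus one node for every distinct non-empty substring. The sketch below does this by brute force with a set; suffix_trie_node_count is a hypothetical helper, and the approach is only meant for short strings.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Number of nodes in the uncompressed suffix trie of `text`:
// the root plus one node per distinct non-empty substring.
std::size_t suffix_trie_node_count(const std::string& text) {
    std::set<std::string> substrings;
    for (std::size_t i = 0; i < text.size(); i++) {
        for (std::size_t len = 1; i + len <= text.size(); len++) {
            substrings.insert(text.substr(i, len));
        }
    }
    return substrings.size() + 1;
}
```

A string of n distinct characters has n(n+1)/2 distinct substrings, so "abcd" gives 11 nodes and "abcdefgh" gives 37. A highly repetitive string such as "aaaa" gives only 5, which shows that the quadratic case is a worst case rather than the rule.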
Implementing Suffix Trees in C++
Now that we have discussed the basics of suffix trees and their time and space complexity, let’s look at how we can implement them in C++.
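First, let’s start by declaring a structure to represent a node. To keep the code short, each node holds a single character, which makes this an uncompressed suffix trie. A full suffix tree would merge every chain of single-child nodes into one edge labeled with a substring.
#include <cstddef>
#include <string>
#include <vector>
struct Node {
    char character;               // character on the edge leading into this node
    int index;                    // start of the suffix that ends here, or -1
    std::vector<Node*> children;  // one child per distinct next character
};
The character field stores the character on the edge leading into the node. The index field stores the starting position of the suffix that ends at the node, or -1 if no suffix ends there. The children field stores the child nodes of the node.
Next, let’s define a function to create a node:
Node* create_node(char character, int index) {
    return new Node{character, index, {}};
}
This function takes a character and an index as parameters and creates a node with the given character and index and no children. For brevity, the nodes are never freed in this article; a real program should delete them or hold them in std::unique_ptr.
Next, let’s define a function to insert a character below a node:
Node* insert_node(Node* parent, char character) {
    for (Node* child : parent->children) {
        if (child->character == character) {
            return child;  // reuse the existing branch
        }
    }
    Node* child = create_node(character, -1);
    parent->children.push_back(child);
    return child;
}
This function takes a parent node and a character as parameters and returns the child of the parent that is labeled with that character. If no such child exists yet, it creates one and appends it to the parent’s children vector.
Finally, let’s define a function to build the suffix tree:
void build_suffix_tree(const std::string& str, Node* root) {
    const std::string text = str + '$';  // terminator; assumes '$' is not in str
    for (std::size_t i = 0; i < str.size(); i++) {
        Node* current = root;
        for (std::size_t j = i; j < text.size(); j++) {
            current = insert_node(current, text[j]);
        }
        current->index = static_cast<int>(i);  // the leaf records where the suffix starts
    }
}
This function takes a string and a root node as parameters and inserts every suffix of the string, one character at a time, starting from the root. The terminator guarantees that each suffix ends in its own leaf, and that leaf records the suffix’s starting position. Because every suffix is inserted in full, this construction takes O(n^2) time and space for a string of length n.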
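The code above keeps one character per node. For comparison, here is a sketch of a compressed suffix tree in which each edge label is stored as a (start, length) pair into the text. It uses a naive O(n^2) construction that inserts one suffix at a time and splits an edge where a new suffix diverges; it is not Ukkonen’s algorithm. CompactNode and CompactSuffixTree are hypothetical names, and the sketch assumes that neither the text nor the pattern contains '$'.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct CompactNode {
    std::size_t start = 0;    // edge label is text[start, start + length)
    std::size_t length = 0;
    int suffix_index = -1;    // leaves: start of the suffix; -1 otherwise
    std::map<char, std::unique_ptr<CompactNode>> children;  // keyed by first label character
};

class CompactSuffixTree {
public:
    // Assumption: `input` does not contain the terminator '$'.
    explicit CompactSuffixTree(const std::string& input) : text_(input + '$') {
        for (std::size_t i = 0; i < text_.size(); i++) {
            insert_suffix(i);
        }
    }

    // True if `pattern` occurs in the text (pattern must not contain '$').
    bool contains(const std::string& pattern) const {
        const CompactNode* node = &root_;
        std::size_t matched = 0;
        while (matched < pattern.size()) {
            auto it = node->children.find(pattern[matched]);
            if (it == node->children.end()) return false;
            const CompactNode* child = it->second.get();
            for (std::size_t k = 0; k < child->length && matched < pattern.size(); k++, matched++) {
                if (text_[child->start + k] != pattern[matched]) return false;
            }
            node = child;
        }
        return true;
    }

    std::size_t node_count() const { return count(root_); }

private:
    void insert_suffix(std::size_t suffix_start) {
        CompactNode* node = &root_;
        std::size_t pos = suffix_start;
        while (pos < text_.size()) {
            auto it = node->children.find(text_[pos]);
            if (it == node->children.end()) {
                // No edge starts with this character: hang a new leaf here.
                std::unique_ptr<CompactNode> leaf(new CompactNode());
                leaf->start = pos;
                leaf->length = text_.size() - pos;
                leaf->suffix_index = static_cast<int>(suffix_start);
                node->children[text_[pos]] = std::move(leaf);
                return;
            }
            CompactNode* child = it->second.get();
            std::size_t k = 0;  // number of edge characters that match the suffix
            while (k < child->length && pos + k < text_.size() &&
                   text_[child->start + k] == text_[pos + k]) {
                k++;
            }
            if (k < child->length) {
                // Split the edge after k characters with a new internal node.
                std::unique_ptr<CompactNode> middle(new CompactNode());
                middle->start = child->start;
                middle->length = k;
                child->start += k;
                child->length -= k;
                middle->children[text_[child->start]] = std::move(it->second);
                it->second = std::move(middle);
            }
            node = it->second.get();
            pos += k;
        }
    }

    static std::size_t count(const CompactNode& node) {
        std::size_t total = 1;
        for (const auto& entry : node.children) total += count(*entry.second);
        return total;
    }

    std::string text_;
    CompactNode root_;
};
```

For example, CompactSuffixTree tree("banana") ends up with 11 nodes: 7 leaves (one per suffix of banana$) and 4 internal nodes (the root and the nodes for a, ana, and na). On that tree, tree.contains("nan") is true and tree.contains("nab") is false.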
Conclusion
In this article, we discussed what a suffix tree is and how we can implement a simplified version in C++. We also discussed the time and space complexity of suffix tree operations. Once built, a suffix tree answers substring queries in time proportional to the length of the pattern, which makes it a strong choice when many searches run against the same text.
Exercises
Write a program to build a suffix tree for the given string.
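#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
struct Node {
    char character;               // character on the edge leading into this node
    int index;                    // start of the suffix that ends here, or -1
    std::vector<Node*> children;  // one child per distinct next character
};
Node* create_node(char character, int index) {
    return new Node{character, index, {}};
}
// Returns the child of parent labeled with character, creating it if needed.
Node* insert_node(Node* parent, char character) {
    for (Node* child : parent->children) {
        if (child->character == character) {
            return child;
        }
    }
    Node* child = create_node(character, -1);
    parent->children.push_back(child);
    return child;
}
// Inserts every suffix of str + '$' (assumes '$' does not occur in str).
void build_suffix_tree(const std::string& str, Node* root) {
    const std::string text = str + '$';
    for (std::size_t i = 0; i < str.size(); i++) {
        Node* current = root;
        for (std::size_t j = i; j < text.size(); j++) {
            current = insert_node(current, text[j]);
        }
        current->index = static_cast<int>(i);
    }
}
// Counts the nodes in the subtree rooted at node, including node itself.
std::size_t count_nodes(const Node* node) {
    std::size_t total = 1;
    for (const Node* child : node->children) {
        total += count_nodes(child);
    }
    return total;
}
int main() {
    std::string str = "hello";
    Node* root = create_node('\0', -1);
    build_suffix_tree(str, root);
    std::cout << count_nodes(root) << std::endl;
    return 0;
}
The program builds an uncompressed suffix tree (a suffix trie) for the given string. The create_node() function creates a node with the given character and index, the insert_node() function returns the child for a character and creates it if it is missing, and the build_suffix_tree() function inserts every suffix of the string followed by the terminator $. The program prints “20”: the root plus one node for each distinct prefix of hello$, ello$, llo$, lo$, and o$.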
Write a program to find the longest common substring in two strings using a suffix tree.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
struct Node {
    char character;               // character on the edge leading into this node
    int index;                    // start of the suffix that ends here, or -1
    std::vector<Node*> children;  // one child per distinct next character
};
Node* create_node(char character, int index) {
    return new Node{character, index, {}};
}
// Returns the child of node labeled with character, or nullptr if there is none.
Node* find_child(const Node* node, char character) {
    for (Node* child : node->children) {
        if (child->character == character) {
            return child;
        }
    }
    return nullptr;
}
// Returns the child of parent labeled with character, creating it if needed.
Node* insert_node(Node* parent, char character) {
    Node* child = find_child(parent, character);
    if (child == nullptr) {
        child = create_node(character, -1);
        parent->children.push_back(child);
    }
    return child;
}
// Inserts every suffix of str + '$' (assumes '$' does not occur in str).
void build_suffix_tree(const std::string& str, Node* root) {
    const std::string text = str + '$';
    for (std::size_t i = 0; i < str.size(); i++) {
        Node* current = root;
        for (std::size_t j = i; j < text.size(); j++) {
            current = insert_node(current, text[j]);
        }
        current->index = static_cast<int>(i);
    }
}
// root is the suffix tree of the first string; str2 must not contain '$'.
std::string longest_common_substring(const Node* root, const std::string& str2) {
    std::string longest;
    for (std::size_t i = 0; i < str2.size(); i++) {
        // Walk down from the root for as long as str2[i...] keeps matching.
        const Node* current = root;
        std::size_t j = i;
        while (j < str2.size()) {
            const Node* next = find_child(current, str2[j]);
            if (next == nullptr) {
                break;
            }
            current = next;
            j++;
        }
        if (j - i > longest.size()) {
            longest = str2.substr(i, j - i);
        }
    }
    return longest;
}
int main() {
    std::string str1 = "hello";
    std::string str2 = "world";
    Node* root = create_node('\0', -1);
    build_suffix_tree(str1, root);
    std::string longest = longest_common_substring(root, str2);
    std::cout << longest << std::endl;
    return 0;
}
The program finds the longest common substring of two strings using a suffix tree built from the first string. For every starting position in the second string, the longest_common_substring() function walks down from the root for as long as the characters match; the deepest walk gives the longest common substring. The strings “hello” and “world” share only single characters (“l” and “o”), so the program prints “o”, the first match of maximal length that it finds.
Write a program to find all occurrences of a given string in a suffix tree.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
struct Node {
    char character;               // character on the edge leading into this node
    int index;                    // start of the suffix that ends here, or -1
    std::vector<Node*> children;  // one child per distinct next character
};
Node* create_node(char character, int index) {
    return new Node{character, index, {}};
}
// Returns the child of node labeled with character, or nullptr if there is none.
Node* find_child(const Node* node, char character) {
    for (Node* child : node->children) {
        if (child->character == character) {
            return child;
        }
    }
    return nullptr;
}
// Returns the child of parent labeled with character, creating it if needed.
Node* insert_node(Node* parent, char character) {
    Node* child = find_child(parent, character);
    if (child == nullptr) {
        child = create_node(character, -1);
        parent->children.push_back(child);
    }
    return child;
}
// Inserts every suffix of str + '$' (assumes '$' does not occur in str).
void build_suffix_tree(const std::string& str, Node* root) {
    const std::string text = str + '$';
    for (std::size_t i = 0; i < str.size(); i++) {
        Node* current = root;
        for (std::size_t j = i; j < text.size(); j++) {
            current = insert_node(current, text[j]);
        }
        current->index = static_cast<int>(i);
    }
}
// Appends the suffix start stored in every leaf below node.
void collect_indices(const Node* node, std::vector<int>& indices) {
    if (node->index >= 0) {
        indices.push_back(node->index);
    }
    for (const Node* child : node->children) {
        collect_indices(child, indices);
    }
}
// Returns the sorted start positions of pattern (which must not contain '$').
std::vector<int> find_all_occurrences(const Node* root, const std::string& pattern) {
    std::vector<int> indices;
    const Node* current = root;
    for (char c : pattern) {
        current = find_child(current, c);
        if (current == nullptr) {
            return indices;  // the pattern does not occur
        }
    }
    collect_indices(current, indices);
    std::sort(indices.begin(), indices.end());
    return indices;
}
int main() {
    std::string str = "hello";
    Node* root = create_node('\0', -1);
    build_suffix_tree(str, root);
    std::string substring = "ll";
    std::vector<int> indices = find_all_occurrences(root, substring);
    for (std::size_t i = 0; i < indices.size(); i++) {
        std::cout << indices[i] << std::endl;
    }
    return 0;
}
The program finds all occurrences of a given string in a suffix tree. The find_all_occurrences() function follows the pattern down from the root; if the walk succeeds, every leaf below the node it reaches belongs to a suffix that starts with the pattern, so the collect_indices() function gathers the starting positions stored in those leaves. The string “ll” occurs once in “hello”, so the program prints “2”.
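Write a program to find the shortest string that is not a substring of a given string using a suffix tree. Consider only strings made of characters that occur in the given string.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>
struct Node {
    char character;               // character on the edge leading into this node
    int index;                    // start of the suffix that ends here, or -1
    std::vector<Node*> children;  // one child per distinct next character
};
Node* create_node(char character, int index) {
    return new Node{character, index, {}};
}
// Returns the child of node labeled with character, or nullptr if there is none.
Node* find_child(const Node* node, char character) {
    for (Node* child : node->children) {
        if (child->character == character) {
            return child;
        }
    }
    return nullptr;
}
// Returns the child of parent labeled with character, creating it if needed.
Node* insert_node(Node* parent, char character) {
    Node* child = find_child(parent, character);
    if (child == nullptr) {
        child = create_node(character, -1);
        parent->children.push_back(child);
    }
    return child;
}
// Inserts every suffix of str + '$' (assumes '$' does not occur in str).
void build_suffix_tree(const std::string& str, Node* root) {
    const std::string text = str + '$';
    for (std::size_t i = 0; i < str.size(); i++) {
        Node* current = root;
        for (std::size_t j = i; j < text.size(); j++) {
            current = insert_node(current, text[j]);
        }
        current->index = static_cast<int>(i);
    }
}
std::string shortest_non_substring(const Node* root, const std::string& str) {
    // The alphabet is the sorted set of distinct characters in str.
    std::string alphabet = str;
    std::sort(alphabet.begin(), alphabet.end());
    alphabet.erase(std::unique(alphabet.begin(), alphabet.end()), alphabet.end());
    // Breadth-first search: the first missing child belongs to a shortest absent string.
    std::queue<std::pair<const Node*, std::string>> pending;
    pending.emplace(root, std::string());
    while (!pending.empty()) {
        const Node* node = pending.front().first;
        const std::string prefix = pending.front().second;
        pending.pop();
        for (char c : alphabet) {
            const Node* child = find_child(node, c);
            if (child == nullptr) {
                return prefix + c;
            }
            pending.emplace(child, prefix + c);
        }
    }
    return "";  // reached only when str is empty
}
int main() {
    std::string str = "banana";
    Node* root = create_node('\0', -1);
    build_suffix_tree(str, root);
    std::cout << "The shortest non-substring of " << str << " is " << shortest_non_substring(root, str) << std::endl;
    return 0;
}
The program searches the suffix tree level by level and stops at the first node that has no child for some character of the alphabet. Every single character of “banana” occurs in it, but no suffix continues “a” with another “a”, so the program prints “The shortest non-substring of banana is aa”.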
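Write a program to find all the strings up to a given length that are not substrings of a given string using a suffix tree. Consider only strings made of characters that occur in the given string.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
struct Node {
    char character;               // character on the edge leading into this node
    int index;                    // start of the suffix that ends here, or -1
    std::vector<Node*> children;  // one child per distinct next character
};
Node* create_node(char character, int index) {
    return new Node{character, index, {}};
}
// Returns the child of node labeled with character, or nullptr if there is none.
Node* find_child(const Node* node, char character) {
    for (Node* child : node->children) {
        if (child->character == character) {
            return child;
        }
    }
    return nullptr;
}
// Returns the child of parent labeled with character, creating it if needed.
Node* insert_node(Node* parent, char character) {
    Node* child = find_child(parent, character);
    if (child == nullptr) {
        child = create_node(character, -1);
        parent->children.push_back(child);
    }
    return child;
}
// Inserts every suffix of str + '$' (assumes '$' does not occur in str).
void build_suffix_tree(const std::string& str, Node* root) {
    const std::string text = str + '$';
    for (std::size_t i = 0; i < str.size(); i++) {
        Node* current = root;
        for (std::size_t j = i; j < text.size(); j++) {
            current = insert_node(current, text[j]);
        }
        current->index = static_cast<int>(i);
    }
}
// Extends current by each alphabet character; node is nullptr once current has left the tree.
void collect_non_substrings(const Node* node, const std::string& alphabet, std::size_t max_length,
                            std::string& current, std::vector<std::string>& result) {
    if (current.size() == max_length) {
        return;
    }
    for (char c : alphabet) {
        const Node* child = (node != nullptr) ? find_child(node, c) : nullptr;
        current.push_back(c);
        if (child == nullptr) {
            result.push_back(current);  // no path spells current, so it is not a substring
        }
        collect_non_substrings(child, alphabet, max_length, current, result);
        current.pop_back();
    }
}
std::vector<std::string> non_substrings(const Node* root, const std::string& str, std::size_t max_length) {
    // The alphabet is the sorted set of distinct characters in str.
    std::string alphabet = str;
    std::sort(alphabet.begin(), alphabet.end());
    alphabet.erase(std::unique(alphabet.begin(), alphabet.end()), alphabet.end());
    std::vector<std::string> result;
    std::string current;
    collect_non_substrings(root, alphabet, max_length, current, result);
    return result;
}
int main() {
    std::string str = "banana";
    Node* root = create_node('\0', -1);
    build_suffix_tree(str, root);
    std::cout << "The non-substrings of " << str << " up to length 2 are: " << std::endl;
    std::vector<std::string> result = non_substrings(root, str, 2);
    for (std::size_t i = 0; i < result.size(); i++) {
        std::cout << result[i] << std::endl;
    }
    return 0;
}
The set of all non-substrings is infinite, so the program lists only those up to a maximum length. It enumerates every string over the alphabet up to that length while following the matching path in the suffix tree, and reports a string as soon as the path does not exist. For “banana” with a maximum length of 2, it prints “aa”, “ab”, “bb”, “bn”, “nb”, and “nn”.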