Array de sufijos | Conjunto 2 (algoritmo nLogn)

Dada una string , la tarea es construir una array de sufijos para la string dada.

Una array de sufijos es una array ordenada de todos los sufijos de una string dada. La definición es similar a Suffix Tree , que se comprime de todos los sufijos del texto dado.

Ejemplos:

Entrada: str = “banana”
Salida: {5, 3, 1, 0, 4, 2}
Explicación:
Sufijo por índice Sufijo ordenado alfabéticamente
———————– ———————————— —–
0 banana 5 a
1 anana Ordenar los sufijos 3 ana
2 nana —————– —> 1 anana
3 ana alfabéticamente 0 banana
4 na 4 na
5 a 2 nana
Así que la array de sufijos para “banana” es {5, 3, 1, 0, 4, 2}

Entrada: str = “geeksforgeeks”
Salida: {10 9 2 1 5 8 0 11 3 6 7 12 4}
Explicación:
0 geeksforgeeks 10 eks
1 eeksforgeeks 9 eeks
2 eksforgeeks 2 eksforgeeks
3 ksforgeeks 1 eeksforgeeks
4 sforgeeks 5
sforgeeks
6 orgeeks ——————> 0 geeksforgeeks
7 rgeeks 11 ks
8 geeks 3 ksforgeeks
9 eeks 6 orgeeks
10 eks 7 rgeeks
11 ks 12 s
12 s 4 sforgeeks La
array de sufijos para “geeksforgeeks” es {10 9 2 1 5 8 0 11 3 6 7 12 4 }

Enfoque ingenuo: hemos discutido el algoritmo ingenuo para la construcción de una array de sufijos . El algoritmo Naive es considerar todos los sufijos, clasificarlos usando el algoritmo de clasificación O(n Log n) y mientras se clasifican, mantener los índices originales.

Complejidad de tiempo: O(n ² log(n)), donde n es el número de caracteres en la string de entrada.

Enfoque optimizado: en esta publicación, se analiza el algoritmo O (n Log n) para la construcción de arrays de sufijos. Analicemos primero un algoritmo O(n * Logn * Logn) por simplicidad.

La idea es utilizar el hecho de que las strings que se van a ordenar son sufijos de una sola string.

Primero clasificamos todos los sufijos según el primer carácter, luego según los primeros 2 caracteres, luego los primeros 4 caracteres, y así sucesivamente hasta que el número de caracteres a considerar sea menor que 2n.
El punto importante es que, si hemos ordenado los sufijos de acuerdo con los primeros 2 ⁱ caracteres, entonces podemos ordenar los sufijos de acuerdo con los primeros 2 ^{i + 1} caracteres en el tiempo O (n Log n) usando un algoritmo de clasificación (n Log n) como Merge Sort .
Esto es posible ya que se pueden comparar dos sufijos en tiempo O(1) (necesitamos comparar solo dos valores, consulte el ejemplo y el código a continuación).

La función de clasificación se llama O(Logn) veces (Tenga en cuenta que aumentamos el número de caracteres a considerar en potencias de 2). Por lo tanto, la complejidad del tiempo total se convierte en O(nLognLogn).

Construyamos una array de sufijos para la string de ejemplo «banana» usando el algoritmo anterior.

Ordenar según los dos primeros caracteres Asigne un rango a todos los sufijos utilizando el valor ASCII del primer carácter. Una forma sencilla de asignar rango es hacer «str[i] – ‘a'» para el i-ésimo sufijo de strp[]

Index     Suffix            Rank
 0        banana             1   
 1        anana              0 
 2        nana               13 
 3        ana                0
 4        na                 13
 5        a                  0

Para cada carácter, también almacenamos el rango del siguiente carácter adyacente, es decir, el rango del carácter en str[i + 1] (Esto es necesario para ordenar los sufijos según los primeros 2 caracteres). Si un carácter es el último carácter, almacenamos el siguiente rango como -1

Index    Suffix            Rank          Next Rank 
 0       banana             1              0
 1       anana              0              13    
 2       nana               13             0
 3       ana                0              13
 4       na                 13             0 
 5       a                  0             -1

Ordene todos los sufijos según el rango y el rango adyacente. El rango se considera como el primer dígito o MSD, y el rango adyacente se considera como el segundo dígito.

Index    Suffix            Rank          Next Rank 
 5        a                  0              -1
 1        anana              0               13    
 3        ana                0               13
 0        banana             1               0
 2        nana               13              0
 4        na                 13              0

Ordenar según los primeros cuatro caracteres
Asignar nuevos rangos a todos los sufijos. Para asignar nuevos rangos, consideramos los sufijos ordenados uno por uno. Asigne 0 como nuevo rango al primer sufijo. Para asignar rangos a los sufijos restantes, consideramos un par de rangos de sufijos justo antes del sufijo actual. Si el par de rango anterior de un sufijo es el mismo que el rango anterior del sufijo justo antes de él, asígnele el mismo rango. De lo contrario, asigne un rango del sufijo anterior más uno.

Index       Suffix          Rank       
  5          a               0     [Assign 0 to first]        
  1          anana           1     (0, 13) is different from previous
  3          ana             1     (0, 13) is same as previous     
  0          banana          2     (1, 0) is different from previous      
  2          nana            3     (13, 0) is different from previous      
  4          na              3     (13, 0) is same as previous

Para cada sufijo str[i], también almacene el rango del siguiente sufijo en str[i + 2]. Si no hay un sufijo siguiente en i + 2, almacenamos el siguiente rango como -1

Index       Suffix          Rank        Next Rank
  5          a               0             -1
  1          anana           1              1      
  3          ana             1              0 
  0          banana          2              3
  2          nana            3              3 
  4          na              3              -1

Ordene todos los sufijos según el rango y el siguiente rango.

Index       Suffix          Rank        Next Rank
  5          a               0             -1
  3          ana             1              0 
  1          anana           1              1      
  0          banana          2              3
  4          na              3             -1
  2          nana            3              3

C++

// C++ program for building suffix array of a given text
#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;
 
// Structure to store information of a suffix
struct suffix
{
    int index; // To store original index
    int rank[2]; // To store ranks and next rank pair
};
 
// A comparison function used by sort() to compare two suffixes
// Compares two pairs, returns 1 if first pair is smaller
int cmp(struct suffix a, struct suffix b)
{
    return (a.rank[0] == b.rank[0])? (a.rank[1] < b.rank[1] ?1: 0):
               (a.rank[0] < b.rank[0] ?1: 0);
}
 
// This is the main function that takes a string 'txt' of size n as an
// argument, builds and return the suffix array for the given string
int *buildSuffixArray(char *txt, int n)
{
    // A structure to store suffixes and their indexes
    struct suffix suffixes[n];
 
    // Store suffixes and their indexes in an array of structures.
    // The structure is needed to sort the suffixes alphabetically
    // and maintain their old indexes while sorting
    for (int i = 0; i < n; i++)
    {
        suffixes[i].index = i;
        suffixes[i].rank[0] = txt[i] - 'a';
        suffixes[i].rank[1] = ((i+1) < n)? (txt[i + 1] - 'a'): -1;
    }
 
    // Sort the suffixes using the comparison function
    // defined above.
    sort(suffixes, suffixes+n, cmp);
 
    // At this point, all suffixes are sorted according to first
    // 2 characters.  Let us sort suffixes according to first 4
    // characters, then first 8 and so on
    int ind[n];  // This array is needed to get the index in suffixes[]
                 // from original index.  This mapping is needed to get
                 // next suffix.
    for (int k = 4; k < 2*n; k = k*2)
    {
        // Assigning rank and index values to first suffix
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;
 
        // Assigning rank to suffixes
        for (int i = 1; i < n; i++)
        {
            // If first rank and next ranks are same as that of previous
            // suffix in array, assign the same new rank to this suffix
            if (suffixes[i].rank[0] == prev_rank &&
                    suffixes[i].rank[1] == suffixes[i-1].rank[1])
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else // Otherwise increment rank and assign
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }
 
        // Assign next rank to every suffix
        for (int i = 0; i < n; i++)
        {
            int nextindex = suffixes[i].index + k/2;
            suffixes[i].rank[1] = (nextindex < n)?
                                  suffixes[ind[nextindex]].rank[0]: -1;
        }
 
        // Sort the suffixes according to first k characters
        sort(suffixes, suffixes+n, cmp);
    }
 
    // Store indexes of all sorted suffixes in the suffix array
    int *suffixArr = new int[n];
    for (int i = 0; i < n; i++)
        suffixArr[i] = suffixes[i].index;
 
    // Return the suffix array
    return  suffixArr;
}
 
// A utility function to print an array of given size
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
}
 
// Driver program to test above functions
int main()
{
    char txt[] = "banana";
    int n = strlen(txt);
    int *suffixArr = buildSuffixArray(txt,  n);
    cout << "Following is suffix array for " << txt << endl;
    printArr(suffixArr, n);
    return 0;
}

Java

// Java program for building suffix array of a given text
import java.util.*;
class GFG
{
    // Class to store information of a suffix
    public static class Suffix implements Comparable<Suffix>
    {
        int index;
        int rank;
        int next;
 
        public Suffix(int ind, int r, int nr)
        {
            index = ind;
            rank = r;
            next = nr;
        }
         
        // A comparison function used by sort()
        // to compare two suffixes.
        // Compares two pairs, returns 1
        // if first pair is smaller
        public int compareTo(Suffix s)
        {
            if (rank != s.rank) return Integer.compare(rank, s.rank);
            return Integer.compare(next, s.next);
        }
    }
     
    // This is the main function that takes a string 'txt'
    // of size n as an argument, builds and return the
    // suffix array for the given string
    public static int[] suffixArray(String s)
    {
        int n = s.length();
        Suffix[] su = new Suffix[n];
         
        // Store suffixes and their indexes in
        // an array of classes. The class is needed
        // to sort the suffixes alphabetically and
        // maintain their old indexes while sorting
        for (int i = 0; i < n; i++)
        {
            su[i] = new Suffix(i, s.charAt(i) - '$', 0);
        }
        for (int i = 0; i < n; i++)
            su[i].next = (i + 1 < n ? su[i + 1].rank : -1);
 
        // Sort the suffixes using the comparison function
        // defined above.
        Arrays.sort(su);
 
        // At this point, all suffixes are sorted
        // according to first 2 characters.
        // Let us sort suffixes according to first 4
        // characters, then first 8 and so on
        int[] ind = new int[n];
         
        // This array is needed to get the index in suffixes[]
        // from original index. This mapping is needed to get
        // next suffix.
        for (int length = 4; length < 2 * n; length <<= 1)
        {
             
            // Assigning rank and index values to first suffix
            int rank = 0, prev = su[0].rank;
            su[0].rank = rank;
            ind[su[0].index] = 0;
            for (int i = 1; i < n; i++)
            {
                // If first rank and next ranks are same as
                // that of previous suffix in array,
                // assign the same new rank to this suffix
                if (su[i].rank == prev &&
                    su[i].next == su[i - 1].next)
                {
                    prev = su[i].rank;
                    su[i].rank = rank;
                }
                else
                {
                    // Otherwise increment rank and assign
                    prev = su[i].rank;
                    su[i].rank = ++rank;
                }
                ind[su[i].index] = i;
            }
             
            // Assign next rank to every suffix
            for (int i = 0; i < n; i++)
            {
                int nextP = su[i].index + length / 2;
                su[i].next = nextP < n ?
                   su[ind[nextP]].rank : -1;
            }
             
            // Sort the suffixes according
            // to first k characters
            Arrays.sort(su);
        }
 
        // Store indexes of all sorted
        // suffixes in the suffix array
        int[] suf = new int[n];
         
        for (int i = 0; i < n; i++)
            suf[i] = su[i].index;
 
        // Return the suffix array
        return suf;
    }   
     
    static void printArr(int arr[], int n)
    {
        for (int i = 0; i < n; i++)
            System.out.print(arr[i] + " ");
        System.out.println();
    }
     
    // Driver Code
    public static void main(String[] args)
    {
        String txt = "banana";
        int n = txt.length();
        int[] suff_arr = suffixArray(txt);
        System.out.println("Following is suffix array for banana:");
        printArr(suff_arr, n);
    }
}
 
// This code is contributed by AmanKumarSingh

Python3

# Python3 program for building suffix
# array of a given text
 
# Class to store information of a suffix
class suffix:
     
    def __init__(self):
         
        self.index = 0
        self.rank = [0, 0]
 
# This is the main function that takes a
# string 'txt' of size n as an argument,
# builds and return the suffix array for
# the given string
def buildSuffixArray(txt, n):
     
    # A structure to store suffixes
    # and their indexes
    suffixes = [suffix() for _ in range(n)]
 
    # Store suffixes and their indexes in
    # an array of structures. The structure
    # is needed to sort the suffixes alphabetically
    # and maintain their old indexes while sorting
    for i in range(n):
        suffixes[i].index = i
        suffixes[i].rank[0] = (ord(txt[i]) -
                               ord("a"))
        suffixes[i].rank[1] = (ord(txt[i + 1]) -
                        ord("a")) if ((i + 1) < n) else -1
 
    # Sort the suffixes according to the rank
    # and next rank
    suffixes = sorted(
        suffixes, key = lambda x: (
            x.rank[0], x.rank[1]))
 
    # At this point, all suffixes are sorted
    # according to first 2 characters.  Let
    # us sort suffixes according to first 4
    # characters, then first 8 and so on
    ind = [0] * n  # This array is needed to get the
                   # index in suffixes[] from original
                   # index.This mapping is needed to get
                   # next suffix.
    k = 4
    while (k < 2 * n):
         
        # Assigning rank and index
        # values to first suffix
        rank = 0
        prev_rank = suffixes[0].rank[0]
        suffixes[0].rank[0] = rank
        ind[suffixes[0].index] = 0
 
        # Assigning rank to suffixes
        for i in range(1, n):
             
            # If first rank and next ranks are
            # same as that of previous suffix in
            # array, assign the same new rank to
            # this suffix
            if (suffixes[i].rank[0] == prev_rank and
                suffixes[i].rank[1] == suffixes[i - 1].rank[1]):
                prev_rank = suffixes[i].rank[0]
                suffixes[i].rank[0] = rank
                 
            # Otherwise increment rank and assign   
            else: 
                prev_rank = suffixes[i].rank[0]
                rank += 1
                suffixes[i].rank[0] = rank
            ind[suffixes[i].index] = i
 
        # Assign next rank to every suffix
        for i in range(n):
            nextindex = suffixes[i].index + k // 2
            suffixes[i].rank[1] = suffixes[ind[nextindex]].rank[0] \
                if (nextindex < n) else -1
 
        # Sort the suffixes according to
        # first k characters
        suffixes = sorted(
            suffixes, key = lambda x: (
                x.rank[0], x.rank[1]))
 
        k *= 2
 
    # Store indexes of all sorted
    # suffixes in the suffix array
    suffixArr = [0] * n
     
    for i in range(n):
        suffixArr[i] = suffixes[i].index
 
    # Return the suffix array
    return suffixArr
 
# A utility function to print an array
# of given size
def printArr(arr, n):
     
    for i in range(n):
        print(arr[i], end = " ")
         
    print()
 
# Driver code
if __name__ == "__main__":
     
    txt = "banana"
    n = len(txt)
     
    suffixArr = buildSuffixArray(txt, n)
     
    print("Following is suffix array for", txt)
     
    printArr(suffixArr, n)
 
# This code is contributed by debrc

Javascript

<script>
// Javascript program for building suffix array of a given text
 
// Class to store information of a suffix
class Suffix
{
    constructor(ind,r,nr)
    {
        this.index = ind;
        this.rank = r;
        this.next = nr;
    }
     
}
 
// This is the main function that takes a string 'txt'
    // of size n as an argument, builds and return the
    // suffix array for the given string
function suffixArray(s)
{
    let n = s.length;
        let su = new Array(n);
          
        // Store suffixes and their indexes in
        // an array of classes. The class is needed
        // to sort the suffixes alphabetically and
        // maintain their old indexes while sorting
        for (let i = 0; i < n; i++)
        {
            su[i] = new Suffix(i, s[i].charCodeAt(0) - '$'.charCodeAt(0), 0);
        }
        for (let i = 0; i < n; i++)
            su[i].next = (i + 1 < n ? su[i + 1].rank : -1);
  
        // Sort the suffixes using the comparison function
        // defined above.
        su.sort(function(a,b){
            if(a.rank!=b.rank)
                return a.rank-b.rank;
            else
                return a.next-b.next;
        });
  
        // At this point, all suffixes are sorted
        // according to first 2 characters.
        // Let us sort suffixes according to first 4
        // characters, then first 8 and so on
        let ind = new Array(n);
          
        // This array is needed to get the index in suffixes[]
        // from original index. This mapping is needed to get
        // next suffix.
        for (let length = 4; length < 2 * n; length <<= 1)
        {
              
            // Assigning rank and index values to first suffix
            let rank = 0, prev = su[0].rank;
            su[0].rank = rank;
            ind[su[0].index] = 0;
            for (let i = 1; i < n; i++)
            {
                // If first rank and next ranks are same as
                // that of previous suffix in array,
                // assign the same new rank to this suffix
                if (su[i].rank == prev &&
                    su[i].next == su[i - 1].next)
                {
                    prev = su[i].rank;
                    su[i].rank = rank;
                }
                else
                {
                    // Otherwise increment rank and assign
                    prev = su[i].rank;
                    su[i].rank = ++rank;
                }
                ind[su[i].index] = i;
            }
              
            // Assign next rank to every suffix
            for (let i = 0; i < n; i++)
            {
                let nextP = su[i].index + length / 2;
                su[i].next = nextP < n ?
                   su[ind[nextP]].rank : -1;
            }
              
            // Sort the suffixes according
            // to first k characters
            su.sort(function(a,b){
            if(a.rank!=b.rank)
                return a.rank-b.rank;
            else
                return a.next-b.next;
        });
        }
  
        // Store indexes of all sorted
        // suffixes in the suffix array
        let suf = new Array(n);
          
        for (let i = 0; i < n; i++)
            suf[i] = su[i].index;
  
        // Return the suffix array
        return suf;
}
 
function printArr(arr,n)
{
    for (let i = 0; i < n; i++)
            document.write(arr[i] + " ");
        document.write();
}
 
// Driver Code
let txt = "banana";
let n = txt.length;
let suff_arr = suffixArray(txt);
document.write("Following is suffix array for banana:<br>");
printArr(suff_arr, n);
 
// This code is contributed by patel2127
</script>

Producción

Following is suffix array for banana
5 3 1 0 4 2

Tenga en cuenta que el algoritmo anterior utiliza la función de clasificación estándar y, por lo tanto, la complejidad del tiempo es O (n Log (n) Log (n)). Podemos usar Radix Sort aquí para reducir la complejidad del tiempo a O (n Log n).

Espacio Auxiliar: O(n)

Método 2: El problema también se puede resolver usando el mapa .

Algoritmo:

Cree un mapa con una string clave y su valor sea un número entero.
Iterar sobre la string en orden inverso y crear una nueva string (es decir, desde i = n – 1, 0).
Asigne una nueva string con la última posición de índice de I.
Cree una array y asigne todos los valores del mapa en la array.

C++14

// C++14 program to build a suffix array in O(nlogn) time;
 
#include <bits/stdc++.h>
using namespace std;
 
int main()
{
    string s = "banana";
    int n = s.length();
    map<string, int> Map;
    int suffix[n];
 
    // Mapping string with its index of
    // it's last letter.
    string sub = "";
    for (int i = n - 1; i >= 0; i--) {
        sub = s[i] + sub;
        Map[sub] = i;
    }
 
    // Storing all values of map
    // in suffix array.
    int j = 0;
    for (auto x : Map) {
        suffix[j] = x.second;
        j++;
    }
 
    // printing suffix array.
    cout << "Suffix array for banana is" << endl;
    for (int i = 0; i < n; i++) {
        cout << suffix[i] << " ";
    }
    cout << endl;
    return 0;
}

Producción

Suffix array for banana is
5 3 1 0 4 2

Complejidad de tiempo: La complejidad de tiempo del algoritmo es O(N ² + Nlog(N)).

Espacio Auxiliar: O(n)

Tenga en cuenta que las arrays de sufijos también se pueden construir en tiempo O (n). Pronto discutiremos los algoritmos O(n).

Escriba comentarios si encuentra algo incorrecto o si desea compartir más información sobre el tema tratado anteriormente.

Publicación traducida automáticamente

Artículo escrito por GeeksforGeeks-1 y traducido por Barcelona Geeks. The original can be accessed here. Licence: CCBY-SA

C++

Java

Python3

Javascript

C++14

Deja una respuesta Cancelar la respuesta