Shortest Superstring Problem | Set 2 (Using Set Cover)

Given a set S of n strings, find the shortest string that contains every string in S as a substring. We may assume that no string in S is a substring of another string in S.

Examples:

Input:  S = {"001", "01101", "010"}
Output: 0011010  

Input:  S = {"geeks", "quiz", "for"}
Output: geeksquizfor

Input:  S = {"catg", "ctaagt", "gcta", "ttca", "atgcatc"}
Output: gctaagttcatgcatc
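
The defining property is easy to check mechanically. The following is a minimal sketch, not part of the original post; the helper name is_superstring is hypothetical. It only verifies that a candidate contains every input string and says nothing about whether the candidate is the shortest possible.

```python
def is_superstring(candidate, strings):
    """Return True if every string in `strings` occurs in `candidate`."""
    return all(s in candidate for s in strings)


# The three examples from the text: (input set, claimed output).
examples = [
    (["001", "01101", "010"], "0011010"),
    (["geeks", "quiz", "for"], "geeksquizfor"),
    (["catg", "ctaagt", "gcta", "ttca", "atgcatc"], "gctaagttcatgcatc"),
]

for strings, answer in examples:
    print(answer, is_superstring(answer, strings))
```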

In the previous post, we discussed a solution that is proven to be a 4-approximation (and conjectured to be a 2-approximation).
In this post, we discuss a solution that can be proven to be a 2H_n-approximation, where H_n = 1 + 1/2 + 1/3 + … + 1/n. The idea is to transform the Shortest Superstring problem into a Set Cover problem. In Set Cover, we are given a universe, some subsets of the universe, and a cost associated with each subset. The task is to find a minimum-cost collection of the given subsets such that every element of the universe is covered. So for a Set Cover instance, we need a universe and subsets of the universe with their associated costs.
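
To get a feel for how the 2H_n factor grows with the number of strings, here is a small sketch that evaluates the harmonic number exactly. The function name harmonic is an illustrative choice, not something from the original post.

```python
from fractions import Fraction


def harmonic(n):
    """Return H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


# Approximation factor 2 * H_n for a few input sizes.
for n in (3, 5, 100):
    print(n, float(2 * harmonic(n)))
```

The factor grows logarithmically in n, so for large inputs this guarantee is weaker than the constant-factor bound mentioned for the previous post.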

Following are the steps to transform Shortest Superstring into Set Cover.

1) Let S be the set of given strings.
   S = {s₁, s₂, ... s_n}

2) Universe for Set Cover problem is S (We need
   to find a superstring that has every string
   as substring)

3) Let us initialize subsets to be considered for universe as
     Subsets =  {{s₁}, {s₂}, ... {s_n}}
   Cost of every subset is length of string in it.

4) For all pairs of strings s_i and s_j in S,
     If s_i and s_j overlap
      a) Construct a string r_ijk by merging s_i and
         s_j, where k is the length of the maximum
         overlap between a suffix of s_i and a prefix
         of s_j.
      b) Add the set represented by r_ijk to Subsets,
           i.e., Subsets = Subsets U Set(r_ijk)
         The set represented by r_ijk is the set 
         of all strings in S which are substrings of it.
         Cost of the subset is length of r_ijk.

5) Now the problem is transformed to Set Cover. We can 
   run the Greedy Set Cover approximate algorithm to find
   a set cover of S using Subsets. Concatenating the
   strings that represent the chosen subsets gives a
   superstring of S whose length is the cost of the cover.
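
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the names max_overlap, build_subsets and greedy_superstring are hypothetical, the greedy rule is the standard "lowest cost per newly covered element" rule for weighted Set Cover, and the superstring is read off by concatenating the strings that represent the chosen subsets.

```python
from itertools import permutations


def max_overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_subsets(strings):
    """Return (representative string, covered strings) candidates."""
    # One singleton subset per input string, represented by the string itself.
    candidates = [(s, frozenset([s])) for s in strings]
    # One subset per ordered pair that overlaps, using the maximum overlap.
    for a, b in permutations(strings, 2):
        k = max_overlap(a, b)
        if k > 0:
            r = a + b[k:]
            covered = frozenset(s for s in strings if s in r)
            candidates.append((r, covered))
    return candidates


def greedy_superstring(strings):
    """Greedy weighted Set Cover; the cost of a subset is len(representative)."""
    strings = list(strings)
    candidates = build_subsets(strings)
    uncovered = set(strings)
    chosen = []
    while uncovered:
        # Pick the candidate with the lowest cost per newly covered string.
        r, covered = min(
            (c for c in candidates if c[1] & uncovered),
            key=lambda c: len(c[0]) / len(c[1] & uncovered),
        )
        chosen.append(r)
        uncovered -= covered
    return "".join(chosen)


result = greedy_superstring(["001", "01101", "010"])
print(result, len(result))
```

The result is a valid superstring but not necessarily a shortest one, which is consistent with the method being an approximation rather than an exact algorithm.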

Example:

S = {s₁, s₂, s₃}.
s₁ = "001"
s₂ = "01101"
s₃ = "010"

[Combination of s₁ and s₂ with 2 overlapping characters]
r₁₂₂ = 001101 

[Combination of s₁ and s₃ with 2 overlapping characters]
r₁₃₂ = 0010 

Similarly,
r₂₃₂ = 011010
r₃₁₁ = 01001
r₃₂₁ = 0101101
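
The merged strings listed above can be reproduced with a short sketch. The helper names max_overlap and merge_pairs are hypothetical, and indices are 1-based to match the r_ijk notation.

```python
from itertools import permutations


def max_overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_pairs(strings):
    """Map (i, j, k) to the merged string r_ijk for every overlapping ordered pair."""
    merged = {}
    indexed = list(enumerate(strings, start=1))
    for (i, a), (j, b) in permutations(indexed, 2):
        k = max_overlap(a, b)
        if k > 0:
            merged[(i, j, k)] = a + b[k:]
    return merged


for (i, j, k), r in merge_pairs(["001", "01101", "010"]).items():
    print(f"r_{i}{j}{k} = {r}")
```

Note that the ordered pair (s₂, s₁) produces no entry, because no suffix of "01101" is a prefix of "001".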

The Set Cover problem now becomes the following:

Universe to cover is {s₁, s₂, s₃}

Subsets of the universe and their costs :

{s₁}, cost 3 (length of s₁)
{s₂}, cost 5 (length of s₂)
{s₃}, cost 3 (length of s₃)

set(r₁₂₂), cost 6 (length of r₁₂₂)
The set r₁₂₂ represents all strings which are
substrings of r₁₂₂. 
Therefore set(r₁₂₂) = {s₁, s₂}

set(r₁₃₂), cost 4 (length of r₁₃₂)
The set r₁₃₂ represents all strings which are
substrings of r₁₃₂.
Therefore set(r₁₃₂) = {s₁, s₃}

Similarly, the remaining subsets are:
set(r₂₃₂) = {s₂, s₃}, cost 6
set(r₃₁₁) = {s₁, s₃}, cost 5
set(r₃₂₁) = {s₂, s₃}, cost 7

So we have a set cover problem with universe and subsets
of universe with costs associated with every subset.
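
As a cross-check of the instance above, this sketch takes the three input strings and the five merged strings from the example as given data and computes each subset and its cost. The function name set_cover_instance is hypothetical.

```python
def set_cover_instance(strings, representatives):
    """Return (representative, cost, covered strings) for each candidate subset."""
    instance = []
    for r in representatives:
        covered = sorted(s for s in strings if s in r)
        instance.append((r, len(r), covered))
    return instance


strings = ["001", "01101", "010"]
# Singletons first, then the merged strings r_122, r_132, r_232, r_311, r_321.
representatives = strings + ["001101", "0010", "011010", "01001", "0101101"]

for r, cost, covered in set_cover_instance(strings, representatives):
    print(f"{r}: cost {cost}, covers {covered}")
```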

We have seen that an instance of the Shortest Superstring problem can be transformed into an instance of the Set Cover problem in polynomial time.

See the references below for a proof that the Set Cover based algorithm is a 2H_n-approximation.

References:
http://www.cs.dartmouth.edu/~ac/Teach/CS105-Winter05/Notes/wan-ba-notes.pdf
http://fileadmin.cs.lth.se/cs/Personal/Andrzej_Lingas/superstring.pdf
http://math.mit.edu/~goemans/18434S06/superstring-lele.pdf

This article is contributed by Dheeraj Gupta.

Machine-translated post

Article written by GeeksforGeeks-1 and translated by Barcelona Geeks. The original can be accessed here. Licence: CCBY-SA
