Python PySpark – union() and unionAll()

In this article, we discuss the union() and unionAll() functions in PySpark.

union() in PySpark

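The PySpark union() function combines two data frames that have the same structure or schema; to combine more than two, chain the calls. Columns are matched by position rather than by name, and duplicate rows are kept. If the data frames have a different number of columns, the function raises an error; if only the column order or the column types differ, the call can still succeed but the result is usually not what was intended.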

Syntax:

dataFrame1.union(dataFrame2)

Here,

dataFrame1 and dataFrame2 are the data frames to combine.

Example 1:

In this example, we combine two data frames, data_frame1 and data_frame2. Note that both data frames have the same schema.

Python3

# Python program to illustrate the
# working of union() function
  
import pyspark
from pyspark.sql import SparkSession
  
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
  
# Creating a dataframe
data_frame1 = spark.createDataFrame(
    [("Bhuwanesh", 82.98), ("Harshit", 80.31)],
    ["Student Name", "Overall Percentage"]
)
  
# Creating another dataframe
data_frame2 = spark.createDataFrame(
    [("Naveen", 91.123), ("Piyush", 90.51)],
    ["Student Name", "Overall Percentage"]
)
  
# union()
answer = data_frame1.union(data_frame2)
  
# Print the result of the union()
answer.show()

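Output: the resulting data frame contains all four rows of data_frame1 and data_frame2 under the columns Student Name and Overall Percentage.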

Example 2:

In this example, we again combine data_frame1 and data_frame2, but this time their schemas differ: the columns of data_frame2 are in the opposite order. Because union() matches columns by position and not by name, the output is not the desired one, since the values of data_frame2 end up under the wrong columns. The union() function is meant for data frames that have the same structure or schema.

Python3

# Python program to illustrate the
# working of union() function
  
import pyspark
from pyspark.sql import SparkSession
  
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
  
# Creating a data frame
data_frame1 = spark.createDataFrame(
    [("Bhuwanesh", 82.98), ("Harshit", 80.31)],
    ["Student Name", "Overall Percentage"]
)
  
# Creating another data frame
data_frame2 = spark.createDataFrame(
    [(91.123, "Naveen"), (90.51, "Piyush"), (87.67, "Hitesh")],
    ["Overall Percentage", "Student Name"]
)
  
# Union both the dataframes using union() function
answer = data_frame1.union(data_frame2)
  
# Print the union of both the dataframes
answer.show()

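Output: the resulting data frame has five rows and keeps the column names of data_frame1. Because the rows are combined by position, the percentages of data_frame2 appear under Student Name and its names appear under Overall Percentage. Spark also has to reconcile the mismatched column types, so the exact result can depend on the Spark version and its settings.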

unionAll() in PySpark

The unionAll() function does the same job as union(). It was deprecated in Spark 2.0.0 in favor of union(); since Spark 3.0 it is supported again as an alias for union(). In either case, union() is the recommended function.

Syntax:

dataFrame1.unionAll(dataFrame2)

Here,

dataFrame1 and dataFrame2 are the data frames to combine.

Example 1:

In this example, we combine two data frames, data_frame1 and data_frame2. Note that both data frames have the same schema.

Python3

# Python program to illustrate the
# working of unionAll() function
  
import pyspark
from pyspark.sql import SparkSession
  
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
  
# Creating a dataframe
data_frame1 = spark.createDataFrame(
    [("Bhuwanesh", 82.98), ("Harshit", 80.31)],
    ["Student Name", "Overall Percentage"]
)
  
# Creating another dataframe
data_frame2 = spark.createDataFrame(
    [("Naveen", 91.123), ("Piyush", 90.51)],
    ["Student Name", "Overall Percentage"]
)
  
# Union both the dataframes using unionAll() function
answer = data_frame1.unionAll(data_frame2)
  
# Print the union of both the dataframes
answer.show()

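Output: the resulting data frame contains all four rows of data_frame1 and data_frame2 under the columns Student Name and Overall Percentage, exactly as with union().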

Example 2:

In this example, we again combine data_frame1 and data_frame2, but this time their schemas differ: the columns of data_frame2 are in the opposite order. Like union(), unionAll() matches columns by position and not by name, so the output is not the desired one. The unionAll() function is meant for data frames that have the same structure or schema.

Python3

# Python program to illustrate the
# working of unionAll() function
  
import pyspark
from pyspark.sql import SparkSession
  
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
  
# Creating a data frame
data_frame1 = spark.createDataFrame(
    [("Bhuwanesh", 82.98), ("Harshit", 80.31)],
    ["Student Name", "Overall Percentage"]
)
  
# Creating another data frame
data_frame2 = spark.createDataFrame(
    [(91.123, "Naveen"), (90.51, "Piyush"), (87.67, "Hitesh")],
    ["Overall Percentage", "Student Name"]
)
  
# Union both the dataframes using unionAll() function
answer = data_frame1.unionAll(data_frame2)
  
# Print the union of both the dataframes
answer.show()

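Output: the resulting data frame has five rows and keeps the column names of data_frame1. Because the rows are combined by position, the percentages of data_frame2 appear under Student Name and its names appear under Overall Percentage.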

Article written by bhuwanesh and translated by Barcelona Geeks. The original can be accessed here. Licence: CC BY-SA
