Python | Reemplace los valores de NaN con el promedio de las columnas

En el aprendizaje automático y el análisis de datos, la visualización de datos es uno de los pasos más importantes. La limpieza y organización de los datos se realiza mediante diferentes algoritmos. A veces, en conjuntos de datos, obtenemos valores NaN (no un número) que no son posibles de usar para la visualización de datos.

Para resolver este problema, un método posible es reemplazar los valores de nan con un promedio de columnas. A continuación se presentan algunos métodos para resolver este problema.

Método #1: Usar np.colmeanynp.take

# Python code to demonstrate
# to replace nan values
# with an average of columns
  
import numpy as np
  
# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan], 
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])
  
# printing initial array
print ("initial array", ini_array)
  
# column mean
col_mean = np.nanmean(ini_array, axis = 0)
  
# printing column mean
print ("columns mean", str(col_mean))
  
# find indices where nan value is present
inds = np.where(np.isnan(ini_array))
  
# replace inds with avg of column
ini_array[inds] = np.take(col_mean, inds[1])
  
# printing final array
print ("final array", ini_array)

Producción:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
columns mean [ 2.   3.   4.5  6. ]

final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]

Método #2: Usar np.maynp.where

# Python code to demonstrate
# to replace nan values
# with average of columns
  
import numpy as np
  
# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])
  
# printing initial array
print ("initial array", ini_array)
  
# replace nan with col means
res = np.where(np.isnan(ini_array), np.ma.array(ini_array,
               mask = np.isnan(ini_array)).mean(axis = 0), ini_array)   
  
# printing final array
print ("final array", res)

Producción:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]

Método #3: Usando Naive yzip

# Python code to demonstrate
# to replace nan values
# with average of columns
  
import numpy as np
  
# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])
  
# printing initial array
print ("initial array", ini_array)
  
# indices where values is nan in array
indices = np.where(np.isnan(ini_array))
  
# Iterating over numpy array to replace nan with values
for row, col in zip(*indices):
    ini_array[row, col] = np.mean(ini_array[
           ~np.isnan(ini_array[:, col]), col])
  
# printing final array
print ("final array", ini_array)

Producción:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]

Publicación traducida automáticamente

Artículo escrito por garg_ak0109 y traducido por Barcelona Geeks. The original can be accessed here. Licence: CCBY-SA

Deja una respuesta Cancelar la respuesta