16 Commits

8 changed files with 355 additions and 165 deletions

0
docs/.gitkeep Normal file

143
docs/Summary.org Normal file

@@ -0,0 +1,143 @@
#+TITLE: Assignment 2
#+SUBTITLE: Metaheuristics
#+AUTHOR: Amin Kasrou Aouam
#+DATE: 2021-06-22
#+PANDOC_OPTIONS: template:~/.pandoc/templates/eisvogel.latex
#+PANDOC_OPTIONS: listings:t
#+PANDOC_OPTIONS: toc:t
#+PANDOC_METADATA: lang=es
#+PANDOC_METADATA: titlepage:t
#+PANDOC_METADATA: listings-no-page-break:t
#+PANDOC_METADATA: toc-own-page:t
#+PANDOC_METADATA: table-use-row-colors:t
#+PANDOC_METADATA: colorlinks:t
#+PANDOC_METADATA: logo:/home/coolneng/Photos/Logos/UGR.png
#+LaTeX_HEADER: \usepackage[ruled, lined, linesnumbered, commentsnumbered, longend]{algorithm2e}
* Assignment 2
** Introduction
In this assignment we use several population-based search algorithms to solve the maximum diversity problem (MDP). We implement:
- Genetic algorithm
- Memetic algorithm
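As a point of reference, the MDP asks for a subset of m points, out of n, that maximises the sum of pairwise distances. A minimal sketch of that objective follows; the function name =diversity= and the 4-point distance matrix are hypothetical and used only for illustration.

```python
from itertools import combinations


def diversity(selected, distance):
    """Sum of pairwise distances among the selected points."""
    return sum(distance[i][j] for i, j in combinations(sorted(selected), 2))


# Hypothetical symmetric distance matrix, for illustration only
distance = [
    [0, 2, 3, 1],
    [2, 0, 4, 5],
    [3, 4, 0, 6],
    [1, 5, 6, 0],
]

print(diversity({1, 2, 3}, distance))  # 4 + 5 + 6 = 15
```

Each pair is counted exactly once, which is why the sketch iterates over combinations rather than over all ordered pairs.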
** Algorithms
*** Genetic
Genetic algorithms are inspired by natural evolution and genetics. They generate an initial set of solutions (i.e. a population), select a subset of individuals to operate on, apply recombination and mutation operators, and finally replace the previous population with a new one.
The general procedure of the algorithm is shown below:
\begin{algorithm}
\KwIn{A list $[a_i]$, $i=1, 2, \cdots, n$, that contains the population of individuals}
\KwOut{Processed list}
$t \leftarrow 0$\;
$P(t) \leftarrow initializePopulation()$\;
$P(t) \leftarrow evaluatePopulation(P(t))$\;
\While{$\neg stopCondition$}{
$t \leftarrow t + 1$\;
$parents \leftarrow selectParents(P(t-1))$\;
$offspring \leftarrow recombine(parents)$\;
$offspring \leftarrow mutate(offspring)$\;
$P(t) \leftarrow replacePopulation(P(t-1), offspring)$\;
$P(t) \leftarrow evaluatePopulation(P(t))$\;
}
\KwRet{$P(t)$}
\end{algorithm}
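The loop above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the code in this repository: individuals are sorted lists of point indices, parents are chosen by binary tournament, replacement is generational, and every name (=genetic_sketch=, =recombine=, =mutate=, =fitness=) is hypothetical.

```python
import random
from itertools import combinations


def fitness(individual, distance):
    """MDP objective: sum of pairwise distances among the selected points."""
    return sum(distance[i][j] for i, j in combinations(individual, 2))


def recombine(first, second, m, rng):
    """Keep the genes shared by both parents and fill up from the rest."""
    common = set(first) & set(second)
    rest = list((set(first) | set(second)) - common)
    rng.shuffle(rest)
    return sorted(common | set(rest[: m - len(common)]))


def mutate(individual, n, rng, probability):
    """With the given probability, swap one selected point for an unselected one."""
    if rng.random() >= probability:
        return individual
    outside = [point for point in range(n) if point not in individual]
    mutated = list(individual)
    mutated[rng.randrange(len(mutated))] = rng.choice(outside)
    return sorted(mutated)


def genetic_sketch(distance, m, size=10, generations=50, probability=0.1, seed=0):
    rng = random.Random(seed)
    n = len(distance)  # assumes n > m, so mutation always has a candidate

    def score(individual):
        return fitness(individual, distance)

    # initializePopulation
    population = [sorted(rng.sample(range(n), m)) for _ in range(size)]
    for _ in range(generations):
        # selectParents: binary tournament (an assumption of this sketch)
        parents = [max(rng.sample(population, 2), key=score) for _ in range(size)]
        # recombine, then mutate
        offspring = [
            mutate(recombine(parents[i], parents[(i + 1) % size], m, rng), n, rng, probability)
            for i in range(size)
        ]
        # replacePopulation: generational replacement
        population = offspring
    return max(population, key=score)


# Hypothetical instance: 6 points on a line, choose 3
distance = [[abs(i - j) for j in range(6)] for i in range(6)]
best = genetic_sketch(distance, m=3)
print(best, fitness(best, distance))
```

The recombination used here always yields a feasible individual with exactly m distinct points, so no repair step is needed in this sketch.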
We implement 4 distinct variants, according to 2 criteria:
**** Replacement criterion
- *Generational*: the new population completely replaces the previous population
- *Stationary*: the two best children replace the two worst individuals of the previous population
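The two replacement criteria can be sketched as follows. This is a simplified illustration in which individuals are arbitrary objects ranked by a caller-supplied =fitness= function; the name and signature of =replace_population_sketch= are hypothetical and do not mirror the repository's =replace_population=.

```python
def replace_population_sketch(population, offspring, fitness, mode):
    """Return the next population under the chosen replacement criterion."""
    if mode == "generational":
        # The offspring replace the previous population entirely
        return list(offspring)
    if mode == "stationary":
        # The two best children take the place of the two worst individuals
        best_children = sorted(offspring, key=fitness, reverse=True)[:2]
        keep = len(population) - len(best_children)
        survivors = sorted(population, key=fitness, reverse=True)[:keep]
        return survivors + best_children
    raise ValueError(f"unknown replacement mode: {mode}")


# Toy example: individuals are numbers and fitness is the number itself
population = [5, 1, 3, 2]
offspring = [4, 0, 6]
print(replace_population_sketch(population, offspring, lambda x: x, "stationary"))    # [5, 3, 6, 4]
print(replace_population_sketch(population, offspring, lambda x: x, "generational"))  # [4, 0, 6]
```

Note that the stationary criterion, as described, inserts the two best children even when they are worse than the individuals they displace.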
**** Crossover operator
- *Uniform*: keeps the positions shared by both parents; the remaining ones are chosen at random from each parent (requires a repair operator)
- *Position*: keeps the positions shared by both parents, takes the remaining elements of each parent and shuffles them. Produces 2 children.
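A minimal sketch of the position-based operator, assuming a binary encoding (1 means the point is selected) rather than the DataFrame representation used in this repository. The remaining genes are taken from the first parent here, which is an assumption of the sketch; the name =position_crossover= is hypothetical.

```python
import random


def position_crossover(first, second, rng):
    """Keep positions where both parents agree; shuffle the rest. Returns 2 children."""
    differing = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
    children = []
    for _ in range(2):
        # Genes of the first parent at the positions where the parents differ
        genes = [first[i] for i in differing]
        rng.shuffle(genes)
        child = list(first)
        for position, gene in zip(differing, genes):
            child[position] = gene
        children.append(child)
    return children


rng = random.Random(1)
first = [1, 1, 0, 0, 1, 0]
second = [1, 0, 1, 0, 0, 1]
for child in position_crossover(first, second, rng):
    print(child, sum(child))
```

Because shuffling only permutes the genes of one parent, each child keeps exactly as many selected points as its parents, so this operator needs no repair step, unlike the uniform one.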
*** Memetic
Memetic algorithms arise from hybridising a genetic algorithm with a local search algorithm. The result is an algorithm with a good balance between exploration and exploitation.
The general procedure of the algorithm is shown below:
\begin{algorithm}
\KwIn{A list $[a_i]$, $i=1, 2, \cdots, n$, that contains the population of individuals}
\KwOut{Processed list}
$t \leftarrow 0$\;
$P(t) \leftarrow initializePopulation()$\;
$P(t) \leftarrow evaluatePopulation(P(t))$\;
\While{$\neg stopCondition$}{
\If{local search is scheduled for iteration $t$}{
$P(t) \leftarrow localSearch(P(t))$\;
}
$t \leftarrow t + 1$\;
$parents \leftarrow selectParents(P(t-1))$\;
$offspring \leftarrow recombine(parents)$\;
$offspring \leftarrow mutate(offspring)$\;
$P(t) \leftarrow replacePopulation(P(t-1), offspring)$\;
$P(t) \leftarrow evaluatePopulation(P(t))$\;
}
\KwRet{$P(t)$}
\end{algorithm}
We implement 3 distinct variants:
- Local search on all chromosomes
- Local search on a random subset of chromosomes
- Local search on the subset of the best chromosomes
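The three variants differ only in which individuals are handed to the local search. A minimal sketch of that choice follows, assuming the population is summarised by a list of fitness values. The function name =local_search_targets= is hypothetical, and rounding the subset size down with a minimum of one individual is an assumption of this sketch.

```python
import random


def local_search_targets(fitnesses, mode, probability=0.1, rng=random):
    """Return the indices of the individuals that local search should refine."""
    size = len(fitnesses)
    if mode == "all":
        return list(range(size))
    # Subset size: a fraction of the population, at least one individual
    count = max(1, int(size * probability))
    if mode == "random":
        return sorted(rng.sample(range(size), count))
    if mode == "best":
        ranked = sorted(range(size), key=lambda i: fitnesses[i], reverse=True)
        return sorted(ranked[:count])
    raise ValueError(f"unknown hybridisation mode: {mode}")


fitnesses = [3, 9, 1, 7, 5, 8, 2, 6, 4, 0]
print(local_search_targets(fitnesses, "best", probability=0.2))  # [1, 5]
print(local_search_targets(fitnesses, "all"))
```

Sampling without replacement avoids refining the same individual twice in one pass, which a draw with replacement would allow.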
** Implementation
The assignment is implemented in /Python/, using the following libraries:
- NumPy
- Pandas
*** Installation
Running the program requires Python together with the *Pandas* and *NumPy* libraries.
The file shell.nix is provided to simplify installing the dependencies with the [[https://nixos.org/][Nix]] package manager. After installing Nix, run the following command at the root of the project:
#+begin_src shell
nix-shell
#+end_src
** Execution
The program is run with the following command:
#+begin_src shell
python src/main.py <dataset> <algorithm> <parameters>
#+end_src
The possible parameters are:
| dataset | algorithm | parameters |
|---+---+---|
| Any file in the data folder | genetic | uniform/position generational/stationary |
| | memetic | all/random/best |
A script is also provided that runs each algorithm once on every /dataset/ and saves the results to a spreadsheet. It can be run with the following command:
#+begin_src shell
python src/execution.py
#+end_src
*Note*: the [[https://xlsxwriter.readthedocs.io/][XlsxWriter]] library must be installed to export the results to an Excel file.
* Analysis of the results
Unfortunately, due to excessively long execution times (even after tuning the metaparameters), we cannot provide results from running the algorithms.

BIN
docs/Summary.pdf Normal file

Binary file not shown.


@@ -14,16 +14,13 @@ def file_list(path):
def create_dataframes():
greedy = DataFrame()
local = DataFrame()
return greedy, local
return [DataFrame() for _ in range(7)]
def process_output(results):
distances = []
time = []
for element in results:
for line in element:
for line in results:
if line.startswith(bytes("Total distance:", encoding="utf-8")):
line_elements = line.split(sep=bytes(":", encoding="utf-8"))
distances.append(float(line_elements[1]))
@@ -33,51 +30,51 @@ def process_output(results):
return distances, time
def populate_dataframes(greedy, local, greedy_list, local_list, dataset):
greedy_distances, greedy_time = process_output(greedy_list)
local_distances, local_time = process_output(local_list)
greedy_dict = {
def populate_dataframe(df, output_cmd, dataset):
distances, time = process_output(output_cmd)
data_dict = {
"dataset": dataset.removeprefix("data/"),
"media distancia": mean(greedy_distances),
"desviacion distancia": std(greedy_distances),
"media tiempo": mean(greedy_time),
"desviacion tiempo": std(greedy_time),
"media distancia": mean(distances),
"desviacion distancia": std(distances),
"media tiempo": mean(time),
"desviacion tiempo": std(time),
}
local_dict = {
"dataset": dataset.removeprefix("data/"),
"media distancia": mean(local_distances),
"desviacion distancia": std(local_distances),
"media tiempo": mean(local_time),
"desviacion tiempo": std(local_time),
}
greedy = greedy.append(greedy_dict, ignore_index=True)
local = local.append(local_dict, ignore_index=True)
return greedy, local
df = df.append(data_dict, ignore_index=True)
return df
def script_execution(filenames, greedy, local, iterations=3):
def script_execution(filenames, df_list):
script = "src/main.py"
parameters = [
["genetic", "uniform", "generational"],
["genetic", "position", "generational"],
["genetic", "uniform", "stationary"],
["genetic", "position", "stationary"],
["memetic", "all"],
["memetic", "random"],
["memetic", "best"],
]
for dataset in filenames:
print(f"Running on dataset {dataset}")
greedy_list = []
local_list = []
for _ in range(iterations):
greedy_cmd = run(
[executable, script, dataset, "greedy"], capture_output=True
for index, params in zip(range(4), parameters):
print(f"Running {params} algorithm")
output_cmd = run(
[executable, script, dataset, *params], capture_output=True
).stdout.splitlines()
local_cmd = run(
[executable, script, dataset, "local"], capture_output=True
).stdout.splitlines()
greedy_list.append(greedy_cmd)
local_list.append(local_cmd)
greedy, local = populate_dataframes(
greedy, local, greedy_list, local_list, dataset
)
return greedy, local
df_list[index] = populate_dataframe(df_list[index], output_cmd, dataset)
return df_list
def export_results(greedy, local):
dataframes = {"Greedy": greedy, "Local search": local}
def export_results(df_list):
dataframes = {
"Generational uniform genetic": df_list[0],
"Generational position genetic": df_list[1],
"Stationary uniform genetic": df_list[2],
"Stationary position genetic": df_list[3],
"All genes memetic": df_list[4],
"Random genes memetic": df_list[5],
"Best genes memetic": df_list[6],
}
writer = ExcelWriter(path="docs/algorithm-results.xlsx", engine="xlsxwriter")
for name, df in dataframes.items():
df.to_excel(writer, sheet_name=name, index=False)
@@ -91,9 +88,9 @@ def export_results(greedy, local):
def main():
datasets = file_list(path="data/*.txt")
greedy, local = create_dataframes()
populated_greedy, populated_local = script_execution(datasets, greedy, local)
export_results(populated_greedy, populated_local)
df_list = create_dataframes()
populated_df_list = script_execution(datasets, df_list)
export_results(populated_df_list)
if __name__ == "__main__":


@@ -1,12 +1,11 @@
from numpy import sum, append, intersect1d, array_equal
from numpy import intersect1d, array_equal
from numpy.random import randint, choice, shuffle
from pandas import DataFrame
from math import ceil
from functools import partial
from multiprocessing import Pool
from copy import deepcopy
from preprocessing import parse_file
from itertools import combinations
def get_row_distance(source, destination, data):
@@ -37,22 +36,23 @@ def generate_individual(n, m, data):
def evaluate_individual(individual, data):
fitness = []
genotype = individual.point.values
distances = data.query(f"source in @genotype and destination in @genotype")
for item in genotype[:-1]:
element_df = distances.query(f"source == {item} or destination == {item}")
max_distance = element_df["distance"].astype(float).max()
fitness = append(arr=fitness, values=max_distance)
distances = distances.query(f"source != {item} and destination != {item}")
individual["fitness"] = sum(fitness)
fitness = 0
comb = combinations(individual.index, r=2)
for index in list(comb):
elements = individual.loc[index, :]
fitness += get_row_distance(
source=elements["point"].head(n=1).values[0],
destination=elements["point"].tail(n=1).values[0],
data=data,
)
individual["fitness"] = fitness
return individual
def select_distinct_genes(matching_genes, parents, m):
first_parent = parents[0].query("point not in @matching_genes")
second_parent = parents[1].query("point not in @matching_genes")
cutoff = randint(m - len(matching_genes))
cutoff = randint(m - len(matching_genes) + 1)
first_parent_genes = first_parent.point.values[cutoff:]
second_parent_genes = second_parent.point.values[:cutoff]
return first_parent_genes, second_parent_genes
@@ -139,9 +139,8 @@ def group_parents(parents):
first = parents[i]
second = parents[i + 1]
if array_equal(first.point.values, second.point.values):
tmp = second
second = parents[i - 2]
parents[i - 2] = tmp
random_index = randint(i + 1)
second, parents[random_index] = parents[random_index], second
parent_pairs.append([first, second])
return parent_pairs
@@ -178,7 +177,7 @@ def select_new_gene(individual, n):
return new_gene
def mutate(offspring, data, probability=0.001):
def mutate(offspring, n, data, probability=0.001):
expected_mutations = len(offspring) * n * probability
individuals = []
genes = []
@@ -297,19 +296,8 @@ def genetic_algorithm(n, m, data, select_mode, crossover_mode, max_iterations=10
for _ in range(max_iterations):
parents = select_parents(population, n, select_mode)
offspring = crossover(crossover_mode, parents, m)
offspring = mutate(offspring, data)
offspring = mutate(offspring, n, data)
population = replace_population(population, offspring, select_mode)
population = evaluate_population(population, data)
best_index, _ = get_best_elements(population)
return population[best_index]
n, m, data = parse_file("data/GKD-c_11_n500_m50.txt")
genetic_algorithm(
n=10,
m=4,
data=data,
select_mode="generational",
crossover_mode="uniform",
max_iterations=10,
)

64
src/local_search.py Normal file

@@ -0,0 +1,64 @@
from numpy.random import choice, seed, randint
from pandas import DataFrame


def get_row_distance(source, destination, data):
    row = data.query(
        """(source == @source and destination == @destination) or \
        (source == @destination and destination == @source)"""
    )
    return row["distance"].values[0]


def compute_distance(element, solution, data):
    accumulator = 0
    distinct_elements = solution.query(f"point != {element}")
    for _, item in distinct_elements.iterrows():
        accumulator += get_row_distance(
            source=element,
            destination=item.point,
            data=data,
        )
    return accumulator


def element_in_dataframe(solution, element):
    duplicates = solution.query(f"point == {element}")
    return not duplicates.empty


def replace_worst_element(previous, n, data):
    solution = previous.copy()
    worst_index = solution["distance"].astype(float).idxmin()
    random_element = randint(n)
    while element_in_dataframe(solution=solution, element=random_element):
        random_element = randint(n)
    solution["point"].loc[worst_index] = random_element
    solution["distance"].loc[worst_index] = compute_distance(
        element=solution["point"].loc[worst_index], solution=solution, data=data
    )
    return solution


def get_random_solution(previous, n, data):
    solution = replace_worst_element(previous, n, data)
    while solution["distance"].sum() <= previous["distance"].sum():
        solution = replace_worst_element(previous=solution, n=n, data=data)
    return solution


def explore_neighbourhood(element, n, data, max_iterations=100000):
    neighbourhood = []
    neighbourhood.append(element)
    for _ in range(max_iterations):
        previous_solution = neighbourhood[-1]
        neighbour = get_random_solution(previous=previous_solution, n=n, data=data)
        neighbourhood.append(neighbour)
    return neighbour


def local_search(first_solution, n, data):
    best_solution = explore_neighbourhood(
        element=first_solution, n=n, data=data, max_iterations=5
    )
    return best_solution


@@ -1,68 +1,57 @@
from preprocessing import parse_file
from genetic_algorithm import genetic_algorithm
from memetic_algorithm import memetic_algorithm
from sys import argv
from time import time
from itertools import combinations
from argparse import ArgumentParser
def execute_algorithm(choice, n, m, data):
if choice == "genetic":
return genetic_algorithm(n, m, data)
elif choice == "memetic":
return memetic_algorithm(m, data)
else:
print("The valid algorithm choices are 'genetic' and 'memetic'")
exit(1)
def get_row_distance(source, destination, data):
row = data.query(
"""(source == @source and destination == @destination) or \
(source == @destination and destination == @source)"""
def execute_algorithm(args, n, m, data):
if args.algorithm == "genetic":
return genetic_algorithm(
n,
m,
data,
select_mode=args.selection,
crossover_mode=args.crossover,
max_iterations=100,
)
return row["distance"].values[0]
def get_fitness(solutions, data):
counter = 0
comb = combinations(solutions.index, r=2)
for index in list(comb):
elements = solutions.loc[index, :]
counter += get_row_distance(
source=elements["point"].head(n=1).values[0],
destination=elements["point"].tail(n=1).values[0],
data=data,
return memetic_algorithm(
n,
m,
data,
hybridation=args.hybridation,
max_iterations=100,
)
return counter
def show_results(solutions, fitness, time_delta):
duplicates = solutions.duplicated().any()
print(solutions)
print(f"Total distance: {fitness}")
def show_results(solution, time_delta):
duplicates = solution.duplicated().any()
print(solution)
print(f"Total distance: {solution.fitness.values[0]}")
if not duplicates:
print("No duplicates found")
print(f"Execution time: {time_delta}")
def usage(argv):
print(f"Usage: python {argv[0]} <file> <algorithm choice>")
print("algorithm choices:")
print("genetic: genetic algorithm")
print("memetic: memetic algorithm")
exit(1)
def parse_arguments():
parser = ArgumentParser()
parser.add_argument("file", help="dataset of choice")
subparsers = parser.add_subparsers(dest="algorithm")
parser_genetic = subparsers.add_parser("genetic")
parser_memetic = subparsers.add_parser("memetic")
parser_genetic.add_argument("crossover", choices=["uniform", "position"])
parser_genetic.add_argument("selection", choices=["generational", "stationary"])
parser_memetic.add_argument("hybridation", choices=["all", "random", "best"])
return parser.parse_args()
def main():
if len(argv) != 3:
usage(argv)
n, m, data = parse_file(argv[1])
args = parse_arguments()
n, m, data = parse_file(args.file)
start_time = time()
solutions = execute_algorithm(choice=argv[2], n=n, m=m, data=data)
solutions = execute_algorithm(args, n, m, data)
end_time = time()
fitness = get_fitness(solutions, data)
show_results(solutions, fitness, time_delta=end_time - start_time)
show_results(solutions, time_delta=end_time - start_time)
if __name__ == "__main__":


@@ -1,50 +1,59 @@
from numpy.random import choice, seed
from genetic_algorithm import *
from local_search import local_search
from copy import deepcopy
def get_first_random_solution(m, data):
seed(42)
random_indexes = choice(len(data.index), size=m, replace=False)
return data.loc[random_indexes]
def get_best_indices(n, population):
select_population = deepcopy(population)
best_elements = []
for _ in range(n):
best_index, _ = get_best_elements(select_population)
best_elements.append(best_index)
select_population.pop(best_index)
return best_elements
def element_in_dataframe(solution, element):
duplicates = solution.query(
f"(source == {element.source} and destination == {element.destination}) or (source == {element.destination} and destination == {element.source})"
)
return not duplicates.empty
def replace_elements(current_population, new_population, indices):
for item in indices:
current_population[item] = new_population[item]
return current_population
def replace_worst_element(previous, data):
solution = previous.copy()
worst_index = solution["distance"].astype(float).idxmin()
random_element = data.sample().squeeze()
while element_in_dataframe(solution=solution, element=random_element):
random_element = data.sample().squeeze()
solution.loc[worst_index] = random_element
return solution, worst_index
def get_random_solution(previous, data):
solution, worst_index = replace_worst_element(previous, data)
previous_worst_distance = previous["distance"].loc[worst_index]
while solution.distance.loc[worst_index] <= previous_worst_distance:
solution, _ = replace_worst_element(previous=solution, data=data)
return solution
def explore_neighbourhood(element, data, max_iterations=100000):
def run_local_search(n, data, population, mode, probability=0.1):
neighbourhood = []
neighbourhood.append(element)
for _ in range(max_iterations):
previous_solution = neighbourhood[-1]
neighbour = get_random_solution(previous=previous_solution, data=data)
neighbourhood.append(neighbour)
return neighbour
if mode == "all":
for individual in population:
neighbourhood.append(local_search(individual, n, data))
new_population = neighbourhood
elif mode == "random":
expected_individuals = len(population) * probability
indices = []
for _ in range(expected_individuals):
random_index = randint(len(population))
random_individual = population[random_index]
neighbourhood.append(local_search(random_individual, n, data))
indices.append(random_index)
new_population = replace_elements(population, neighbourhood, indices)
else:
expected_individuals = len(population) * probability
best_indices = get_best_indices(n=expected_individuals, population=population)
for element in best_indices:
neighbourhood.append(local_search(population[element], n, data))
new_population = replace_elements(population, neighbourhood, best_indices)
return new_population
def memetic_algorithm(m, data):
first_solution = get_first_random_solution(m=m, data=data)
best_solution = explore_neighbourhood(
element=first_solution, data=data, max_iterations=100
)
return best_solution
def memetic_algorithm(n, m, data, hybridation, max_iterations=100000):
population = [generate_individual(n, m, data) for _ in range(n)]
population = evaluate_population(population, data)
for i in range(max_iterations):
if i % 10 == 0:
population = run_local_search(n, data, population, mode=hybridation)
i += 5
parents = select_parents(population, n, mode="stationary")
offspring = crossover(mode="position", parents=parents, m=m)
offspring = mutate(offspring, n, data)
population = replace_population(population, offspring, mode="stationary")
population = evaluate_population(population, data)
best_index, _ = get_best_elements(population)
return population[best_index]