Compare commits

...

16 Commits

8 changed files with 355 additions and 165 deletions

docs/.gitkeep (new empty file)

docs/Summary.org (new file, 143 lines)

@@ -0,0 +1,143 @@
#+TITLE: Practical 2
#+SUBTITLE: Metaheuristics
#+AUTHOR: Amin Kasrou Aouam
#+DATE: 2021-06-22
#+PANDOC_OPTIONS: template:~/.pandoc/templates/eisvogel.latex
#+PANDOC_OPTIONS: listings:t
#+PANDOC_OPTIONS: toc:t
#+PANDOC_METADATA: lang=es
#+PANDOC_METADATA: titlepage:t
#+PANDOC_METADATA: listings-no-page-break:t
#+PANDOC_METADATA: toc-own-page:t
#+PANDOC_METADATA: table-use-row-colors:t
#+PANDOC_METADATA: colorlinks:t
#+PANDOC_METADATA: logo:/home/coolneng/Photos/Logos/UGR.png
#+LaTeX_HEADER: \usepackage[ruled, lined, linesnumbered, commentsnumbered, longend]{algorithm2e}
* Practical 2
** Introduction
In this assignment we use several population-based search algorithms to solve the maximum diversity problem (MDP), formalized below. We implement:
- A genetic algorithm
- A memetic algorithm
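Formally, given $n$ candidate points with pairwise distances $d_{ij}$, the MDP consists in choosing the subset of $m$ points that maximizes the sum of pairwise distances (the standard formulation of the problem, stated here for reference):
\begin{equation*}
\max_{S \subseteq \{1, \dots, n\},\ |S| = m} \sum_{i, j \in S,\ i < j} d_{ij}
\end{equation*}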
** Algorithms
*** Genetic
Genetic algorithms are inspired by natural evolution and genetics. They generate an initial set of solutions (i.e. a population), select a subset of individuals to operate on, apply recombination and mutation operators, and finally replace the previous population with a new one.
The general procedure of the algorithm is illustrated below:
\begin{algorithm}
\KwIn{A list $[a_i]$, $i=1, 2, \cdots, n$, that contains the population of individuals}
\KwOut{Processed list}
$P(t) \leftarrow initializePopulation()$
$P(t) \leftarrow evaluatePopulation()$
\While{$\neg$ \emph{stop condition}}{
$t = t + 1$
$parents \leftarrow selectParents(P(t-1))$
$offspring \leftarrow recombine(parents)$
$offspring \leftarrow mutate(offspring)$
$P(t) \leftarrow replacePopulation(P(t-1), offspring)$
$P(t) \leftarrow evaluatePopulation()$
}
\KwRet{$P(t)$}
\end{algorithm}
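To make the scheme concrete, the following is a self-contained toy version (an illustration only, not the project's implementation, which operates on /pandas/ DataFrames) with binary tournament selection, a set-based recombination and generational replacement:
#+begin_src python
# Toy MDP genetic algorithm: individuals are lists of m point indices,
# distances is a symmetric n x n matrix.
from itertools import combinations
from random import random, sample, shuffle, randrange, choice


def fitness(individual, distances):
    # Sum of pairwise distances between the selected points
    return sum(distances[i][j] for i, j in combinations(individual, 2))


def toy_genetic_algorithm(n, m, distances, pop_size=20, iterations=100, p_mut=0.05):
    population = [sample(range(n), m) for _ in range(pop_size)]
    for _ in range(iterations):
        # Selection: binary tournament
        parents = [
            max(sample(population, 2), key=lambda s: fitness(s, distances))
            for _ in range(pop_size)
        ]
        # Recombination: keep the genes shared by both parents and fill the
        # rest with a random mix of the remaining genes
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            common = list(set(a) & set(b))
            rest = list((set(a) | set(b)) - set(common))
            for _ in range(2):  # two children per pair of parents
                shuffle(rest)
                offspring.append(common + rest[: m - len(common)])
        # Mutation: swap a selected point for an unselected one
        for child in offspring:
            if random() < p_mut:
                child[randrange(m)] = choice([p for p in range(n) if p not in child])
        population = offspring  # generational replacement
    return max(population, key=lambda s: fitness(s, distances))


# Toy usage on a random symmetric distance matrix
n = 20
distances = [[0.0] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    distances[i][j] = distances[j][i] = random()
print(toy_genetic_algorithm(n, m=5, distances=distances))
#+end_src
The project's genetic_algorithm function follows the same structure; the selection, crossover and replacement operators are chosen through the command-line parameters.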
We implement 4 distinct variants, combining 2 criteria:
**** Replacement criterion
- *Generational*: the new population completely replaces the previous one
- *Stationary*: the two best offspring replace the two worst individuals of the previous population
**** Crossover operator
- *Uniform*: keeps the positions common to both parents; the remaining ones are chosen at random from either parent (requires a repair operator, sketched below)
- *Position*: keeps the positions common to both parents, takes the remaining elements of each parent and shuffles them. It generates 2 children.
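A minimal sketch of the uniform operator and its repair step, assuming a binary membership encoding (a vector of length $n$ with exactly $m$ ones; the project's actual representation is a DataFrame of points):
#+begin_src python
from numpy.random import randint


def uniform_crossover(first, second, m):
    # Keep the positions on which both parents agree; pick each remaining
    # position from a random parent (first/second are 0/1 vectors of length n)
    child = [a if a == b else [a, b][randint(2)] for a, b in zip(first, second)]
    # Repair: add or drop random genes until exactly m points are selected
    ones = [i for i, bit in enumerate(child) if bit]
    zeros = [i for i, bit in enumerate(child) if not bit]
    while len(ones) > m:
        child[ones.pop(randint(len(ones)))] = 0
    while len(ones) < m:
        index = zeros.pop(randint(len(zeros)))
        child[index] = 1
        ones.append(index)
    return child


# Example: two parents selecting 3 points out of 6
print(uniform_crossover([1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 1, 0], m=3))
#+end_src
The position operator does not need a repair step: the common genes plus the shuffled remaining genes of each parent already produce children of the right size.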
*** Memetic
Memetic algorithms arise from hybridizing a genetic algorithm with a local search algorithm. The result is an algorithm with a good balance between exploration and exploitation.
The general procedure of the algorithm is illustrated below:
\begin{algorithm}
\KwIn{A list $[a_i]$, $i=1, 2, \cdots, n$, that contains the population of individuals}
\KwOut{Processed list}
$P(t) \leftarrow initializePopulation()$
$P(t) \leftarrow evaluatePopulation()$
\While{$\neg$ \emph{stop condition}}{
\If{\emph{certain iteration}}{
$P(t) \leftarrow localSearch(P(t-1))$
}
$t = t + 1$
$parents \leftarrow selectParents(P(t-1))$
$offspring \leftarrow recombine(parents)$
$offspring \leftarrow mutate(offspring)$
$P(t) \leftarrow replacePopulation(P(t-1), offspring)$
$P(t) \leftarrow evaluatePopulation()$
}
\KwRet{$P(t)$}
\end{algorithm}
We implement 3 distinct variants (the sketch after this list shows how each one picks its chromosomes):
- Local search over all chromosomes
- Local search over a random subset of chromosomes
- Local search over the subset of best chromosomes
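A condensed sketch of how each variant picks the chromosomes to improve (the signatures and the fitness attribute are assumptions of this sketch; the project's version is the run_local_search function shown in the memetic module's diff below):
#+begin_src python
from numpy.random import randint


def chromosomes_for_local_search(population, mode, probability=0.1):
    """Return the indices of the chromosomes that will undergo local search."""
    if mode == "all":
        return list(range(len(population)))
    count = int(len(population) * probability)
    if mode == "random":
        # A fixed fraction of the population, picked at random
        return [randint(len(population)) for _ in range(count)]
    # mode == "best": the same fraction, taken from the fittest chromosomes
    ranked = sorted(
        range(len(population)), key=lambda i: population[i].fitness, reverse=True
    )
    return ranked[:count]
#+end_src
Every 10 generations the memetic loop replaces the selected chromosomes with the result of the local search before continuing with selection, crossover and mutation.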
** Implementation
The assignment is implemented in /Python/, using the following libraries:
- NumPy
- Pandas
*** Installation
Running the program requires Python, together with the *Pandas* and *NumPy* libraries.
A shell.nix file is provided to ease the installation of the dependencies with the [[https://nixos.org/][Nix]] package manager. After installing Nix, it suffices to run the following command at the root of the project:
#+begin_src shell
nix-shell
#+end_src
** Execution
The program is executed with the following command:
#+begin_src shell
python src/main.py <dataset> <algoritmo> <parámetros>
#+end_src
The possible parameters are:
| dataset                      | algorithm | parameters                               |
|------------------------------+-----------+------------------------------------------|
| Any file in the data folder  | genetic   | uniform/position generational/stationary |
|                              | memetic   | all/random/best                          |
A script is also provided that runs 1 iteration of each algorithm over each of the /datasets/ and saves the results in a spreadsheet. It can be run with the following command:
#+begin_src shell
python src/execution.py
#+end_src
*Note*: the [[https://xlsxwriter.readthedocs.io/][XlsxWriter]] library must be installed to export the results to an Excel file.
* Analysis of the results
Unfortunately, due to an excessively long execution time (even after tuning the metaparameters), we cannot provide results for the execution of the algorithms.

docs/Summary.pdf (new binary file, not shown)

src/execution.py

@@ -14,16 +14,13 @@ def file_list(path):
 def create_dataframes():
-    greedy = DataFrame()
-    local = DataFrame()
-    return greedy, local
+    return [DataFrame() for _ in range(7)]
 
 
 def process_output(results):
     distances = []
     time = []
-    for element in results:
-        for line in element:
-            if line.startswith(bytes("Total distance:", encoding="utf-8")):
-                line_elements = line.split(sep=bytes(":", encoding="utf-8"))
-                distances.append(float(line_elements[1]))
+    for line in results:
+        if line.startswith(bytes("Total distance:", encoding="utf-8")):
+            line_elements = line.split(sep=bytes(":", encoding="utf-8"))
+            distances.append(float(line_elements[1]))
@@ -33,51 +30,51 @@ def process_output(results):
     return distances, time
 
 
-def populate_dataframes(greedy, local, greedy_list, local_list, dataset):
-    greedy_distances, greedy_time = process_output(greedy_list)
-    local_distances, local_time = process_output(local_list)
-    greedy_dict = {
+def populate_dataframe(df, output_cmd, dataset):
+    distances, time = process_output(output_cmd)
+    data_dict = {
         "dataset": dataset.removeprefix("data/"),
-        "media distancia": mean(greedy_distances),
-        "desviacion distancia": std(greedy_distances),
-        "media tiempo": mean(greedy_time),
-        "desviacion tiempo": std(greedy_time),
+        "media distancia": mean(distances),
+        "desviacion distancia": std(distances),
+        "media tiempo": mean(time),
+        "desviacion tiempo": std(time),
     }
-    local_dict = {
-        "dataset": dataset.removeprefix("data/"),
-        "media distancia": mean(local_distances),
-        "desviacion distancia": std(local_distances),
-        "media tiempo": mean(local_time),
-        "desviacion tiempo": std(local_time),
-    }
-    greedy = greedy.append(greedy_dict, ignore_index=True)
-    local = local.append(local_dict, ignore_index=True)
-    return greedy, local
+    df = df.append(data_dict, ignore_index=True)
+    return df
 
 
-def script_execution(filenames, greedy, local, iterations=3):
+def script_execution(filenames, df_list):
     script = "src/main.py"
+    parameters = [
+        ["genetic", "uniform", "generational"],
+        ["genetic", "position", "generational"],
+        ["genetic", "uniform", "stationary"],
+        ["genetic", "position", "stationary"],
+        ["memetic", "all"],
+        ["memetic", "random"],
+        ["memetic", "best"],
+    ]
     for dataset in filenames:
         print(f"Running on dataset {dataset}")
-        greedy_list = []
-        local_list = []
-        for _ in range(iterations):
-            greedy_cmd = run(
-                [executable, script, dataset, "greedy"], capture_output=True
-            ).stdout.splitlines()
-            local_cmd = run(
-                [executable, script, dataset, "local"], capture_output=True
-            ).stdout.splitlines()
-            greedy_list.append(greedy_cmd)
-            local_list.append(local_cmd)
-        greedy, local = populate_dataframes(
-            greedy, local, greedy_list, local_list, dataset
-        )
-    return greedy, local
+        for index, params in zip(range(4), parameters):
+            print(f"Running {params} algorithm")
+            output_cmd = run(
+                [executable, script, dataset, *params], capture_output=True
+            ).stdout.splitlines()
+            df_list[index] = populate_dataframe(df_list[index], output_cmd, dataset)
+    return df_list
 
 
-def export_results(greedy, local):
-    dataframes = {"Greedy": greedy, "Local search": local}
+def export_results(df_list):
+    dataframes = {
+        "Generational uniform genetic": df_list[0],
+        "Generational position genetic": df_list[1],
+        "Stationary uniform genetic": df_list[2],
+        "Stationary position genetic": df_list[3],
+        "All genes memetic": df_list[4],
+        "Random genes memetic": df_list[5],
+        "Best genes memetic": df_list[6],
+    }
     writer = ExcelWriter(path="docs/algorithm-results.xlsx", engine="xlsxwriter")
     for name, df in dataframes.items():
         df.to_excel(writer, sheet_name=name, index=False)
@@ -91,9 +88,9 @@ def export_results(greedy, local):
 def main():
     datasets = file_list(path="data/*.txt")
-    greedy, local = create_dataframes()
-    populated_greedy, populated_local = script_execution(datasets, greedy, local)
-    export_results(populated_greedy, populated_local)
+    df_list = create_dataframes()
+    populated_df_list = script_execution(datasets, df_list)
+    export_results(populated_df_list)
 
 
 if __name__ == "__main__":

src/genetic_algorithm.py

@@ -1,12 +1,11 @@
-from numpy import sum, append, intersect1d, array_equal
+from numpy import intersect1d, array_equal
 from numpy.random import randint, choice, shuffle
 from pandas import DataFrame
 from math import ceil
 from functools import partial
 from multiprocessing import Pool
 from copy import deepcopy
-
-from preprocessing import parse_file
+from itertools import combinations
 
 
 def get_row_distance(source, destination, data):
@@ -37,22 +36,23 @@ def generate_individual(n, m, data):
 def evaluate_individual(individual, data):
-    fitness = []
-    genotype = individual.point.values
-    distances = data.query(f"source in @genotype and destination in @genotype")
-    for item in genotype[:-1]:
-        element_df = distances.query(f"source == {item} or destination == {item}")
-        max_distance = element_df["distance"].astype(float).max()
-        fitness = append(arr=fitness, values=max_distance)
-        distances = distances.query(f"source != {item} and destination != {item}")
-    individual["fitness"] = sum(fitness)
+    fitness = 0
+    comb = combinations(individual.index, r=2)
+    for index in list(comb):
+        elements = individual.loc[index, :]
+        fitness += get_row_distance(
+            source=elements["point"].head(n=1).values[0],
+            destination=elements["point"].tail(n=1).values[0],
+            data=data,
+        )
+    individual["fitness"] = fitness
     return individual
 
 
 def select_distinct_genes(matching_genes, parents, m):
     first_parent = parents[0].query("point not in @matching_genes")
     second_parent = parents[1].query("point not in @matching_genes")
-    cutoff = randint(m - len(matching_genes))
+    cutoff = randint(m - len(matching_genes) + 1)
     first_parent_genes = first_parent.point.values[cutoff:]
     second_parent_genes = second_parent.point.values[:cutoff]
     return first_parent_genes, second_parent_genes
@@ -139,9 +139,8 @@ def group_parents(parents):
         first = parents[i]
         second = parents[i + 1]
         if array_equal(first.point.values, second.point.values):
-            tmp = second
-            second = parents[i - 2]
-            parents[i - 2] = tmp
+            random_index = randint(i + 1)
+            second, parents[random_index] = parents[random_index], second
         parent_pairs.append([first, second])
     return parent_pairs
@@ -178,7 +177,7 @@ def select_new_gene(individual, n):
     return new_gene
 
 
-def mutate(offspring, data, probability=0.001):
+def mutate(offspring, n, data, probability=0.001):
     expected_mutations = len(offspring) * n * probability
     individuals = []
     genes = []
@@ -297,19 +296,8 @@ def genetic_algorithm(n, m, data, select_mode, crossover_mode, max_iterations=10
     for _ in range(max_iterations):
         parents = select_parents(population, n, select_mode)
         offspring = crossover(crossover_mode, parents, m)
-        offspring = mutate(offspring, data)
+        offspring = mutate(offspring, n, data)
         population = replace_population(population, offspring, select_mode)
         population = evaluate_population(population, data)
     best_index, _ = get_best_elements(population)
     return population[best_index]
-
-
-n, m, data = parse_file("data/GKD-c_11_n500_m50.txt")
-genetic_algorithm(
-    n=10,
-    m=4,
-    data=data,
-    select_mode="generational",
-    crossover_mode="uniform",
-    max_iterations=10,
-)

src/local_search.py (new file, 64 lines)

@@ -0,0 +1,64 @@
from numpy.random import choice, seed, randint
from pandas import DataFrame


def get_row_distance(source, destination, data):
    row = data.query(
        """(source == @source and destination == @destination) or \
        (source == @destination and destination == @source)"""
    )
    return row["distance"].values[0]


def compute_distance(element, solution, data):
    accumulator = 0
    distinct_elements = solution.query(f"point != {element}")
    for _, item in distinct_elements.iterrows():
        accumulator += get_row_distance(
            source=element,
            destination=item.point,
            data=data,
        )
    return accumulator


def element_in_dataframe(solution, element):
    duplicates = solution.query(f"point == {element}")
    return not duplicates.empty


def replace_worst_element(previous, n, data):
    solution = previous.copy()
    worst_index = solution["distance"].astype(float).idxmin()
    random_element = randint(n)
    while element_in_dataframe(solution=solution, element=random_element):
        random_element = randint(n)
    solution["point"].loc[worst_index] = random_element
    solution["distance"].loc[worst_index] = compute_distance(
        element=solution["point"].loc[worst_index], solution=solution, data=data
    )
    return solution


def get_random_solution(previous, n, data):
    solution = replace_worst_element(previous, n, data)
    while solution["distance"].sum() <= previous["distance"].sum():
        solution = replace_worst_element(previous=solution, n=n, data=data)
    return solution


def explore_neighbourhood(element, n, data, max_iterations=100000):
    neighbourhood = []
    neighbourhood.append(element)
    for _ in range(max_iterations):
        previous_solution = neighbourhood[-1]
        neighbour = get_random_solution(previous=previous_solution, n=n, data=data)
        neighbourhood.append(neighbour)
    return neighbour


def local_search(first_solution, n, data):
    best_solution = explore_neighbourhood(
        element=first_solution, n=n, data=data, max_iterations=5
    )
    return best_solution

src/main.py

@@ -1,68 +1,57 @@
 from preprocessing import parse_file
 from genetic_algorithm import genetic_algorithm
 from memetic_algorithm import memetic_algorithm
-from sys import argv
 from time import time
-from itertools import combinations
+from argparse import ArgumentParser
 
 
-def execute_algorithm(choice, n, m, data):
-    if choice == "genetic":
-        return genetic_algorithm(n, m, data)
-    elif choice == "memetic":
-        return memetic_algorithm(m, data)
-    else:
-        print("The valid algorithm choices are 'genetic' and 'memetic'")
-        exit(1)
-
-
-def get_row_distance(source, destination, data):
-    row = data.query(
-        """(source == @source and destination == @destination) or \
-        (source == @destination and destination == @source)"""
-    )
-    return row["distance"].values[0]
-
-
-def get_fitness(solutions, data):
-    counter = 0
-    comb = combinations(solutions.index, r=2)
-    for index in list(comb):
-        elements = solutions.loc[index, :]
-        counter += get_row_distance(
-            source=elements["point"].head(n=1).values[0],
-            destination=elements["point"].tail(n=1).values[0],
-            data=data,
+def execute_algorithm(args, n, m, data):
+    if args.algorithm == "genetic":
+        return genetic_algorithm(
+            n,
+            m,
+            data,
+            select_mode=args.selection,
+            crossover_mode=args.crossover,
+            max_iterations=100,
         )
-    return counter
+    return memetic_algorithm(
+        n,
+        m,
+        data,
+        hybridation=args.hybridation,
+        max_iterations=100,
+    )
 
 
-def show_results(solutions, fitness, time_delta):
-    duplicates = solutions.duplicated().any()
-    print(solutions)
-    print(f"Total distance: {fitness}")
+def show_results(solution, time_delta):
+    duplicates = solution.duplicated().any()
+    print(solution)
+    print(f"Total distance: {solution.fitness.values[0]}")
     if not duplicates:
         print("No duplicates found")
     print(f"Execution time: {time_delta}")
 
 
-def usage(argv):
-    print(f"Usage: python {argv[0]} <file> <algorithm choice>")
-    print("algorithm choices:")
-    print("genetic: genetic algorithm")
-    print("memetic: memetic algorithm")
-    exit(1)
+def parse_arguments():
+    parser = ArgumentParser()
+    parser.add_argument("file", help="dataset of choice")
+    subparsers = parser.add_subparsers(dest="algorithm")
+    parser_genetic = subparsers.add_parser("genetic")
+    parser_memetic = subparsers.add_parser("memetic")
+    parser_genetic.add_argument("crossover", choices=["uniform", "position"])
+    parser_genetic.add_argument("selection", choices=["generational", "stationary"])
+    parser_memetic.add_argument("hybridation", choices=["all", "random", "best"])
+    return parser.parse_args()
 
 
 def main():
-    if len(argv) != 3:
-        usage(argv)
-    n, m, data = parse_file(argv[1])
+    args = parse_arguments()
+    n, m, data = parse_file(args.file)
     start_time = time()
-    solutions = execute_algorithm(choice=argv[2], n=n, m=m, data=data)
+    solutions = execute_algorithm(args, n, m, data)
     end_time = time()
-    fitness = get_fitness(solutions, data)
-    show_results(solutions, fitness, time_delta=end_time - start_time)
+    show_results(solutions, time_delta=end_time - start_time)
 
 
 if __name__ == "__main__":

src/memetic_algorithm.py

@@ -1,50 +1,59 @@
-from numpy.random import choice, seed
+from genetic_algorithm import *
+from local_search import local_search
+from copy import deepcopy
 
 
-def get_first_random_solution(m, data):
-    seed(42)
-    random_indexes = choice(len(data.index), size=m, replace=False)
-    return data.loc[random_indexes]
+def get_best_indices(n, population):
+    select_population = deepcopy(population)
+    best_elements = []
+    for _ in range(n):
+        best_index, _ = get_best_elements(select_population)
+        best_elements.append(best_index)
+        select_population.pop(best_index)
+    return best_elements
 
 
-def element_in_dataframe(solution, element):
-    duplicates = solution.query(
-        f"(source == {element.source} and destination == {element.destination}) or (source == {element.destination} and destination == {element.source})"
-    )
-    return not duplicates.empty
+def replace_elements(current_population, new_population, indices):
+    for item in indices:
+        current_population[item] = new_population[item]
+    return current_population
 
 
-def replace_worst_element(previous, data):
-    solution = previous.copy()
-    worst_index = solution["distance"].astype(float).idxmin()
-    random_element = data.sample().squeeze()
-    while element_in_dataframe(solution=solution, element=random_element):
-        random_element = data.sample().squeeze()
-    solution.loc[worst_index] = random_element
-    return solution, worst_index
-
-
-def get_random_solution(previous, data):
-    solution, worst_index = replace_worst_element(previous, data)
-    previous_worst_distance = previous["distance"].loc[worst_index]
-    while solution.distance.loc[worst_index] <= previous_worst_distance:
-        solution, _ = replace_worst_element(previous=solution, data=data)
-    return solution
-
-
-def explore_neighbourhood(element, data, max_iterations=100000):
-    neighbourhood = []
-    neighbourhood.append(element)
-    for _ in range(max_iterations):
-        previous_solution = neighbourhood[-1]
-        neighbour = get_random_solution(previous=previous_solution, data=data)
-        neighbourhood.append(neighbour)
-    return neighbour
+def run_local_search(n, data, population, mode, probability=0.1):
+    neighbourhood = []
+    if mode == "all":
+        for individual in population:
+            neighbourhood.append(local_search(individual, n, data))
+        new_population = neighbourhood
+    elif mode == "random":
+        expected_individuals = len(population) * probability
+        indices = []
+        for _ in range(expected_individuals):
+            random_index = randint(len(population))
+            random_individual = population[random_index]
+            neighbourhood.append(local_search(random_individual, n, data))
+            indices.append(random_index)
+        new_population = replace_elements(population, neighbourhood, indices)
+    else:
+        expected_individuals = len(population) * probability
+        best_indices = get_best_indices(n=expected_individuals, population=population)
+        for element in best_indices:
+            neighbourhood.append(local_search(population[element], n, data))
+        new_population = replace_elements(population, neighbourhood, best_indices)
+    return new_population
 
 
-def memetic_algorithm(m, data):
-    first_solution = get_first_random_solution(m=m, data=data)
-    best_solution = explore_neighbourhood(
-        element=first_solution, data=data, max_iterations=100
-    )
-    return best_solution
+def memetic_algorithm(n, m, data, hybridation, max_iterations=100000):
+    population = [generate_individual(n, m, data) for _ in range(n)]
+    population = evaluate_population(population, data)
+    for i in range(max_iterations):
+        if i % 10 == 0:
+            population = run_local_search(n, data, population, mode=hybridation)
+            i += 5
+        parents = select_parents(population, n, mode="stationary")
+        offspring = crossover(mode="position", parents=parents, m=m)
+        offspring = mutate(offspring, n, data)
+        population = replace_population(population, offspring, mode="stationary")
+        population = evaluate_population(population, data)
+    best_index, _ = get_best_elements(population)
+    return population[best_index]