Desarrollo y Mantenimiento de Sistemas Informáticos

4º. 1er cuatrimestre. Itinerario de Sistemas de la Información. Grado en Ingeniería Informática. ULL

GH Org - GH Template Org - GitHub Classroom - Discussions - Teams - Campus ULL - Students Activity - Chat - Google Meet

Práctica Jekyll Search (jekyll-search)

Práctica Jekyll Search (jekyll-search)

Entrega

La entrega de esta práctica se realizará en el mismo repo asociado a la práctica Introduction to Systems Development. Cree una rama intro2sd para señalar el punto de entrega de la anterior y haga la entrega de esta tarea en la rama main.

Adding a Simple Search to our Jekyll Site

El ejercicio consiste en que añada la capacidad de búsqueda al sitio web contruido en la práctica Introduction to Systems Development

Estos son algunos requisitos:

Queremos que busque en todos los ficheros, no solo los de los posts sino también los de las páginas
Que admita expresiones regulares
Queremos que los resultados vayan apareciendo conforme tecleamos
Se mostrará una lista de enlaces a los ficheros que contienen la expresión buscada y un resumen de las primeros caracteres del fichero
Lea el capítulo 2 del libro Developing Information Systems, editado by James Cadle y haga un resumen en un post del web site
- Capítulo 2: Lifecycle types and their rationales por Lynda Girvan

¿Como hacerlo?

Since Jekyll has no server side execution, we have to rely on storing all the required content in a single file and search our keyword from that file.

We will be creating a JSON file in which we will store title, url, content, excerpt, etc., at building time

 $ bundle exec jekyll build
 $ head -n 30 _site/assets/src/search.json 

 [
     {
         "title": "Clase del Lunes 30/09/2019",
         "excerpt": "Clase del Lunes 30/09/2019\n\n",         "⇐": " Resumen",
         "content": "Clase del Lunes 30/09/2019\n\n\n  ...",  "⇐ ": "Contenido del fichero"
         "url": "/clases/2019/09/30/leccion.html"
     },
     "...": "..."
 ]

Véase search.json (protected)

Liquid template to generate at build time the _site/assets/src/search.json

---
layout: null
sitemap: false
---

{% capture json %}
[
  {% assign collections = site.collections | where_exp:'collection','collection.output != false' %}
  {% for collection in collections %}
    {% assign docs = collection.docs | where_exp:'doc','doc.sitemap != false' %}
    {% for doc in docs %}
      {
        "title": {{ doc.title | jsonify }},
        "excerpt": {{ doc.excerpt | markdownify | strip_html | jsonify }},
        "content": {{ doc.content | markdownify | strip_html | jsonify }},
        "url": {{ site.baseurl | append: doc.url | jsonify }}
      },
    {% endfor %}
  {% endfor %}
  
  {% assign pages = site.html_pages | where_exp:'doc','doc.sitemap != false' | where_exp:'doc','doc.title != null' %}

  {% for page in pages %}
  {
    "title": {{ page.title | jsonify }},
    "excerpt": {{ page.excerpt | markdownify | strip_html | jsonify }},
    "content": {{ page.content | markdownify | strip_html | jsonify }},
    "url": {{ site.baseurl | append: page.url | jsonify }}
  }{% unless forloop.last %},{% endunless %}
  {% endfor %}
]
{% endcapture %}

{{ json | lstrip }}

You can find the source code at /ULL-MFP-AET/ull-mfp-aet.github.io/main/assets/src/search.json

layout: null: To disable layout in Jekyll.
sitemap: false:
- A Sitemap is an XML file that lists the URLs for a site. This allows search engines to crawl the site more efficiently and to find URLs that may be isolated from rest of the site’s content. The sitemaps protocol is a URL inclusion protocol and complements robots.txt, a URL exclusion protocol. We can use the front-matter to set the sitemap property to false
- jekyll-sitemap is a Jekyll plugin to silently generate a sitemaps.org compliant sitemap for your Jekyll site
Liquid: {% capture json %} ... {% endcapture %} Captures the string inside of the opening and closing tags and assigns it to a variable. Variables that you create using capture are stored as strings.
{{ json | lstrip }}:
- Filters are simple methods that modify the output of numbers, strings, variables and objects. They are placed within an output tag {{ }} and are denoted by a pipe character |.
- lstrip: Removes all whitespace (tabs, spaces, and newlines) from the left side of a string. It does not affect spaces between words.
{% assign collections = site.collections ...
- site.collections: Collections are also available under site.collections. Posts are considered a collections by Jekyll.
- … where_exp:'collection','collection.output != false'
  - site.collections is an array. With where_exp we select all the objects in the array with the elements for which the attribute collection has its output attribute to true.
  - The output attribute of a collection controls whether the collection’s documents will be output as individual files.
iteration in Liquid
site.html_pages: A subset of site.pages listing those which end in .html.

Use the Liquid Playground to test the Liquid expressions above. The lower left panel is to enter a JSON holding variables that can be accesed in the upper left panel by its name.

Entendiendo la línea `"content": {{ page.content | markdownify | strip_html | jsonify }},`

page.content el contenido de la página todavia sin renderizar (se supone que es fundamentalmente markdown, pero puede contener yml en el front-matter, html, scripts, liquid, etc.)
markdownify: Convert a Markdown-formatted string into HTML.
strip_html: Removes any HTML tags from a string.
jsonify: If the data is an array or hash you can use the jsonify filter to convert it to JSON.

Por ejemplo, supongamos que tenemos estas definiciones en el front-matter de nuestra página:

chuchu: "Cadena **negritas** e *italicas*"
html: "<h1>hello</h1> <b>world</b>"
colors:
  - red
  - blue
  - green
---

y que en el contenido de nuestra página tenemos algo así:

Compara < script>{{ page.chuchu }} </script> con su markdownify: < script>{{ page.chuchu | markdownify }}</script>

Compara < script> {{ page.colors}} </script> con su jsonify: < script>{{ page.colors | jsonify }} </script>

Compara < script>{{page.html}}</script> con su `strip_html` < script> {{ page.html | strip_html }} </script>

Esta es la salida que produce jekyll 4.0.0:

<p>Compara < script>Cadena **negritas** e *italicas* </script> con su markdownify: < script>&lt;p&gt;Cadena <strong>negritas</strong> e <em>italicas</em>&lt;/p&gt;
</script></p>

<p>Compara < script> redbluegreen </script> con su jsonify: < script>["red","blue","green"] </script></p>

<p>Compara < script>&lt;h1&gt;hello&lt;/h1&gt; <b>world</b></script> con su <code class="highlighter-rouge">strip_html</code> < script> hello world </script></p>

La idea general es que necesitamos suprimir los tags, tanto yml, markdown, HTML, etc. para que no confundan al método de busca. Por eso convertimos el markdown a HTML y después suprimimos los tags HTML. También convertimos el yml a JSON.

La página de Búsqueda: search.md

Fuente: search.md

La idea es que vamos a escribir una clase JekyllSearch que implementa la búsqueda. Debe disponer de un constructor al que se le pasan cuatro argumentos:

La ruta donde esta disponible el fichero .json generado durante la construcción (jekyll build)
El id del objeto del DOM en la página en la que está el tag input de la búsqueda
El iddel objeto del DOM en el que se deben volcar los resultados
La url del lugar en el que está el deployment (pudiera ser que el site en el que queremos buscar fuera una subcarpeta de todo el site)

const search = new JekyllSearch(
    '/assets/src/search.json',
    '#search',
    '#list',
    ''
  );
  search.init(); 

Los objetos JekyllSearch deben disponer de un método init que realiza la búsqueda especificada en el elemento del DOM #search y añade los resultados en en el elemento del DOM #list

---
layout: error 
permalink: /search/
title: Search
---

{% capture initSearch %}

<h1>Search</h1>

<form id="search-form" action="">
  <label class="label" for="search">Search term (accepts a regex):</label>
  <br/>
  <input class="input" id="search" type="text" name="search" 
        autofocus 
        placeholder="e.g. Promise" 
        autocomplete="off">
  
  <ul class="list  list--results" id="list">
  </ul>
</form>

< script type="text/javascript" src="/assets/src/fetch.js"></script>
< script type="text/javascript" src="/assets/src/search.js"></script>


< script type="text/javascript">

  const search = new JekyllSearch(
    '{{site.url}}/assets/src/search.json',
    '#search',
    '#list',
    '{{site.baseurl}}'
  );
  search.init(); 
  
</script>

<noscript>Please enable JavaScript to use the search form.</noscript>

{% endcapture %}

{{ initSearch | lstrip }}

autocomplete="off"
- En algunos casos, el navegador continuará sugiriendo valores de autocompletado incluso si el atributo autocompletar está desactivado. Este comportamiento inesperado puede resultar bastante confuso para los desarrolladores. El truco para realmente no aplicar el autocompletado es asignar un valor no válido al atributo, por ejemplo:
```
autocomplete="nope"
```
Dado que este valor no es válido para el atributo autocompletar, el navegador no tiene forma de reconocerlo y deja de intentar autocompletarlo.
Filters are simple methods that modify the output of numbers, strings, variables and objects. They are placed within an output tag {{ }} and are denoted by a pipe character |.
- lstrip Removes all whitespace (tabs, spaces, and newlines) from the left side of a string. It does not affect spaces between words.
Clearing Up Confusion Around baseurl. About site.url vs site.baseurl

La clase JekyllSearch: Fichero search.js

You can find the source at ULL-MFP-AET/ull-mfp-aet.github.io/assets/src/search.js

Here are the contents:

class JekyllSearch {
  constructor(dataSource, searchField, resultsList, siteURL) {
    this.dataSource = dataSource
    this.searchField = document.querySelector(searchField)
    this.resultsList = document.querySelector(resultsList)
    this.siteURL = siteURL

    this.data = [];
  }

  fetchedData() {
    return fetch(this.dataSource, {mode: 'no-cors'})
      .then(blob => blob.json())
  }

  async findResults() {
    this.data = await this.fetchedData()
    const regex = new RegExp(this.searchField.value, 'i')
    return this.data.filter(item => {
      return item.title.match(regex) || item.content.match(regex)
    })
  }

  async displayResults() {
    const results = await this.findResults()
    //console.log('this.siteURL = ',this.siteURL)

    const html = results.map(item => {
      //console.log(item)
      return `
        <li class="result">
            <article class="result__article  article">
                <h4>
                  <a href="${item.url}">${item.title}</a>
                </h4>
                <p>${item.excerpt}</p>
            </article>
        </li>`
    }).join('')
    if ((results.length == 0) || (this.searchField.value == '')) {
      this.resultsList.innerHTML = `<p>Sorry, nothing was found</p>`
    } else {
      this.resultsList.innerHTML = html
    }
  }

  // https://stackoverflow.com/questions/43431550/async-await-class-constructor
  init() {
    
    const url = new URL(document.location)
    if (url.searchParams.get("search")) {
      this.searchField.value = url.searchParams.get("search")
      this.displayResults()
    }
    this.searchField.addEventListener('keyup', () => {
      this.displayResults()
      // So that when going back in the browser we keep the search
      url.searchParams.set("search", this.searchField.value)
      window.history.pushState('', '', url.href)
    })
    
    // to not send the form each time <enter> is pressed
    this.searchField.addEventListener('keypress', event => {
      if (event.keyCode == 13) {
        event.preventDefault()
      }
    })
  }
}

constructor

constructor(dataSource, searchField, resultsList, siteURL) {
    this.dataSource = dataSource
    this.searchField = document.querySelector(searchField)
    this.resultsList = document.querySelector(resultsList)
    this.siteURL = siteURL

    this.data = [];
}

The Document method querySelectorAll() returns a static (not live) NodeList representing a list of the document’s elements that match the specified group of selectors.

selectors: In CSS, pattern matching rules determine which style rules apply to elements in the document tree. These patterns, are called selectors, may range from simple element names to rich contextual patterns. If all conditions in the pattern are true for a certain element, the selector matches the element. For instance '#search' and '#list' are selectors.

All methods getElementsBy* return a live collection. Such collections always reflect the current state of the document and auto-update when it changes.

In contrast, querySelectorAll returns a static collection. It’s like a fixed array of elements.

init

 init() {
    const url = new URL(document.location)
    if (url.searchParams.get("search")) {
      this.searchField.value = url.searchParams.get("search")
      this.displayResults()
    }
    this.searchField.addEventListener('keyup', () => {
      this.displayResults()
      // So that when going back in the browser we keep the search
      url.searchParams.set("search", this.searchField.value)
      window.history.pushState('', '', url.href)
    })
    
    // to not send the form each time <enter> is pressed
    this.searchField.addEventListener('keypress', event => {
      if (event.keyCode == 13) {
        event.preventDefault()
      }
    })
  }

URL parameters

(also known as query strings) are a way to structure additional information for a given URL. Parameters are added to the end of a URL after a ? symbol, and multiple parameters can be included when separated by the & symbol.

In our case, we have the search parameter:

url.searchParams

If the URL of your page is https://example.com/?name=Jonathan%20Smith&age=18 you could parse out the name and age parameters using:

let params = (new URL(document.location)).searchParams;
let name = params.get('name'); // is the string "Jonathan Smith".
let age = parseInt(params.get('age')); // is the number 18

The event listeners

    this.searchField.addEventListener('keyup', () => {
      this.displayResults()
      // So that when going back in the browser we keep the search
      url.searchParams.set("search", this.searchField.value)
      window.history.pushState('', '', url.href)
    })

window.history.pushState

The window object provides access to the browser’s session history through the history object. The history.pushState(state, title, url) method adds a state to the browser’s session history stack.

   ... // inside init
   this.searchField.addEventListener('keyup', () => {
      this.displayResults()
      // So that when going back in the browser we keep the search
      url.searchParams.set("search", this.searchField.value)
      window.history.pushState('', '', url.href)
    })

The search.json is not going to change until the next push

findResults

 fetchedData() {
    return fetch(this.dataSource, {mode: 'no-cors'})
      .then(blob => blob.json())
  }

  async findResults() {
    this.data = await this.fetchedData()
    const regex = new RegExp(this.searchField.value, 'i')
    return this.data.filter(item => {
      return item.title.match(regex) || item.content.match(regex)
    })
  }

What is CORS

Read the section Introduction to CORS to know what CORS is.

The mode option of the fetch() method allows you to define the CORS mode of the request:

no-cors prevents the method from being anything other than HEAD, GET or POST, and the headers from being anything other than simple headers.
If any ServiceWorkers intercept these requests, they may not add or override any headers except for those that are simple headers. See section Service Workers for more information.
In addition, no-cors assures that JavaScript may not access any properties of the resulting Response.
- This ensures that ServiceWorkers do not affect the semantics of the Web and prevents security and privacy issues arising from leaking data across domains.

Caching

The resources downloaded through fetch(), similar to other resources that the browser downloads, are subject to the HTTP cache.

fetchedData() {
    return fetch(this.dataSource).then(blob => blob.json())
  }

This is usually fine, since it means that if your browser has a cached copy of the response to the HTTP request, it can use the cached copy instead of wasting time and bandwidth re-downloading from a remote server.

Fetch Polyfill

Polyfill: A polyfill is a piece of code (usually JavaScript on the Web) used to provide modern functionality on older browsers that do not natively support it. The polyfill uses non-standard features in a certain browser to give JavaScript a standards-complaint way to access the feature

El código del polyfill que he usado: assets/src/fetch.js
Para mas información podemos leer este blog: Polyfill para Fetch
- whatwg-fetch: polyfill de Fetch que ha creado el equipo de Github
- Para agregar este polyfill a nuestro proyecto podemos descargarnos su archivo js desde github,
- pero también podríamos instalarlo usando cualquiera de los gestores de dependencias más habituales:
- npm install whatwg-fetch --save
- o bien:
- bower install fetch --save

Estructura del sitio

Esta imagen muestra los ficheros implicados en este ejercicio dentro de la estructura del sitio de estos apuntes:

$ tree -I _site
.
├── 404.md
├── assets
│   ├── css
│   │   └── style.scss
│   ├── images
│   │   ├── event-emitter-methods.png
│   │   └── ,,,
│   └── src
│       ├── fetch.js          ⇐ Polyfill for fetch
│       ├── search.js         ⇐ Librería con la Clase JekyllSearch que implementa el Código de búsqueda
│       └── search.json       ⇐ Plantilla Liquid para generar el fichero JSON 
├── search.md                 ⇐ Página de la búsqueda. Formulario y script de arranque 
├── clases.md
├── _config.yml               ⇐ Fichero de configuración de Jekyll
├── degree.md
├── favicon.ico
├── Gemfile
├── Gemfile.lock
├── _includes                 ⇐ The include tag allows to include the content of files stored here
│   ├── navigation-bar.html
│   └── ...
├── _layouts                  ⇐ templates that wrap around your content
│   ├── default.html
│   ├── error.html
│   └── post.html
├── _posts                    ⇐ File names must follow YEAR-MONTH-DAY-title.MARKUP and  must begin with front matter 
│   ├── ...
│   └── 2019-12-02-leccion.md
├── _practicas                ⇐ Folder for the collection "practicas" (list of published "practicas") 
│   ├── ...
│   └── p9-t3-transfoming-data.md 
├── practicas.md              ⇐ {% for practica in site.practicas %} ... {% endfor %}
├── Rakefile                  ⇐ For tasks
├── README.md
├── references.md
├── resources.md
├── tema0-presentacion        ⇐  Pages folders 
│   ├── README.md
│   └── ...
├── tema ...
├── tfa
│   └── README.md
└── timetables.md

58 directories, 219 files

Videos de clases relacionadas

Clase del Lunes 21/10/2024. Implementing search in Jekyll

Clase del Miércoles 23/10/2023. Implementing search in Jekyll

Rúbrica

Busca en todos los ficheros, no solo los de los posts sino también los de las páginas
Admite expresiones regulares
Los resultados vayan apareciendo conforme tecleamos
Se muestra una lista de enlaces a los ficheros que contienen la expresión buscada y un resumen de las primeros caracteres del fichero
El constructor de JekyllSearch recibe en un objeto los argumentos en vez de posicionalmente
Se ha hecho un resumen del capítulo 2 Lifecycle types and their rationales del libro Developing Information Systems, editado by James Cadle
Se ha creado una rama intro2sd para señalar el punto de entrega de la anterior y se hace la entrega de esta tarea en la rama main.
Código de la práctica correcto y funciona
Informe bien elaborado
Kanban Board project conteniendo las incidencias de la rúbrica
Ha entregado el .zip en el campus con el repo

Referencias

Sección Jekyll en estos apuntes
Liquid Playground
Liquid
Jekyll Liquid Extensions
- Filters
- Tags
Using HTMLProofer From Ruby and Travis. Para testear tus páginas: links, imágenes, etc.