Key components in a GitLab CI/CD pipeline

As we learned in the first part of this series, a CI/CD pipeline is a series of steps that automate the software development and delivery process. In this article, we will explore in detail the key components required to design and build a CI/CD pipeline on the GitLab platform.

Many kinds of pipelines, such as production lines or supply chains, are composed of distinct stages, each of which encompasses one or more tasks executed in its own environment. Let’s revisit the pizza making and delivery process as an example.

In the pizza-making process, we begin by preparing the dough and keep it ready for later use. Upon receiving an order, the next phase is initiated, during which the chef crafts the pizza base and adds the desired toppings. In the cooking stage, the chef preheats the oven and then places the raw pizza into it for baking.

Stages in a CI/CD Pipeline

A stage in a pipeline can be viewed as a phase in a production pipeline that consists of one or more tasks executed either sequentially or in parallel, all working together to achieve the goal of that phase.

A typical software CI/CD pipeline might consist of the following stages (a sketch of the corresponding stages declaration follows this list):

  • build: During this stage, the software is compiled and an artifact is created through the build process.
  • test: The compiled software artifact goes through various testing and quality analysis processes.
  • publish: A successfully tested artifact is stored in a safe place for future use.
  • staging_deployment: Deploy the artifact to a pre-production environment and validate that it is reliable and ready for deployment to the production environment.
  • production_deployment: Deploy the approved and validated artifact to the production environment.
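
In GitLab, these phases map directly onto the stages keyword at the top of .gitlab-ci.yml, and each job opts into one of them with the stage keyword. Here is a minimal sketch; the stage names come from the list above, while the job names and echo commands are placeholders standing in for real build and deployment logic:

stages:
  - build
  - test
  - publish
  - staging_deployment
  - production_deployment

build-job:
  stage: build
  script:
    - echo "compile the software and create an artifact"

publish-job:
  stage: publish
  script:
    - echo "store the tested artifact for later use"

Stages execute in the order they are listed, and all jobs in one stage must finish successfully before jobs in the next stage start.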

Jobs and Artifacts in a pipeline

The fundamental building blocks within a .gitlab-ci.yml file are jobs. These job definitions in the YAML file provide GitLab with the necessary information to determine when they should run. Within the job definition, the script section comprises a sequence of commands that are executed to fulfill the job’s requirements and complete its execution.

build:
  script:
    - echo "build software artifact"

test:
  script:
    - echo "build software artifact"

The artifacts block within the job definition tells GitLab where to store the outcomes generated during the job’s execution. These artifacts might encompass compiled software or various types of reports.

get-app-version:
  script:
    - echo "get the application version"
    - "APP_VERSION=${DEPLOY_VERSION}" > variable.env
  artifact:
    reports:
      dotenv: variable.env

build:
  script:
    - echo "build software artifact"
  artifacts:
    paths:
      - target/**/*

test:
  script:
    - echo "test software artifact"
    - rspec --format RspecJunitFormatter --out rspec.xml
  artifacts:
    reports:
      junit: rspec.xml

publish:
  script:
    - echo "publish the software artifact with application version $APP_VERSION"

Typically, the outputs generated by a job are consumed by subsequent jobs and may no longer be necessary once the pipeline is finished. In such scenarios, it is advisable to delete these artifacts after a specific period of time. The artifacts section includes an attribute called expire_in to define the duration after which the artifact is automatically removed.

build:
  script:
    - echo "build software artifact"
  artifacts:
    paths:
      - target/**/*
    expire_in: 30 minutes

publish:
  script:
    - echo "store the artifacts which is generated in the previous job"

While the artifacts produced are normally accessible in the subsequent jobs within the pipeline, it is possible to direct a GitLab job to refrain from fetching any previously generated artifacts into its execution environment.

job-a:
  stage: test
  script: echo "execute job a"
  dependencies: []

Dependencies among jobs in a pipeline

In a pipeline, job execution can vary between sequential and parallel modes based on the availability of required resources and the predetermined order of job execution.

In the pizza-making process, it’s entirely feasible to cut the vegetables and prepare the dough simultaneously because these are independent tasks that only require the necessary resources and an executor. Similarly, in a GitLab CI/CD pipeline, you can design parallel jobs to achieve this. The keywords needs and dependencies are used to express the dependencies between jobs.

  • needs: Execution of jobs with dependencies defined using “needs” can occur out-of-order. The relationships between these jobs, expressed through “needs,” can be represented as a directed acyclic graph. The order of stages can be disregarded, allowing certain jobs to commence independently of others. Concurrent execution is possible for jobs spanning multiple stages.

  • dependencies: Use the “dependencies” keyword to specify a list of particular jobs from which to retrieve artifacts. In cases where the “dependencies” keyword is absent in a job, it implies a dependency on all jobs in preceding stages, and the job will fetch artifacts from all of those jobs. It’s important to note that configuring a job with an empty array ([]) signifies that the job should not download any artifacts.

The following snippet illustrates how to set up parallel jobs in a GitLab CI/CD pipeline.

stages:
  - lint
  - build
  - test
  - scan

job-0:
  script: echo "execute job 0"

job-a:
  stage: lint
  script: echo "execute job a"

job-b:
  stage: build
  script: echo "execute job b"
  dependencies: ["job-0"]  

job-c:
  stage: test
  script: echo "execute job c"
  needs: ["job-b"]
  artifacts:
    paths:
      - targets/generated-artifact-*.zip

job-d:
  stage: test
  script: echo "execute job d"
  needs: ["job-b"]

job-e:
  stage: scan
  script: echo "execute job e"
  dependencies: []  

What would be the result of the above pipeline? It would be good to test it yourself and share the result in the comment box.

Stageless pipeline

In GitLab CI/CD, it is possible to create a “stageless” pipeline by defining jobs without specifying stages. Jobs that omit the stage keyword all land in GitLab’s default test stage, so the pipeline is not organized into distinct phases; execution order is then governed entirely by needs relationships, and jobs without them run in parallel (see the sketch at the end of this section). There are advantages and disadvantages to using a stageless pipeline, and the choice depends on the specific requirements of your project. Some of the pros and cons are:

Pros

  • Simplicity: Stageless pipelines are often simpler to set up and understand. There is no need to define and manage multiple stages, making the configuration more straightforward.
  • Easier Troubleshooting: Debugging and troubleshooting can be more straightforward because all jobs live in a single stage, giving the pipeline a flat structure that makes it easier to identify and isolate issues.
  • Linear Flow: A stageless pipeline typically follows a linear execution flow, which can be advantageous for simple projects with a straightforward CI/CD process.

Cons

  • Limited Control over Ordering: Since all jobs land in the same default stage, execution order must be expressed entirely through needs relationships; any job missing one runs concurrently with the rest, which may not match the intended sequence.
  • Scalability Challenges: As your project grows in complexity, a stageless pipeline might become harder to manage. It may not scale well for large projects with diverse build, test, and deployment requirements.
  • Harder to Represent Stages: If a CI/CD process naturally involves distinct stages (e.g., build, test, deploy), a stageless pipeline might not represent the development lifecycle as clearly, potentially making it harder to understand the flow of the CI/CD process.

The decision to use a stageless pipeline in GitLab depends on the specific needs of your project. For simpler projects with a linear CI/CD process, a stageless pipeline may offer simplicity and ease of use. However, for larger and more complex projects, or those with specific requirements for parallelism and control over execution flow, a multi-stage pipeline might be a more suitable choice.
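
To make this concrete, here is a minimal sketch of a stageless pipeline; the job names and echo commands are placeholders, and it assumes a GitLab version that supports needs between jobs in the same stage (14.2 or later). There is no stages block at all, and ordering comes entirely from needs:

lint:
  script: echo "lint the source code"

build:
  script: echo "build the software artifact"
  needs: ["lint"]

unit-test:
  script: echo "run unit tests"
  needs: ["build"]

integration-test:
  script: echo "run integration tests"
  needs: ["build"]

Here lint runs first, build follows it, and the two test jobs run in parallel once build succeeds.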

Variables

In GitLab CI/CD, a variable is a data element used to set up configurations for jobs, stages, and triggers within a pipeline. These variables serve as containers for values shared across multiple jobs, such as dependency versions or production server URLs. They can also store job-specific values to tailor configurations for individual tasks. There are three ways to define variables in a GitLab CI/CD pipeline:

  • Project/group variables: These variables are defined in the GitLab UI (under Settings > CI/CD > Variables) and are available to all pipelines in the project or group.

  • Pipeline global variables: These variables are defined in the .gitlab-ci.yml file and are only available to the pipeline in which they are defined.

  • Job specific variables: These variables are defined in a job of the .gitlab-ci.yml file and are only available to the job in which they are defined.

variables:
  APP_VERSION: 1.0.0
  DATABASE_URL: postgresql://localhost:5432/my_database

build:
  variables:
    BUILD_ARGS: "GO_VERSION=1.19"
  script:
    - docker build --build-arg=$BUILD_ARGS -t my-app:$APP_VERSION .

deploy:
  script:
    - docker run -p 8080:8080 my-app:$APP_VERSION

As the topic of variables in GitLab CI/CD needs a detailed explanation, we would like to cover it in a separate article.

Triggers and Downstream pipeline

Let’s examine the pizza-making and delivery pipeline as an example to delve into triggers and downstream pipelines. In this scenario, dough preparation is a segment of the pizza-making pipeline. After the dough is prepared, the pipeline waits until a customer order is received, which acts as a trigger to proceed with the tasks in the making stage.

In the context of GitLab’s CI/CD pipeline, there are jobs referred to as trigger jobs. These jobs initiate a downstream pipeline, which can be either a child pipeline or a multi-project pipeline.

If a downstream pipeline is initiated within the same project by a parent pipeline, it is referred to as a child pipeline. Use the trigger keyword within a job in the .gitlab-ci.yml file to create a trigger job. Specify the path to a downstream pipeline file within the same project.

trigger_job:
  trigger:
    include:
      - local: path/to/downstream-child-pipeline.yml

When a parent pipeline initiates a downstream pipeline in another project, it is termed a multi-project downstream pipeline. The distinction in the trigger job setup lies in the inclusion of the project path instead of the file path.

trigger_job:
  trigger:
    project: group/downstream-project
    # branch: pipeline-branch # optional to specify the branch 

Could we see the status of the child pipeline in the parent pipeline? Yes, it is possible to mirror the status of the downstream pipeline in the trigger job by using strategy: depend.

trigger_job:
  trigger:
    project: group/downstream-project
    strategy: depend

Conclusion

In this second part of the GitLab CI/CD pipeline series, we delved into the key components within a .gitlab-ci.yml file. While a basic pipeline can be crafted as a stageless configuration, opting for a multi-stage pipeline becomes advantageous when dealing with more intricate CI/CD processes.

  • #Gitlab
  • #Cicd