=========================================
Configure and Run Workflow for AWS Batch
=========================================

You will need three values from the :doc:`set_up_aws` guide to configure the workflow:

1. The path to the S3 bucket that was set up. This will be referred to as ``BUCKET_PATH`` below.
2. The logs group that was set up. This will be referred to as ``LOGS_GROUP`` below.
3. The job queue name. This will be referred to as ``JOB_QUEUE`` below.

Follow the directions below to configure and run your workflow using AWS Batch.

Configure Workflow
===================

The documentation for each TEI-REX workflow (see: https://mriffle.github.io/teirex-workflows/)
includes a description of the ``pipeline.config`` file for that workflow. Regardless of which
workflow you are running, the ``pipeline.config`` file needs the same modifications in order to
run the workflow on AWS. We recommend making these modifications once and re-using the same
``pipeline.config`` for the respective workflow, adjusting it as necessary for the specific data
being searched.

First, add the following block to the end of your ``pipeline.config`` file. You will need to
substitute in the value for ``LOGS_GROUP`` (described above). If you configured AWS Batch in a
region other than *us-west-2*, update the value for ``region`` below to match your region.

.. code-block:: groovy

    aws {

        batch {
            // NOTE: this setting is only required if the AWS CLI tool is installed in a custom AMI
            cliPath = '/usr/local/aws-cli/v2/current/bin/aws'
            logsGroup = 'LOGS_GROUP'
            maxConnections = 20
            connectionTimeout = 10000
            uploadStorageClass = 'INTELLIGENT_TIERING'
            storageEncryption = 'AES256'
            retryMode = 'standard'
        }

        region = 'us-west-2'
    }

Next, update the ``profiles`` section of ``pipeline.config`` to include an ``aws`` section.
Before adding the ``aws`` section, the ``profiles`` section will look similar to:

.. code-block:: groovy

    profiles {

        // "standard" is the profile used when the steps of the workflow are run
        // locally on your computer. These parameters should be changed to match
        // your system resources (that you are willing to devote to running
        // workflow jobs).
        standard {
            params.max_memory = '8.GB'
            params.max_cpus = 4
            params.max_time = '240.h'

            params.mzml_cache_directory = '/data/mass_spec/nextflow/nf-teirex-dda/mzml_cache'
            params.panorama_cache_directory = '/data/mass_spec/nextflow/panorama/raw_cache'
        }
    }

After adding the ``aws`` section, it will look similar to:

.. code-block:: groovy

    profiles {

        // "standard" is the profile used when the steps of the workflow are run
        // locally on your computer. These parameters should be changed to match
        // your system resources (that you are willing to devote to running
        // workflow jobs).
        standard {
            params.max_memory = '8.GB'
            params.max_cpus = 4
            params.max_time = '240.h'

            params.mzml_cache_directory = '/data/mass_spec/nextflow/nf-teirex-dda/mzml_cache'
            params.panorama_cache_directory = '/data/mass_spec/nextflow/panorama/raw_cache'
        }

        // "aws" profile -- parameters used for running jobs on AWS Batch
        aws {
            process.executor = 'awsbatch'
            process.queue = 'JOB_QUEUE'

            // params for running pipeline on aws batch
            // These can be overridden in local config file

            // max params allowed for your AWS Batch compute environment
            params.max_memory = '124.GB'
            params.max_cpus = 32
            params.max_time = '240.h'

            // where to cache mzml files after running msconvert
            params.mzml_cache_directory = 's3://BUCKET_PATH/mzml_cache'
            params.panorama_cache_directory = 's3://BUCKET_PATH/panorama_cache'
        }
    }

Replace ``JOB_QUEUE`` and ``BUCKET_PATH`` with the values described above. A description of each parameter is below:

- ``process.executor`` - This tells Nextflow which executor to use when the *aws* profile is used. Do not change.
- ``process.queue`` - This is the job queue set up for AWS Batch (see: :doc:`set_up_aws` guide).
- ``params.max_memory`` - This is the maximum amount of memory a task may use in the workflow. This should not exceed the maximum memory available in the hardware profiles for the AWS Batch compute environment.
- ``params.max_cpus`` - This is the maximum number of cores a task may use in the workflow. This should not exceed the maximum cores available in the hardware profiles for the AWS Batch compute environment.
- ``params.max_time`` - This is the maximum amount of time (in hours) a task may run before being automatically killed and retried.
- ``params.mzml_cache_directory`` - This is the path to the cache of mzML files created by processing raw files.
- ``params.panorama_cache_directory`` - This is the path to the cache of raw files downloaded from PanoramaWeb. If you are not using PanoramaWeb, a value is still required but is never used.

Run Workflow
============

Once the AWS setup and configuration are complete, running the workflow on AWS requires only two
changes on the command line: specify ``aws`` as the profile and specify the S3 work directory.
For example:

- For the DDA workflow:

  .. code-block:: bash

      nextflow run -resume -r main -profile aws mriffle/nf-teirex-dda -bucket-dir s3://BUCKET_PATH/work -c pipeline.config

- For the DIA workflow:

  .. code-block:: bash

      nextflow run -resume -r main -profile aws mriffle/nf-teirex-dia -bucket-dir s3://BUCKET_PATH/work -c pipeline.config

.. note::

    Replace ``BUCKET_PATH`` with the S3 bucket you set up in the :doc:`set_up_aws` guide.
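
As a worked illustration of the substitutions described under Configure Workflow, the sketch
below shows how the ``aws`` profile might look once the placeholders are filled in. The bucket
name ``example-teirex-bucket`` and the queue name ``example-teirex-queue`` are hypothetical values
invented for this example, not values from the :doc:`set_up_aws` guide. The ``standard`` profile
is omitted for brevity.

.. code-block:: groovy

    profiles {

        aws {
            process.executor = 'awsbatch'

            // hypothetical job queue name (stands in for JOB_QUEUE)
            process.queue = 'example-teirex-queue'

            params.max_memory = '124.GB'
            params.max_cpus = 32
            params.max_time = '240.h'

            // hypothetical bucket name (stands in for BUCKET_PATH)
            params.mzml_cache_directory = 's3://example-teirex-bucket/mzml_cache'
            params.panorama_cache_directory = 's3://example-teirex-bucket/panorama_cache'
        }
    }

With these hypothetical values, the matching work directory on the command line would be
``-bucket-dir s3://example-teirex-bucket/work``.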