⚠️ This project is still in an early phase of development.

The Python API is not yet stable, and some aspects of the blueprint schema will likely evolve. You are welcome to try out the package, but we cannot yet guarantee backwards compatibility. We expect to reach a more stable version in 2026.

To see which systems C-Star has been tested on so far, see Supported Systems.

Tracking runs executed as jobs on HPC systems#

Contents#

  1. Introduction

  2. Importing an example Simulation and running it on HPC with a job scheduler

  3. Tracking the submitted job

  4. Cancelling a job

  5. Summary

1. Introduction#

On this page, we will look at how to use C-Star on supported HPC systems with job schedulers, including:

  • Submitting a job to a scheduler queue

  • Checking the ID of a job submitted to the queue

  • Checking the status of a job submitted to the queue

  • Receiving live updates from a job submitted to the queue

  • Cancelling a job submitted to the queue

2. Importing an example Simulation and running it on HPC with a job scheduler#

We will import and set up the same simulation as in our tutorial on importing and running Simulations.

[2]:
from cstar.roms import ROMSSimulation

example_simulation_1 = ROMSSimulation.from_blueprint(blueprint  = "https://raw.githubusercontent.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/main/cstar_blueprint_example_with_netcdf_inputs.yaml",
                                                     directory  = "../../examples/example_case/",
                                                     start_date = "2012-01-03 12:00:00",
                                                     end_date   = "2012-01-06 12:00:00")

2i. A quick look at the system’s scheduler#

Before running the case, let’s take a look at this system’s (i.e. NERSC Perlmutter’s) scheduler. We can do this via the global variable cstar_sysmgr, using its scheduler property:

[3]:
from cstar.system.manager import cstar_sysmgr
print(cstar_sysmgr.scheduler)
SlurmScheduler
--------------
primary_queue: regular
queues:
- regular
- shared
- debug
other_scheduler_directives: {'-C': 'cpu'}
global max cpus per node: 256
global max mem per node: 503.02734375GB
documentation: https://docs.nersc.gov/systems/perlmutter/architecture/

From here we can see some global properties of the current system’s scheduler, including its queues and a link to its official documentation.

We can query a queue to see its time limit before submitting a job to it:

[4]:
print(cstar_sysmgr.scheduler.get_queue("shared"))
SlurmQOS:
--------
name: shared
max_walltime: 48:00:00
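
Since the queue reports its time limit, it can be worth checking a requested walltime against it before submitting. The following is a minimal sketch, not part of C-Star: `parse_walltime` and `fits_in_queue` are hypothetical helpers, and the sketch assumes both values are available as "HH:MM:SS" strings like the ones printed above. Slurm also accepts other time formats (for example with a leading day count), which this sketch does not handle.

```python
from datetime import timedelta


def parse_walltime(walltime: str) -> timedelta:
    """Convert an 'HH:MM:SS' string (hours may exceed 24) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fits_in_queue(requested: str, queue_max: str) -> bool:
    """Return True if the requested walltime does not exceed the queue limit."""
    return parse_walltime(requested) <= parse_walltime(queue_max)


# A 10-minute request against the 48-hour limit of the "shared" queue
print(fits_in_queue("00:10:00", "48:00:00"))
# A 72-hour request against the same limit
print(fits_in_queue("72:00:00", "48:00:00"))
```

Comparing `timedelta` objects rather than raw strings avoids mistakes when the hour field has a different number of digits.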

2ii. Submitting a job to the scheduler queue#

We can now set up and run the job as in the corresponding tutorial, assigning the SlurmJob instance returned by ROMSSimulation.run() to a variable we can keep track of.

[6]:
example_simulation_1.setup()
example_simulation_1.build()
example_simulation_1.pre_run()

hpc_job = example_simulation_1.run(account_key="m4746", walltime="00:10:00", queue_name="shared")
[INFO] 🛠️  Configuring ROMSSimulation
[INFO] 🔧 Setting up ROMSExternalCodeBase...
[INFO] ✅ ROMSExternalCodeBase correctly configured. Nothing to be done
[INFO] 🔧 Setting up MARBLExternalCodeBase...
[INFO] ✅ MARBLExternalCodeBase correctly configured. Nothing to be done
[INFO] 📦 Fetching compile-time code...
[INFO] • Copying bgc.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying bulk_frc.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying cppdefs.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying diagnostics.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying ocean_vars.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying param.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying tracers.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying Makefile to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] • Copying Make.depend to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
[INFO] ✅ All files copied successfully
[INFO] 📦 Fetching runtime code...
[INFO] • Copying roms.in_TEMPLATE to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code
[INFO] Copying template file /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code/roms.in_TEMPLATE to editable version /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code/roms.in
[INFO] • Copying marbl_in to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code
[INFO] • Copying marbl_tracer_output_list to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code
[INFO] • Copying marbl_diagnostic_output_list to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code
[INFO] ✅ All files copied successfully
[INFO] 📦 Fetching input datasets...
[INFO] ⏭️ /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_grd.nc already exists, skipping.
[INFO] ⏭️ /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_ini.nc already exists, skipping.
[INFO] ⏭️ /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_tides.nc already exists, skipping.
[INFO] ⏭️ /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_bry.nc already exists, skipping.
[INFO] ⏭️ /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_bry_bgc.nc already exists, skipping.
[INFO] ⏭️ /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_frc.nc already exists, skipping.
[INFO] ⏭️ /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_frc_bgc.nc already exists, skipping.
[INFO] Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_grd.nc into (3,3)
[INFO] Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_ini.nc into (3,3)
/global/homes/d/dafydd/.conda/envs/cstar_env/lib/python3.13/site-packages/roms_tools/tiling/partition.py:322: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  ds = xr.open_dataset(filepath.with_suffix(".nc"))
[INFO] Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_tides.nc into (3,3)
[INFO] Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_bry.nc into (3,3)
/global/homes/d/dafydd/.conda/envs/cstar_env/lib/python3.13/site-packages/roms_tools/tiling/partition.py:322: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  ds = xr.open_dataset(filepath.with_suffix(".nc"))
[INFO] Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_bry_bgc.nc into (3,3)
/global/homes/d/dafydd/.conda/envs/cstar_env/lib/python3.13/site-packages/roms_tools/tiling/partition.py:322: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  ds = xr.open_dataset(filepath.with_suffix(".nc"))
[INFO] Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_frc.nc into (3,3)
/global/homes/d/dafydd/.conda/envs/cstar_env/lib/python3.13/site-packages/roms_tools/tiling/partition.py:322: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  ds = xr.open_dataset(filepath.with_suffix(".nc"))
[INFO] Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_frc_bgc.nc into (3,3)
/global/homes/d/dafydd/.conda/envs/cstar_env/lib/python3.13/site-packages/roms_tools/tiling/partition.py:322: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  ds = xr.open_dataset(filepath.with_suffix(".nc"))

3. Tracking the submitted job#

3i. Viewing the submitted script#

We can see the script that was submitted to the scheduler using the script property:

[7]:
print(hpc_job.script)
#!/bin/bash
#SBATCH --job-name=cstar_job_20250428_151855
#SBATCH --output=/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/output/cstar_job_20250428_151855.out
#SBATCH --qos=shared
#SBATCH --ntasks=9
#SBATCH --account=m4746
#SBATCH --export=ALL
#SBATCH --mail-type=ALL
#SBATCH --time=00:10:00
#SBATCH -C cpu

srun -n 9 /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code/roms /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code/roms.in
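
To double-check the directives programmatically, for instance to confirm the queue and task count before a long run, the script text can be parsed. This is a rough sketch under the assumption that the script is available as a plain string; `parse_sbatch_directives` is a hypothetical helper and not part of C-Star. The example string is an abbreviated copy of the script printed above.

```python
def parse_sbatch_directives(script: str) -> dict[str, str]:
    """Collect '#SBATCH' directives from a job script into a dict."""
    directives = {}
    for line in script.splitlines():
        line = line.strip()
        if not line.startswith("#SBATCH"):
            continue
        body = line[len("#SBATCH"):].strip()
        if not body:
            continue
        first_token = body.split(maxsplit=1)[0]
        if first_token.startswith("--") and "=" in first_token:
            # Long form: --key=value
            key, _, value = body.partition("=")
        else:
            # Short form: -K value
            key, _, value = body.partition(" ")
        directives[key.strip()] = value.strip()
    return directives


example_script = """#!/bin/bash
#SBATCH --job-name=cstar_job_20250428_151855
#SBATCH --qos=shared
#SBATCH --ntasks=9
#SBATCH --account=m4746
#SBATCH --time=00:10:00
#SBATCH -C cpu
"""

directives = parse_sbatch_directives(example_script)
print(directives["--qos"], directives["--ntasks"], directives["--time"])
```

With a real job, the same function could be applied to the text of `hpc_job.script`, assuming that property returns a string.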

We can see where the script is saved using the script_path property:

[8]:
hpc_job.script_path
[8]:
PosixPath('/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/docs/howto_guides/cstar_job_20250428_151855.sh')

The file to which the job’s output will be written is given by the output_file property, which is covered in section 3iv.

3ii. Checking the job ID#

We can check the scheduler-assigned job ID using the id property:

[9]:
hpc_job.id
[9]:
38171360

3iii. Checking the status#

We can check the job status using the status property. Possible values are:

  • UNSUBMITTED: the job is not yet submitted to the scheduler

  • PENDING: the job is in the queue

  • RUNNING: the job is underway

  • COMPLETED: the job is finished

  • CANCELLED: the job was cancelled by the user

  • FAILED: the job finished unsuccessfully

  • HELD: the job is being held in the queue

  • ENDING: the job is in the process of finishing

  • UNKNOWN: the job status cannot be determined

[10]:
hpc_job.status
[10]:
<ExecutionStatus.RUNNING: 3>
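
When a later step depends on the job having finished, one option is to poll the status until it reaches a terminal value. The sketch below is illustrative only: `wait_until_finished`, `FakeStatus`, and `FakeJob` are hypothetical names, not part of C-Star, and the code assumes that `status` is an enum member whose `name` matches the values listed above. The stand-in job lets the example run without a scheduler.

```python
import time
from enum import Enum, auto

# Status names from the list above that are treated here as final
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED"}


def wait_until_finished(job, poll_interval=30.0, timeout=None):
    """Poll job.status until it is terminal, then return the status name.

    Raises TimeoutError if `timeout` seconds elapse first.
    """
    start = time.monotonic()
    while True:
        name = job.status.name  # assumes status is an Enum member
        if name in TERMINAL_STATES:
            return name
        if timeout is not None and time.monotonic() - start >= timeout:
            raise TimeoutError(f"job still {name} after {timeout} s")
        time.sleep(poll_interval)


class FakeStatus(Enum):
    """Hypothetical stand-in for the real status enum."""
    PENDING = auto()
    RUNNING = auto()
    COMPLETED = auto()


class FakeJob:
    """Hypothetical stand-in job that steps through a fixed list of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    @property
    def status(self):
        return next(self._statuses)


fake = FakeJob([FakeStatus.PENDING, FakeStatus.RUNNING, FakeStatus.COMPLETED])
print(wait_until_finished(fake, poll_interval=0.01))
```

With a real job, a generous `poll_interval` is advisable, on the assumption that each status check queries the scheduler.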

3iv. Viewing the output file path#

The output file contains the standard output and error streams returned by the job. We can find its path using the output_file property:

[11]:
hpc_job.output_file
[11]:
PosixPath('/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/output/cstar_job_20250428_151855.out')

3v. Receiving live updates from the output file#

While the job is running, we can stream any new lines written to the output file using the updates() method. This method takes a seconds parameter and provides live updates for that many seconds (default 10). If seconds=0, updates continue indefinitely until stopped with a keyboard interrupt (typically Ctrl-C).

[12]:
hpc_job.updates(seconds=0.5)
[INFO]  doing BGC with MARBL

[INFO]      14 4383.5097 5.17864838377-03 4.7561736558-03  0.006002978339  0.004696106616     19     28   12

[INFO]  doing BGC with MARBL

[INFO]      15 4383.5104 5.18640168207-03 4.7570765305-03  0.005809619472  0.004707297773     19     28   11

[INFO]  doing BGC with MARBL

[INFO]      16 4383.5111 5.19427266742-03 4.7583422469-03  0.005605918205  0.004625356155     19     28   11

[INFO]  doing BGC with MARBL

[INFO]      17 4383.5118 5.20171947152-03 4.7593890773-03  0.005394811880  0.004480351366     19     28   12

[INFO]  doing BGC with MARBL
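
For readers curious about the general technique, following a growing file for a fixed period can be sketched in a few lines of standard-library Python. This is not C-Star's implementation of updates(); `follow` is a hypothetical function that illustrates the idea of starting at the end of the file and yielding complete lines as they are appended.

```python
import os
import time


def follow(path, seconds=10.0, poll_interval=0.1):
    """Yield lines appended to `path` for roughly `seconds` seconds."""
    deadline = time.monotonic() + seconds
    with open(path, "r") as f:
        f.seek(0, os.SEEK_END)  # skip content that is already in the file
        buffer = ""
        while time.monotonic() < deadline:
            chunk = f.readline()
            if not chunk:
                # Nothing new yet: wait briefly before checking again
                time.sleep(poll_interval)
                continue
            buffer += chunk
            if buffer.endswith("\n"):
                # Only emit a line once it has been written completely
                yield buffer.rstrip("\n")
                buffer = ""
```

A loop such as `for line in follow(hpc_job.output_file, seconds=30): print(line)` would then print new output for about 30 seconds. Buffering until a newline arrives avoids printing half-written lines.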

4. Cancelling a job#

We can cancel the job using the cancel() method, then confirm the result using the status property:

[13]:
hpc_job.cancel()
[14]:
hpc_job.status
[14]:
<ExecutionStatus.CANCELLED: 5>
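
In interactive work it is easy to interrupt a notebook and leave a job in the queue. One possible safeguard is a small context manager that cancels the job on exit if it is still active. This is a sketch under stated assumptions: `cancel_on_exit` is a hypothetical helper, not part of C-Star, and it assumes the job exposes the `status` property (an enum member with a `name`) and the `cancel()` method shown above.

```python
from contextlib import contextmanager

# Status names from section 3iii that indicate the job still occupies the queue
ACTIVE_STATES = {"PENDING", "RUNNING", "HELD"}


@contextmanager
def cancel_on_exit(job):
    """Cancel `job` when the block exits if it is still queued or running."""
    try:
        yield job
    finally:
        # Runs on normal exit, on errors, and on keyboard interrupts alike
        if job.status.name in ACTIVE_STATES:
            job.cancel()
```

Usage would look like `with cancel_on_exit(hpc_job): hpc_job.updates(seconds=0)`, so that interrupting the live updates also removes the job from the queue. Jobs that have already finished are left untouched.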

5. Summary#

In this guide, we set up and ran the example Simulation that we built in another tutorial, with a particular focus on the SchedulerJob instance associated with the run. We looked at tracking the run’s status and output files, and cancelling the run.