⚠️ This project is still in an early phase of development.

The Python API is not yet stable, and some aspects of the blueprint schema will likely evolve. You are welcome to try the package, but we cannot yet guarantee backwards compatibility. We expect to reach a more stable version in 2026.

To see which systems C-Star has been tested on so far, see Supported Systems.

Working with the InputDataset class

Contents

  1. Introduction

  2. InputDataset subclasses and their instantiation

  3. Working with different sources

  4. Partitioning input datasets for use with ROMS

  5. Additional cases

  6. Summary

1. Introduction

In C-Star, the InputDataset class holds information on, and offers methods for working with, files containing numerical data required by a simulation (such as initial conditions). Compare this with the AdditionalCode class, which handles text-based files needed by a simulation (such as lists of custom settings).

The InputDataset class is an abstract class and cannot be instantiated directly. Instead, the relevant subclass should be used.
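
To illustrate what "abstract" means in practice, here is a minimal sketch using Python's standard abc module. The class names SketchInputDataset and SketchModelGrid are hypothetical stand-ins, not C-Star's own classes, and the get() body is invented for illustration only.

```python
from abc import ABC, abstractmethod


class SketchInputDataset(ABC):
    """Hypothetical stand-in for an abstract base class (not C-Star's code)."""

    def __init__(self, location: str):
        self.location = location

    @abstractmethod
    def get(self, local_dir: str) -> str:
        """Subclasses must define how a working copy is obtained."""


class SketchModelGrid(SketchInputDataset):
    """Hypothetical concrete subclass that implements the abstract method."""

    def get(self, local_dir: str) -> str:
        # Illustrative only: return the path a working copy would have.
        return f"{local_dir}/{self.location.rsplit('/', 1)[-1]}"


def try_instantiate_base() -> str:
    """Return the name of the error raised when instantiating the base class."""
    try:
        SketchInputDataset(location="grid.nc")
    except TypeError as err:
        return type(err).__name__
    return "no error"


grid = SketchModelGrid(location="data/grid.nc")
print(try_instantiate_base())  # TypeError
print(grid.get("work"))        # work/grid.nc
```

Python refuses to build the base class because get() has no implementation there, while the subclass can be created normally.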

2. InputDataset subclasses and their instantiation

C-Star currently supports the ROMS ocean model, for which there are six InputDataset subclasses:

InputDataset
 └── ROMSInputDataset
     ├── ROMSModelGrid
     ├── ROMSInitialConditions
     ├── ROMSTidalForcing
     ├── ROMSRiverForcing
     ├── ROMSBoundaryForcing
     └── ROMSSurfaceForcing

As mentioned above, the InputDataset and ROMSInputDataset are abstract base classes, so one of these six subclasses must be instantiated.

The parameters required to create an InputDataset instance vary depending on the source. Let’s consider each in turn:

3. Working with different sources

3i. Working with local, prepared (netCDF) sources

In the simplest case, the input dataset already exists, in a ROMS-compatible (netCDF) format, on the local filesystem. In this case, we only need to provide the location parameter, with a path to the file:

[1]:
from cstar.roms import ROMSModelGrid
my_grid = ROMSModelGrid(location="~/Code/my_ucla_roms/Examples/input_data/sample_grd_riv.nc")
print(my_grid)
-------------
ROMSModelGrid
-------------
Source location: ~/Code/my_ucla_roms/Examples/input_data/sample_grd_riv.nc
Working path: None ( does not yet exist. Call InputDataset.get() )

Creating a working version with InputDataset.get():

Note

Most users will not need to use the get() method: if your InputDataset is part of a ROMSSimulation instance, then C-Star will call get() automatically as part of any ROMSSimulation.setup() call.

In the above example, we see that Working path is None and that we should call InputDataset.get() to change this. C-Star should not alter the contents of a local netCDF source file, so in this case calling get() creates a symbolic link in the working directory that points to the source file:

[2]:
my_grid.get(local_dir = "~/Code/my_c_star/examples/input_dataset_example")
[3]:
print(my_grid)
-------------
ROMSModelGrid
-------------
Source location: ~/Code/my_ucla_roms/Examples/input_data/sample_grd_riv.nc
Working path: /Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/sample_grd_riv.nc (exists)
Local hash: {PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/sample_grd_riv.nc'): '8e2f1ca3135ac7f5696d3eaec79b035a1bae15c8a34e751a7f9d925787ab3f6e'}

After calling get() we see that there is now additional information associated with this InputDataset - the source location, as before, but also the Working path (in this case a symbolic link to the source location) and a Local hash: a checksum of the file in question to protect against changes or tampering with the file.
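
The idea of a symbolic-link working path with a recorded checksum can be sketched with the standard library alone. This is an illustration under stated assumptions, not C-Star's implementation: the helper names link_into_working_dir and sha256_hex are hypothetical, and the file contents are placeholder bytes rather than real netCDF data. Creating symbolic links may require extra privileges on some platforms.

```python
import hashlib
import tempfile
from pathlib import Path


def link_into_working_dir(source: Path, working_dir: Path) -> Path:
    """Create working_dir/<source name> as a symbolic link to source."""
    working_dir.mkdir(parents=True, exist_ok=True)
    link = working_dir / source.name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace any stale entry with the same name
    link.symlink_to(source.resolve())
    return link


def sha256_hex(path: Path) -> str:
    """SHA-256 hex digest of a (small) file, read in one go."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source" / "sample_grd.nc"
    source.parent.mkdir()
    source.write_bytes(b"placeholder bytes, not real netCDF")

    link = link_into_working_dir(source, Path(tmp) / "work")
    recorded = {link: sha256_hex(link)}  # same shape as the "Local hash" output

    print(link.is_symlink())                   # True
    print(link.resolve() == source.resolve())  # True

    # If the source changes later, the recorded hash no longer matches.
    source.write_bytes(b"changed")
    print(sha256_hex(link) == recorded[link])  # False
```

Because the link only points at the source, no data is duplicated, and the recorded hash makes a later change to the file detectable.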

3ii. Working with remote, prepared (netCDF) sources

In this case, as above, the input dataset already exists, in a ROMS-compatible (netCDF) format, but this time is stored at a remote location. Now, the location parameter will be a URL, and we also need to provide a value for the file_hash parameter.

Note

The file_hash parameter is a string digest computed from the entire file. It is used as a security check for remote binary files (such as netCDF) to verify that any file C-Star downloads corresponds exactly to the expected data. C-Star uses SHA-256 (256-bit) checksums for hashes.

If you do not know the file hash, it is advisable that you ask the creator of the file to check their local copy.
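
For anyone who does hold a trusted local copy, a SHA-256 digest can be computed with Python's hashlib module. The sketch below is not part of C-Star; sha256_of_file and matches_expected are hypothetical helper names, and the file is read in chunks so that large netCDF files need not fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hash: str) -> bool:
    """Compare a file's digest with an expected hex string."""
    return sha256_of_file(path) == expected_hash.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"abc")  # placeholder content, not real netCDF
    file_hash = sha256_of_file(demo)
    print(len(file_hash))                     # 64 hex characters = 256 bits
    print(matches_expected(demo, file_hash))  # True
```

The resulting 64-character string has the same form as the file_hash values shown in this guide.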

[4]:
from cstar.roms import ROMSModelGrid
my_grid = ROMSModelGrid(location="https://github.com/dafyddstephenson/ucla_roms_examples_input_data/raw/main/sample_grd_riv.nc",
                       file_hash="8e2f1ca3135ac7f5696d3eaec79b035a1bae15c8a34e751a7f9d925787ab3f6e")
print(my_grid)
-------------
ROMSModelGrid
-------------
Source location: https://github.com/dafyddstephenson/ucla_roms_examples_input_data/raw/main/sample_grd_riv.nc
Source file hash: 8e2f1ca3135ac7f5696d3eaec79b035a1bae15c8a34e751a7f9d925787ab3f6e
Working path: None ( does not yet exist. Call InputDataset.get() )

Creating a local copy with InputDataset.get():

Note

Most users will not need to use the get() method: if your InputDataset is part of a ROMSSimulation instance, then C-Star will call get() automatically as part of any ROMSSimulation.setup() call.

As before, we see that Working path is None and that we should call InputDataset.get() to change this. In the case of a remote netCDF file, calling get() downloads a copy of the source file to the working directory:

[5]:
my_grid.get(local_dir = "~/Code/my_c_star/examples/input_dataset_example")
[6]:
print(my_grid)
-------------
ROMSModelGrid
-------------
Source location: https://github.com/dafyddstephenson/ucla_roms_examples_input_data/raw/main/sample_grd_riv.nc
Source file hash: 8e2f1ca3135ac7f5696d3eaec79b035a1bae15c8a34e751a7f9d925787ab3f6e
Working path: /Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/sample_grd_riv.nc (exists)
Local hash: {PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/sample_grd_riv.nc'): '8e2f1ca3135ac7f5696d3eaec79b035a1bae15c8a34e751a7f9d925787ab3f6e'}

3iii. Working with unprepared (yaml) sources

C-Star also supports creating input datasets from plaintext instructions in .yaml format, by interfacing with the roms-tools Python package. netCDF files are typically very large (often terabytes in total for a meaningful simulation), whereas yaml files are only a few kB, making them easier to work with when preparing or obtaining a remotely hosted simulation. However, working from yaml files means generating the corresponding netCDF files locally, a process that can have a large memory footprint and that requires an available copy of any datasets that roms-tools needs (see https://roms-tools.readthedocs.io/en/latest/datasets.html). For more information on creating datasets to export in yaml format, see the roms-tools documentation (https://roms-tools.readthedocs.io/en/latest/).

As we are working with plain text (rather than binary files as in the examples above) we don’t need to verify remote downloads, and so the process for using local or remote files is the same: we simply provide the location parameter, either a URL or local path.

As we are creating the dataset from scratch, depending on the type of dataset, we also need some additional information. In particular, the start_date and end_date parameters allow C-Star to tell roms-tools the dates between which the dataset is required (if any - the grid is time-invariant, for instance).
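
As a small worked example of the date window, the sketch below parses two date strings in the format used in this guide and checks that they describe a valid interval. It uses only the standard datetime module; parse_window is a hypothetical helper, and how C-Star itself validates these parameters is not shown here.

```python
from datetime import datetime

# Format of the date strings used in this guide.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_window(start_date: str, end_date: str) -> tuple[datetime, datetime]:
    """Parse both dates and check that the window has positive length."""
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    if end <= start:
        raise ValueError("end_date must be later than start_date")
    return start, end


start, end = parse_window("2012-01-01 12:00:00", "2012-01-04 12:00:00")
print(end - start)  # 3 days, 0:00:00
```

Checking the window before generating a dataset can catch swapped or mistyped dates early, before an expensive generation step begins.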

[7]:
from cstar.roms import ROMSSurfaceForcing
my_surface_forcing = ROMSSurfaceForcing(
    location="~/Code/my_c_star/blueprints/cstar_blueprint_roms_marbl_example/input_datasets_yaml/roms_frc.yaml",
    start_date="2012-01-01 12:00:00",
    end_date = "2012-01-04 12:00:00"
)

print(my_surface_forcing)
------------------
ROMSSurfaceForcing
------------------
Source location: ~/Code/my_c_star/blueprints/cstar_blueprint_roms_marbl_example/input_datasets_yaml/roms_frc.yaml
start_date: 2012-01-01 12:00:00
end_date: 2012-01-04 12:00:00
Working path: None ( does not yet exist. Call InputDataset.get() )

Creating a prepared copy with InputDataset.get():

Note

Most users will not need to use the get() method: if your InputDataset is part of a ROMSSimulation instance, then C-Star will call get() automatically as part of any ROMSSimulation.setup() call.

[8]:
my_surface_forcing.get(local_dir="~/Code/my_c_star/examples/input_dataset_example/")
/Users/dafyddstephenson/miniconda3/envs/cstar_env/lib/python3.13/site-packages/roms_tools/utils.py:146: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  ds = xr.open_mfdataset(
[INFO] 💾 Saving roms-tools dataset created from ~/Code/my_c_star/blueprints/cstar_blueprint_roms_marbl_example/input_datasets_yaml/roms_frc.yaml...
[########################################] | 100% Completed | 2.70 s

4. Partitioning input datasets for use with ROMS

Note

Most users will not need to use the partition() method: if your InputDataset is part of a ROMSSimulation instance, then C-Star will call partition() automatically as part of any ROMSSimulation.pre_run() call.

ROMS requires that input datasets are “partitioned” - i.e., split into several smaller files such that each processor in a parallel run works with a subset of the entire domain. To perform this action, call InputDataset.partition().

The np_xi and np_eta parameters of this method correspond to the number of processors in the xi and eta directions (roughly corresponding to East-West and North-South, depending on grid rotation):

[9]:
my_surface_forcing.partition(np_xi=3,np_eta=3)
[INFO] Partitioning /Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_frc_201201.nc into (3,3)

We can see that the method executed successfully, as C-Star is now additionally tracking the partitioned files:

[10]:
print(my_surface_forcing)
------------------
ROMSSurfaceForcing
------------------
Source location: ~/Code/my_c_star/blueprints/cstar_blueprint_roms_marbl_example/input_datasets_yaml/roms_frc.yaml
start_date: 2012-01-01 12:00:00
end_date: 2012-01-04 12:00:00
Working path: /Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_frc_201201.nc (exists)
Local hash: {PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_frc_201201.nc'): 'a911c5ad87f0fa4d3ae9fb42798e955fbdca3d808471982f4bef65241f8b892d'}
Partitioning: ROMSPartitioning(np_xi=3, np_eta=3, files=[PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_frc_201201.0.nc'),
                                           PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_frc_201201.1.nc'),
                                              ...
                                           PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_frc_201201.8.nc')] <9 items>)
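
To make the arithmetic behind the output above concrete, the following sketch splits a grid into np_xi × np_eta tiles and generates file names following the `<name>.<rank>.nc` pattern seen in the listing. It is an illustration only: split_evenly and partitioned_names are hypothetical helpers, the 10 × 7 grid size is invented, and the real partitioning performed for ROMS (including how ranks map to tiles and how tile edges are handled) may differ.

```python
from pathlib import Path


def split_evenly(n_points: int, n_parts: int) -> list[range]:
    """Split range(n_points) into n_parts contiguous, near-equal ranges."""
    base, extra = divmod(n_points, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges


def partitioned_names(path: Path, np_xi: int, np_eta: int) -> list[Path]:
    """One file name per tile, numbered 0 .. np_xi * np_eta - 1."""
    return [
        path.with_name(f"{path.stem}.{rank}.nc")
        for rank in range(np_xi * np_eta)
    ]


np_xi, np_eta = 3, 3
xi_ranges = split_evenly(10, np_xi)   # assumed 10 points along xi
eta_ranges = split_evenly(7, np_eta)  # assumed 7 points along eta
names = partitioned_names(Path("roms_frc_201201.nc"), np_xi, np_eta)

print([len(r) for r in xi_ranges])   # [4, 3, 3]
print([len(r) for r in eta_ranges])  # [3, 2, 2]
print(len(names))                    # 9
print(names[0].name, names[-1].name) # roms_frc_201201.0.nc roms_frc_201201.8.nc
```

With np_xi=3 and np_eta=3 there are nine tiles, which matches the nine partitioned files reported above.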

5. Additional cases

5i. Working with pre-partitioned (netCDF) sources

In some cases, a C-Star user may be given a selection of already partitioned files for a single InputDataset. These look like:

my_roms_grid.0.nc
my_roms_grid.1.nc
my_roms_grid.2.nc
...

In this case, the user should instantiate the relevant ROMSInputDataset subclass using the filename of the first partitioned file, and use the source_np_xi and source_np_eta parameters to provide information on the partitioning of the source:

[11]:
from cstar.roms import ROMSModelGrid
my_partitioned_grid = ROMSModelGrid(
    location="~/Code/my_c_star/blueprints/cstar_blueprint_roms_marbl_example/input_datasets_netcdf/partitioned/roms_grd.0.nc",
    source_np_xi=3,
    source_np_eta=3,
)
my_partitioned_grid
[11]:
ROMSModelGrid(
location = '~/Code/my_c_star/blueprints/cstar_blueprint_roms_marbl_example/input_datasets_netcdf/partitioned/roms_grd.0.nc',
file_hash = None,
)
[12]:
my_partitioned_grid.get("~/Code/my_c_star/examples/input_dataset_example")
my_partitioned_grid.partitioning
[12]:
ROMSPartitioning(np_xi=3, np_eta=3, files=[PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_grd.0.nc'),
                                           PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_grd.1.nc'),
                                              ...
                                           PosixPath('/Users/dafyddstephenson/Code/my_c_star/examples/input_dataset_example/roms_grd.8.nc')] <9 items>)
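
Before handing a pre-partitioned source to C-Star, it can be useful to confirm that the full set of files is present. The sketch below is a standalone check, not C-Star functionality: find_partition_files is a hypothetical helper that assumes the `<name>.<rank>.nc` naming shown above, with unpadded ranks starting at 0. Other naming schemes (for example, zero-padded ranks) are not covered.

```python
import re
import tempfile
from pathlib import Path


def find_partition_files(
    first_file: Path, source_np_xi: int, source_np_eta: int
) -> list[Path]:
    """List the expected partition files and fail if any are missing."""
    match = re.fullmatch(r"(?P<stem>.+)\.0\.nc", first_file.name)
    if match is None:
        raise ValueError(f"{first_file.name} does not look like '<name>.0.nc'")
    expected = source_np_xi * source_np_eta
    files = [
        first_file.with_name(f"{match['stem']}.{rank}.nc")
        for rank in range(expected)
    ]
    missing = [f.name for f in files if not f.exists()]
    if missing:
        raise FileNotFoundError(f"missing partition files: {missing}")
    return files


with tempfile.TemporaryDirectory() as tmp:
    # Create nine empty placeholder files standing in for a 3 x 3 partitioning.
    for rank in range(9):
        (Path(tmp) / f"roms_grd.{rank}.nc").touch()
    found = find_partition_files(Path(tmp) / "roms_grd.0.nc", 3, 3)
    print(len(found))      # 9
    print(found[-1].name)  # roms_grd.8.nc
```

A mismatch between source_np_xi × source_np_eta and the number of files on disk is then reported up front rather than surfacing later in a run.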

6. Summary

In this guide, we have considered:

  • The different subclasses of ROMSInputDataset

  • How to instantiate these different subclasses when input datasets have different sources

And optionally, for users working outside of the context of a ROMSSimulation:

  • How to create a working copy of (or working path to) a prepared, locally available dataset

  • How to partition the dataset such that it is ROMS-ready

Lastly, we covered the situation where a user inherits a set of pre-partitioned netCDF files to work with.