Using Argo Data#

As indicated in the introduction, the second part of the Argo Online School is hands-on: it is mainly based on Jupyter Notebooks and a few short videos that teach how to access and use Argo data.


The Argo Online School 301 - Using Argo Data. Introduction: Hands-on!

The hands-on component of the Argo Online School#

You can access the hands-on content here, as a web page built using JupyterBook, or download the Jupyter Notebooks and run them on your local machine.

The hands-on component of the Argo Online School was developed using JupyterLab, which provides a complete environment for interactive scientific computing that runs in your web browser. Jupyter is an open-source Python project; as a very useful first approach to Python and JupyterLab, you can use An Introduction to Earth and Environmental Data Science and Research Computing in Earth Sciences, developed by Ryan Abernathey and Kerry Key.

Run the Jupyter notebooks locally#

If you decide to download and run the Jupyter Notebooks on your local machine, you should create a Python environment. Some packages may cause problems due to compatibility issues between conda-forge packages and packages from the default conda channels, so we recommend setting channel_priority: strict and giving priority to the conda-forge channel over the default channels when installing packages. There are two ways of doing this: either always specify conda install -c conda-forge, or create a .condarc file in your home directory with this content:

channels:
  - conda-forge
  - defaults
channel_priority: strict

An environment should either use conda-forge or not, from creation to deletion: do not mix and match. If you created it without the conda-forge channel, do not add that channel halfway through. In practice, we always create environments with conda-forge, except in very specific cases where we have found incompatibilities.

To create and activate the AOS environment, which already includes the latest stable version of argopy, the Python library for Argo data, run:

conda env create -n AOS -f environment.yml 
conda activate AOS

where environment.yml is available in the GitHub repository of the Argo Online School.
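Once the environment is active, you can already fetch Argo data with argopy. Below is a minimal sketch, assuming argopy is installed and you have an internet connection; it uses one of the floats downloaded later in this page (the `FLOATS` list is just an illustration):

```python
# Floats used in the examples of this page
FLOATS = [6901254, 6901472]

if __name__ == "__main__":
    # Imported inside the guard so the sketch loads even without argopy installed
    from argopy import DataFetcher

    # Fetch all profiles from one float and convert them to an xarray Dataset
    # (requires network access to the Argo data servers)
    ds = DataFetcher().float(FLOATS[0]).to_xarray()
    print(ds)
```

This is only one of several access points argopy offers; the notebooks themselves mostly work from the files downloaded into the Data folder described below.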

If a library is not included in the environment.yml file, you can install it afterwards using, for instance:

conda install netcdf4
conda install xarray
conda install seawater
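Before installing anything extra, you can check from Python which packages are already importable. A small stdlib-only sketch (note that the conda install name netcdf4 corresponds to the import name netCDF4):

```python
import importlib.util

def missing(packages):
    """Return the subset of `packages` that cannot be imported."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Example: check the extra libraries mentioned above
print(missing(["netCDF4", "xarray", "seawater"]))
```

Anything printed by the last line still needs a conda install.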

For further information about managing Python environments, we recommend the corresponding section in An Introduction to Earth and Environmental Data Science.

Data used in the Argo Online School#

If you download the notebooks, you should create a ./Data folder to hold the data files used in the examples. The files in the Data folder can be downloaded directly from their sources using wget, for instance:
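If you prefer to stay inside Python, the folder can also be created there; a stdlib-only sketch:

```python
from pathlib import Path

# Create the ./Data folder next to the notebooks if it does not exist yet
data_dir = Path("Data")
data_dir.mkdir(exist_ok=True)
print(data_dir.resolve())
```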

for the Daily NOAA OI SST V2 High-Resolution Dataset for 2020:

wget --directory-prefix=Data ftp://ftp2.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/sst.day.mean.2020.nc

for the data from float 6901254:

wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo/dac/coriolis/6901254

for the data from float 6901472:

wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo/dac/coriolis/6901472

or for the data in all the oceans for November 2010:

wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo//geo/indian_ocean/2010/11
wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo//geo/atlantic_ocean/2010/11
wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo//geo/pacific_ocean/2010/11

or for the data in all the oceans for November 2020:

wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo//geo/indian_ocean/2020/11
wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo//geo/atlantic_ocean/2020/11
wget --no-host --cut-dirs=4 --directory-prefix=Data --recursive ftp://ftp.ifremer.fr/ifremer/argo//geo/pacific_ocean/2020/11
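The six commands above differ only in the ocean and the year, so the URLs can also be generated programmatically and passed to wget. A stdlib-only sketch that builds the URL strings (keeping the double slash exactly as it appears in the commands above):

```python
BASE = "ftp://ftp.ifremer.fr/ifremer/argo//geo"
OCEANS = ["indian_ocean", "atlantic_ocean", "pacific_ocean"]

def monthly_urls(year, month):
    """Build the GDAC URLs for one month of data in all three oceans."""
    return [f"{BASE}/{ocean}/{year}/{month:02d}" for ocean in OCEANS]

# The two months used in this page
for url in monthly_urls(2010, 11) + monthly_urls(2020, 11):
    print(url)
```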

Additionally, you can download a snapshot of the Argo Global Data Assembly Centre (Argo GDAC) here. Although it is a very large file (>50 GB), the advantage is that the dataset is always available with the same content, and each monthly snapshot has its own DOI (for instance DOI 10.17882/42182 for July 2021), which makes it very convenient for reproducible science.

However, we recommend downloading our version of the data, a subsample of the July 2021 snapshot, from here, so that you use exactly the same dataset and can reproduce the notebooks precisely.

Once you have downloaded the data, the ./Data folder should look like:

DataFolder

The 202107 folder indicates that the Argo data were obtained from the July 2021 snapshot. This is the structure and name that the snapshot would have once downloaded and uncompressed.