Reading Argo data by date

Let’s use, as an example, data in the Atlantic Ocean for 11 November 2020. It is pre-downloaded in the ./Data folder, but you can also download it from the Coriolis GDAC. See here for instructions on how to download the data.
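
If you prefer to fetch the file yourself, a minimal download sketch could look like the one below; it assumes the Coriolis GDAC HTTPS mirror at https://data-argo.ifremer.fr and its geo/&lt;ocean&gt;/&lt;year&gt;/&lt;month&gt;/ layout, so check the GDAC documentation for the current address and structure.

# Minimal download sketch (assumes the https://data-argo.ifremer.fr mirror
# and the geo/<ocean>/<year>/<month>/ directory layout; verify before use)
import urllib.request

url = ('https://data-argo.ifremer.fr/geo/atlantic_ocean/'
       '2020/11/20201111_prof.nc')
urllib.request.urlretrieve(url, '20201111_prof.nc')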

First, import the libraries

import numpy as np
import netCDF4
import xarray as xr

import cartopy.crs as ccrs
import cartopy

import matplotlib as mpl
import matplotlib.cm as cm
from matplotlib import pyplot as plt
%matplotlib inline

and define a colorbar that will be useful later.

# Useful colormaps and colorbar makers:
qcmap = mpl.colors.ListedColormap(['#000000' , '#31FC03' , '#ADFC03' , '#FCBA03' ,'#FC1C03',
                                   '#324CA8' , '#000000' , '#000000' , '#B22CC9', '#000000'])
def colorbar_qc(cmap, **kwargs):
    """Adjust colorbar ticks with discrete colors for QC flags"""
    ncolors = 10
    mappable = cm.ScalarMappable(cmap=cmap)
    mappable.set_array([])
    mappable.set_clim(-0.5, ncolors+0.5)
    colorbar = plt.colorbar(mappable, **kwargs)
    colorbar.set_ticks(np.linspace(0, ncolors, ncolors))
    colorbar.set_ticklabels(range(ncolors))
    return colorbar

Now let’s load the data, that is, all the profiles in the Atlantic Ocean for one particular day, 11 November 2020:

dayADS = xr.open_dataset('../../Data/202107-ArgoData/geo/atlantic_ocean/2020/11/20201111_prof.nc')

As we have seen, xarray is very handy and reads both the data and the metadata of the netCDF file:

dayADS
<xarray.Dataset> Size: 21MB
Dimensions:                       (N_PROF: 186, N_PARAM: 3, N_LEVELS: 1331,
                                   N_CALIB: 3, N_HISTORY: 0)
Dimensions without coordinates: N_PROF, N_PARAM, N_LEVELS, N_CALIB, N_HISTORY
Data variables: (12/64)
    DATA_TYPE                     object 8B ...
    FORMAT_VERSION                object 8B ...
    HANDBOOK_VERSION              object 8B ...
    REFERENCE_DATE_TIME           object 8B ...
    DATE_CREATION                 object 8B ...
    DATE_UPDATE                   object 8B ...
    ...                            ...
    HISTORY_ACTION                (N_HISTORY, N_PROF) object 0B ...
    HISTORY_PARAMETER             (N_HISTORY, N_PROF) object 0B ...
    HISTORY_START_PRES            (N_HISTORY, N_PROF) float32 0B ...
    HISTORY_STOP_PRES             (N_HISTORY, N_PROF) float32 0B ...
    HISTORY_PREVIOUS_VALUE        (N_HISTORY, N_PROF) float32 0B ...
    HISTORY_QCTEST                (N_HISTORY, N_PROF) object 0B ...
Attributes:
    title:                Argo float vertical profile
    institution:          FR GDAC
    source:               Argo float
    history:              2021-07-10T08:30:50Z creation
    references:           http://www.argodatamgt.org/Documentation
    user_manual_version:  3.1
    Conventions:          Argo-3.1 CF-1.6
    featureType:          trajectoryProfile
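
For instance, both the global attributes and the per-variable metadata are directly accessible (the values below come from the listing above):

print(dayADS.attrs['institution'])   # global attribute -> FR GDAC
print(dayADS.TEMP.long_name)         # long_name attribute of the TEMP variable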

Let’s use the information in the xarray Dataset to obtain the number of profiles for this day:

print(f" for this day there were {dayADS.sizes['N_PROF']} profiles")
 for this day there were 186 profiles

Each of these profiles comes from the Argo core mission (the primary sampling), so we have all the meta-information needed to track the float that made the observations. Let’s see it for a few of the profiles:

for i1 in range(1,dayADS.sizes['N_PROF'],10):
    print(f"WMO {dayADS.data_vars['PLATFORM_NUMBER'].values.astype(str)[i1]}"
          f" Data Center {dayADS.data_vars['DATA_CENTRE'].values.astype(str)[i1]}"   
          f" Project name {dayADS.data_vars['PROJECT_NAME'].values.astype(str)[i1]}" )
WMO 4903277  Data Center AO Project name US ARGO PROJECT                                                 
WMO 7900506  Data Center IF Project name ARGO-BSH                                                        
WMO 6901986  Data Center IF Project name Argo Netherlands                                                
WMO 4902117  Data Center AO Project name US ARGO PROJECT                                                 
WMO 4902354  Data Center AO Project name US ARGO PROJECT                                                 
WMO 3901551  Data Center BO Project name Argo UK                                                         
WMO 6903788  Data Center IF Project name Argo Italy                                                      
WMO 3901859  Data Center IF Project name MOCCA-EU                                                        
WMO 5906005  Data Center AO Project name UW, SOCCOM, Argo equivalent                                     
WMO 3902207  Data Center AO Project name US ARGO PROJECT                                                 
WMO 4902350  Data Center AO Project name US ARGO PROJECT                                                 
WMO 4903237  Data Center AO Project name US ARGO PROJECT                                                 
WMO 3902110  Data Center IF Project name ARGO POLAND                                                     
WMO 6901281  Data Center IF Project name ARGO SPAIN                                                      
WMO 6903255  Data Center IF Project name ARGO Italy                                                      
WMO 7900524  Data Center IF Project name ARGO-BSH                                                        
WMO 4902509  Data Center ME Project name Argo Canada                                                     
WMO 1901822  Data Center AO Project name US ARGO PROJECT                                                 
WMO 4903236  Data Center AO Project name US ARGO PROJECT                                                 

The correspondence between each DATA_CENTRE code and the institution name is given in Reference table 4: data centres and institutions codes of the Argo user’s manual.
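
For convenience, a few of the most frequent codes can be kept in a small dictionary; the institution names below are my reading of Reference table 4, so double-check them against the current version of the manual.

# Partial lookup of DATA_CENTRE codes (names taken from Reference table 4; verify against the manual)
data_centres = {'AO': 'AOML, USA',
                'BO': 'BODC, United Kingdom',
                'IF': 'Ifremer, France',
                'ME': 'MEDS, Canada',
                'HZ': 'CSIO, China'}

code = dayADS.DATA_CENTRE.values.astype(str)[0].strip()
print(f"{code} -> {data_centres.get(code, 'see Reference table 4')}")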

We also have all the geographical information in LONGITUDE and LATITUDE. But first, let’s read the data for the same day in the Pacific and Indian oceans:

dayPDS = xr.open_dataset('../../Data/202107-ArgoData/geo/pacific_ocean/2020/11/20201111_prof.nc')
dayIDS = xr.open_dataset('../../Data/202107-ArgoData/geo/indian_ocean/2020/11/20201111_prof.nc')

and now let’s plot all the global observations on that day:

fig,ax = plt.subplots(figsize=(15,10),subplot_kw={'projection': ccrs.PlateCarree()})
ax.set_global()
ax.plot(dayADS.LONGITUDE,dayADS.LATITUDE,'ob')
ax.plot(dayPDS.LONGITUDE,dayPDS.LATITUDE,'ob')
ax.plot(dayIDS.LONGITUDE,dayIDS.LATITUDE,'ob')

ax.set_title(f"Data from {dayADS.JULD[0].values.astype('datetime64[D]')}")
ax.coastlines()
ax.add_feature(cartopy.feature.LAND)
ax.gridlines(draw_labels=True, x_inline=False, y_inline=False);

ax.grid()

These files include the core Argo variables: temperature (TEMP), salinity (PSAL) and pressure (PRES). Therefore we can take a quick look at the data, for instance at the near-surface level of each profile, over the whole ocean:

fig,ax = plt.subplots(figsize=(18,10),subplot_kw={'projection': ccrs.PlateCarree()})
ax.set_global()

# data for each basin
cs=ax.scatter(dayADS.LONGITUDE,dayADS.LATITUDE,c=dayADS.TEMP[:,1],cmap="RdBu_r",vmin=-2, vmax=30, edgecolor='none')
ax.scatter(dayPDS.LONGITUDE,dayPDS.LATITUDE,c=dayPDS.TEMP[:,1],cmap="RdBu_r", vmin=-2, vmax=30, edgecolor='none')
ax.scatter(dayIDS.LONGITUDE,dayIDS.LATITUDE,c=dayIDS.TEMP[:,1],cmap="RdBu_r", vmin=-2, vmax=30, edgecolor='none')

ax.set_title(f"Most superficial TEMP data on the {dayADS.JULD[0].values.astype('datetime64[D]')}")
ax.coastlines()
ax.add_feature(cartopy.feature.LAND.with_scale('110m'))
ax.gridlines(draw_labels=True, x_inline=False, y_inline=False);

#colorbar
cbar=fig.colorbar(cs,ax=ax,extend='both',orientation='horizontal',shrink=.8,aspect=40)
cbar.set_label(dayPDS.TEMP.long_name)

Or, with a little interpolation, we can get the temperature at 10 dbar (roughly 10 m depth):

fig,ax = plt.subplots(figsize=(15,10),subplot_kw={'projection': ccrs.PlateCarree()})
ax.set_global()

for filein in ['../../Data/202107-ArgoData/geo/atlantic_ocean/2020/11/20201111_prof.nc', 
               '../../Data/202107-ArgoData/geo/pacific_ocean/2020/11/20201111_prof.nc', 
               '../../Data/202107-ArgoData/geo/indian_ocean/2020/11/20201111_prof.nc']:
    #get each profile for the 3 oceans
    DS=xr.open_dataset(filein)
    lon=DS.LONGITUDE.values
    lat=DS.LATITUDE.values
    tempi= np.zeros(lon.shape[0])
    tempi.fill(np.nan)
    #interpolate at 10 dbar
    for ip in range(0,DS.LONGITUDE.shape[0]):
        tempi[ip]=np.interp(10,DS.PRES[ip,:],DS.TEMP[ip,:])
    cs=ax.scatter(lon,lat,c=tempi,cmap="RdBu_r",vmin=-2, vmax=30, edgecolor='none')

ax.set_title(f"Data from {DS.JULD[0].values.astype('datetime64[D]')}")
ax.coastlines()
ax.add_feature(cartopy.feature.LAND.with_scale('110m'))

ax.gridlines(draw_labels=True, x_inline=False, y_inline=False);
ax.grid()

cbar=fig.colorbar(cs,ax=ax,extend='both',orientation='horizontal',shrink=.8,aspect=40)
cbar.set_label(dayPDS.TEMP.long_name)
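
As a side note, the statement above that 10 dbar corresponds to roughly 10 m depth can be checked with the TEOS-10 gsw package (an extra dependency, not used elsewhere in this notebook):

# Optional check that 10 dbar is close to 10 m depth (assumes the gsw package is installed)
import gsw

print(gsw.z_from_p(10, 45))   # height in metres, negative downwards: close to -10 m at 45N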

And we can evaluate the 10 dbar temperature during the whole month of November 2020:

fig,ax = plt.subplots(figsize=(15,10),subplot_kw={'projection': ccrs.PlateCarree()})
ax.set_global()

#For the 3 oceans
for basin in ['atlantic_ocean','pacific_ocean','indian_ocean']:
    for iday in range(1,31):
        filein=f"../../Data/202107-ArgoData/geo/{basin}/2020/11/202011{iday:02d}_prof.nc"
        DS=xr.open_dataset(filein)
        lon=DS.LONGITUDE.values
        lat=DS.LATITUDE.values
        tempi= np.zeros(lon.shape[0])
        tempi.fill(np.nan)
        for ip in range(0,lon.shape[0]):
            tempi[ip]=np.interp(10,DS.PRES[ip,:],DS.TEMP[ip,:])
        cs=ax.scatter(lon,lat,c=tempi,cmap="RdBu_r",vmin=-2, vmax=30, edgecolor='none')

ax.set_title(f"Data from {DS.JULD[0].values.astype('datetime64[M]')}")
ax.coastlines()
ax.add_feature(cartopy.feature.LAND.with_scale('110m'))

ax.gridlines(draw_labels=True, x_inline=False, y_inline=False);
ax.grid()

cbar=fig.colorbar(cs,ax=ax,extend='both',orientation='horizontal',shrink=.8,aspect=40)
cbar.set_label(dayPDS.TEMP.long_name)    

Let’s compare with 10 years ago:

fig,ax = plt.subplots(figsize=(15,10),subplot_kw={'projection': ccrs.PlateCarree()})
ax.set_global()

#For the 3 oceans
for basin in ['atlantic_ocean','pacific_ocean','indian_ocean']:
    for iday in range(1,31):
        filein=f"../../Data/202107-ArgoData/geo/{basin}/2010/11/201011{iday:02d}_prof.nc"
        DS=xr.open_dataset(filein)
        lon=DS.LONGITUDE.values
        lat=DS.LATITUDE.values
        tempi= np.zeros(lon.shape[0])
        tempi.fill(np.nan)
        for ip in range(0,lon.shape[0]):
            tempi[ip]=np.interp(10,DS.PRES[ip,:],DS.TEMP[ip,:])
        cs=ax.scatter(lon,lat,c=tempi,cmap="RdBu_r",vmin=-2, vmax=30, edgecolor='none')

ax.set_title(f"Data from {DS.JULD[0].values.astype('datetime64[M]')}")
ax.coastlines()
ax.add_feature(cartopy.feature.LAND.with_scale('110m'))

ax.gridlines(draw_labels=True, x_inline=False, y_inline=False);
ax.grid()

cbar=fig.colorbar(cs,ax=ax,extend='both',orientation='horizontal',shrink=.8,aspect=40)
cbar.set_label(dayPDS.TEMP.long_name)    

It is also possible to get the WMO numbers of all the platforms that measured during this month, together with their data assembly centre (DAC). With this information we can download the netCDF files for each cycle if necessary:

WMOs=np.array([])
DACs=np.array([])
#read all the basins
for basin in ['atlantic_ocean','pacific_ocean','indian_ocean']:
    for iday in range(1,31):
        filein=f"../../Data/202107-ArgoData/geo/{basin}/2020/11/202011{iday:02d}_prof.nc"
        DS=xr.open_dataset(filein)
        #look for the WMO and DAC for each float
        DACs=np.append(DACs,DS.DATA_CENTRE.astype(str).values)
        WMOs=np.append(WMOs,DS.PLATFORM_NUMBER.astype(int).values)

#Keep just the unique set of WMOs
WMOs, indices = np.unique(WMOs, return_index=True)
DACs=DACs[indices]
print(f"During november 2020 {WMOs.shape[0]} Argo floats where active:")
for ip in range(0,WMOs.shape[0],500):
    print(f"{ip} WMO {WMOs[ip]} DAC {DACs[ip]}")
During November 2020 3862 Argo floats were active:
0 WMO 1901302.0 DAC BO
500 WMO 2902785.0 DAC HZ
1000 WMO 3901906.0 DAC BO
1500 WMO 4902932.0 DAC AO
2000 WMO 5904353.0 DAC AO
2500 WMO 5905235.0 DAC AO
3000 WMO 5906115.0 DAC AO
3500 WMO 6902938.0 DAC IF
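
For example, with the dac/&lt;centre&gt;/&lt;WMO&gt;/ layout of the GDAC we can build the address of the profile file of any of these floats; the DAC-code-to-directory mapping below is an assumption to be verified against the GDAC contents.

# Sketch: build the GDAC address of one float's profile file
# (assumes the dac/<centre>/<WMO>/<WMO>_prof.nc layout and the mapping below; verify before use)
dac_dirs = {'AO': 'aoml', 'BO': 'bodc', 'IF': 'coriolis', 'ME': 'meds', 'HZ': 'csio'}

wmo = int(WMOs[0])
dac = dac_dirs.get(DACs[0].strip(), 'coriolis')
print(f"https://data-argo.ifremer.fr/dac/{dac}/{wmo}/{wmo}_prof.nc")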

We can also plot all the vertical profiles of salinity (PSAL) for 11 November 2020:

fig,ax = plt.subplots(figsize=(15,10))
ax.plot(dayADS.PSAL.transpose(),dayADS.PRES.transpose(),'o-')
ax.set_ylim(0,2000)
ax.invert_yaxis()
ax.set_xlabel(f"{dayADS.PSAL.long_name}")
ax.set_ylabel(f"{dayADS.PRES.long_name}");

It is obvious that there are some incorrect, or at least suspicious, data.

The netCDF file contains many _QC variables. In the case of PSAL_QC, the flag values vary between profiles:

fig, ax = plt.subplots(figsize=(20,10))
sc = ax.scatter(dayADS.PSAL, dayADS.PRES, c=dayADS.PSAL_QC.astype(float), vmin=0, vmax=9, cmap=qcmap)
colorbar_qc(qcmap, ax=ax)
ax.grid()
ax.set_ylim(0,2000)
ax.invert_yaxis()
ax.set_xlabel(f"{dayADS.PSAL.long_name}")
ax.set_ylabel('Pressure')
ax.set_title(f"PSAL {dayADS.PSAL_QC.long_name}");

In the sections Real Time quality control data and Delayed mode data you will learn all about the use of these quality control flags, so you can choose the best data available!
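
As a quick preview, here is a minimal sketch of flag-based selection: keep only the salinity values flagged 1 (‘good’) or 2 (‘probably good’) and plot the profiles again. The flag meanings follow the standard Argo reference tables; the sections mentioned above discuss them in detail.

# Preview: keep only PSAL values whose QC flag is 1 (good) or 2 (probably good)
# (QC flags are stored as characters in the file, hence the conversion to numbers)
psal_qc = dayADS.PSAL_QC.astype(float)
psal_good = dayADS.PSAL.where((psal_qc == 1) | (psal_qc == 2))

fig, ax = plt.subplots(figsize=(15,10))
ax.plot(psal_good.transpose(), dayADS.PRES.transpose(), 'o-')
ax.set_ylim(0,2000)
ax.invert_yaxis()
ax.set_xlabel(dayADS.PSAL.long_name)
ax.set_ylabel(dayADS.PRES.long_name);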