about PDMX
Last update

PDMX, Poll & Direct Metadata for SDMX (Excel)

Introduction
The Excel PDMX tool supports the user in retrieving all major SDMX artefacts from the SDMX endpoint of his or her choice (dataflow, datastructure, conceptscheme, codelist and metadatastructure). See below for more background info on SDMX. Only JSON formatted content is supported. Please note that some organizations require authentication for fair use of their API (which is not supported by PDMX).

PDMX can easily be used within an Office environment as a regular Excel worksheet (with macro enabled). SDMX artefacts can be collected for two different systems, and optionally compared for any differences. The results can be filtered by agency only (or all but agency, prefix with !). The results are saved in separate sheets per artefact and system. PDMX can be used for different uses cases, viewing available cross-domain codelists to reuse, deciding related artefacts to update in case of new versions etc.

In the main sheet (named PDMX) the results are summarized (see figure 1 below for an impression) and the requests can be initiated by selecting the proper button in the Form (to launch by the 'Form'-button or F1). Retrieving the artifacts may take some time depending on the number within the selected system and your network connection. In the statusbar the status of the retrieval can be followed.

Some supporting functions can be selected as well, to unhide or hide the configuration (of the SDMX endpoint details), to initialise the worksheet, to resize the column widths of all data sheets, to save summarized results to a history sheet. The results for the different artefacts are checked for their usage as well (to detect any 'orphans' or unused artefacts, see figure 2).

The standard interrelationships of the especially the datastructure artefact are therefore of great importance. The tool can be used for all standard SDMX-API endpoints (to configure by their url, accept header and REST parameters, see below the options). You can try it out with the already configured examples, they all work.

The configuration of the SDMX endpoints can be hidden of unhidden by the button in the general form. It is only needed to enter data in the configuration cells (yellow cells) and to select the endpoint of your choice in the main sheet (in the green or blue cell).

And remember, to prevent any malfunctioning of PDMX, do not update or delete cells in the existing sheets and do not change sheet names without a good reason!

Usage
After downloading PDMX from the download-page or by this link, you must first enable the macro by trusting this file (at your own risk). You can rename the file for your own convenience. The next thing is to add and select your endpoint-parameters in the configuration-part of the main sheet (named PDMX), which can be hidden afterwards. With the Form-button (or F1) all available functions can be disclosed:

  • Get DFs: Retrieve all dataflows from system (data is collected in sheet DF1 or DF2)
  • Get DSs: Retrieve all datastructures from system (data is collected in sheet DS1 or DS2)
  • Get CSs: Retrieve all conceptschemes from system (data is collected in sheet CS1 or CS2)
  • Get CLs: Retrieve all codelists from system (data is collected in sheet CL1 or CL2)
  • Agency rep.: Produces summary of all retrieved artefacts per agency (in sheet AG1 or AG2)
  • Count data: Retrieve datasize in sheet DT1 or DT2 for all dataflows (collected in sheet DF1 or DF2)
  • Count categ.: Retrieve categories and counts only the number present
  • Count constr.: Retrieve constraints and counts only the number present
  • Compare DFs: Compare all retrieved dataflows of system1 with system2 (in sheet DFc)
  • Compare DSs: Compare all retrieved datastructures of system1 with system2 (in sheet DSc)
  • Compare CSs: Compare all retrieved conceptschemes of system1 with system2 (in sheet CSc)
  • Compare CLs: Compare all retrieved codelists of system1 with system2 (in sheet CLc)
  • Add history: Add current totals of summary in main sheet as a new row in the History sheet
  • Unhide config: Unhide (and hide) the configuration data in the main sheet
  • Resize sheets: Resize all data sheets to its initial column sizes
  • Initialise: Remove all retrieved data from the worksheet (except the configuration)
  • Cancel: Cancel function selection with the form menu
Note 1: The first 8 functions can be applied for the 'green' and 'blue' system (or system1 and system2).
Note 2: The right order in retrieving a complete and consistent collection of a system is: dataflows, datastructures, conceptschemes and codelist.
Note 3: By hitting the little button next to label System 1 or System 2, the DFs, DSs, CSs and CLs can be retrieved in one go.
Note 4: The Count data function may take a very long time depending on your connection or system endpoint. You can abort this function at any time (the data size is marked with an asterisk as incomplete). And if you start another Excel workbook, it will automatically abort (only one Excel workbook opened is allowed during data counting).

The results are displayed in the main PDMX-sheet, as well the last url used and a snippet of the reponse retrieved. The previous results are moved to the next column to the right. After retrieving the artefacts, you can examine the results in the appropriate sheets. Artefacts used are counted by their references, so artefacts not used can easily be recognized. Older versions of artefacts can be such a reason.

Datamanagement
Statistical systems can contain a lot of data, which makes their maintenance difficult. The Data count function allows the data available for each dataflow to be counted. Most SDMX systems do support some way to get an indication of the size of a dataflow (like OECD), the AvailableConstraint artifact provides this functionality. It generally contains the number of observations used by PDMX. In the event that the SDMX system does not provide this artifact (like BIS or Unicef), the 'brute force' method is selectable via the 'Data' checkbox (by which PDMX retrieves the data itself). Especially in the latter case, it will be a costly operation, at least in duration. That's why PDMX offers an incremental approach (to enable via the 'Incremental' checkbox).

The data can be requested intermittently and repeated regularly. After all, the size of the dataflows once retrieved will not change much or very often. To facilitate incremental data retrieval, PDMX also provides two configurable parameters:

  • Speed calibration: adjust this counter so that the speed of the counter value (dependant of your system) is about one second equal
  • Maximum wait: set this counter to maximum 'seconds' (defined by speed calibration), set to 0 to disable
PS As PDMX retrieves the data, a counter (starting from zero) is displayed in the row of the counted dataflow. Once the data retrieval is complete, the result (number of bytes) replaces this counter value.

Both parameters allow you to set a maximum wait in duration before skipping the dataflow and moving on to the next one.. However, you can always stop it and continue on another occasion (via alt-b). The result is displayed as http status code 200 (if ok) or any 4** or 5** value. If the retrieval fails due to a maximum wait, the fictitious result value will be -100.

Some systems do not like continuous retrieval (or 'harvesting'), so a dynamic response may be possible. An appropriate way may be to return status code 429. PDMX can automatically respond to the 429 status by waiting a minute before retrying (if enabled by the Retry after 429 parameter in the configuration).

In summary, an incremental approach does not repeat previous successful requests and retries all other data count requests. And if you want to start with a clean slate, simply disable the 'Incremental' checkbox.

Disclaimer
Please note that you use the PDMX tool at your own risk. Excelmeer is in no way liable for any damage resulting from its use.

VersionDateInformation
v0.9a11-01-1025First release made public, updated interface (to be independent of any locale), improved messages, added and checked examples.
v0.9b27-01-2025Add an incremental way of counting all available data, add maximum wait on data retrieval.
v0.9c11-02-2025Add an alternative and much quicker way of counting all available data, and improved minor things.
v0.9d13-02-2025Fix error when counting raw data and interrupting.

Contact
In case you are looking for some customization in PDMX or any other Excel development, please contact Excelmeer.


PDMX example1

Figure 1: animated example of main screen


PDMX example2

Figure 2: example of retrieved codelists

Standard SDMX-API parameters
DETAIL-parameter: This attribute specifies the desired amount of information to be returned.

  • ALLSTUBS : all artefacts should be returned as stubs, containing only identification information, as well as the artefacts' name.
  • REFERENCESTUBS: referenced artefacts should be returned as stubs, containing only identification information, as well as the artefacts' name.
  • REFERENCEPARTIAL: referenced item schemes should only include items used by the artefact to be returned. For example, a concept scheme would only contain the concepts used in a DSD, and its ISPARTIAL flag would be set to true. Likewise, if a dataflow has been constrained, then the codelists referenced by the DSD referenced by the dataflow should only contain the codes allowed by the content constraint
  • ALLCOMPLETESTUBS: all artefacts should be returned as complete stubs, containing identification information, the artefacts' name, description,annotations and isFinal information.
  • REFERENCECOMPLETESTUBS: referenced artefacts should be returned as complete stubs, containing identification information, the artefacts' name, description, annotations and isFinal information.
  • FULL (default): all available information for all artefacts should be returned.
REFERENCES-parameter: This attribute instructs the web service to return (or not) the artefacts referenced by the artefact to be returned (for example, the code lists and concepts used by the data structure definition matching the query as well as the artefacts that use the matching artefact (for example, the dataflows that use the data structure definition matching the query).
  • NONE (default): no references will be returned.
  • PARENTS: the artefacts that use the artefact matching the query
  • PARENTSANDSIBLINGS: the artefacts that use the artefact matching the query, as well as the artefacts referenced by these artefacts.
  • CHILDREN: artefacts referenced by the artefact to be returned.
  • DESCENDANTS: references of references, up to any level, will also be returned.
  • ALL: the combination of parentsandsiblings and descendants, in addition, a concrete type of resource may also be used (for example, references=codelist).
More info on the SDMX-API can be found here.

SDMX background
SDMX, which stands for Statistical Data and Metadata eXchange, is an ISO standard designed to describe statistical data and metadata, normalise their exchange, and improve their efficient sharing across statistical and similar organisations.

SDMX is sponsored by eight international organisations including the Bank for International Settlements (BIS), European Central Bank (ECB), Eurostat (Statistical Office of the European Union), International Labour Organization (ILO), International Monetary Fund (IMF), Organisation for Economic Cooperation and Development (OECD), United Nations Statistical Division (UNSD), and World Bank.

The SDMX standard provides an integrated approach to facilitating statistical data and metadata exchange, enabling interoperable implementations within and between systems concerned with the exchange, reporting and dissemination of statistical data and their related meta-information. More info on SDMX can be found at the SDMX website.