Loading, saving and exporting data

Pymrio includes several functions for reading and storing data. This section presents the methods for saving and loading data that is already in a pymrio-compatible format. For parsing raw MRIO data, see the tutorials for working with the available MRIO databases.

Here, we use the included small test MRIO system to highlight the different functions. The same functions are available for any MRIO loaded into pymrio. Expect, however, significantly decreased performance due to the size of real MRIO systems.

[7]:

import pymrio
import os

io = pymrio.load_test().calc_all()

Basic save and read

To save the full system, use:

[8]:

save_folder_full = "/tmp/testmrio/full"
io.save_all(path=save_folder_full)

[8]:

<pymrio.core.mriosystem.IOSystem at 0x7e77b95c08f0>

To read the system back from that folder, use:

[9]:

io_read = pymrio.load_all(path=save_folder_full)

File I/O activities are recorded in the history field of the included metadata:

[10]:

io_read.meta

[10]:

Description: test mrio for pymrio
MRIO Name: testmrio
System: pxp
Version: v1
File: /tmp/testmrio/full/metadata.json
History:
20250714 12:36:37 - FILEIO -  Added satellite account from /tmp/testmrio/full/emissions
20250714 12:36:37 - FILEIO -  Added satellite account from /tmp/testmrio/full/factor_inputs
20250714 12:36:37 - FILEIO -  Loaded IO system from /tmp/testmrio/full
20250714 12:36:37 - FILEIO -  Saved testmrio to /tmp/testmrio/full
20250714 12:36:36 - MODIFICATION -  Calculating accounts for extension emissions
20250714 12:36:36 - MODIFICATION -  Calculating accounts for extension factor_inputs
20250714 12:36:36 - MODIFICATION -  Leontief matrix L calculated
20250714 12:36:36 - MODIFICATION -  Coefficient matrix A calculated
20250714 12:36:36 - MODIFICATION -  Industry output x calculated
20250714 12:36:36 - FILEIO -  Load test_mrio from /home/konstans/proj/pymrio/pymrio/core/../mrio_models/test_mrio/mrio_data
 ... (more lines in history)
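
The history lines follow a regular pattern (timestamp, tag, message), so they can be filtered by tag. The following is a minimal sketch that works on plain strings copied from the output above; it assumes that every entry has the form shown there and does not use any pymrio API. The helper name entries_with_tag is made up for this illustration.

```python
# History entries copied from the output above (assumed format:
# "<date> <time> - <TAG> -  <message>").
history = [
    "20250714 12:36:37 - FILEIO -  Loaded IO system from /tmp/testmrio/full",
    "20250714 12:36:37 - FILEIO -  Saved testmrio to /tmp/testmrio/full",
    "20250714 12:36:36 - MODIFICATION -  Leontief matrix L calculated",
]


def entries_with_tag(lines, tag):
    """Return the messages of all history lines carrying the given tag."""
    messages = []
    for line in lines:
        # Split into timestamp, tag and message; keep the message intact.
        parts = [part.strip() for part in line.split(" - ", 2)]
        if len(parts) == 3 and parts[1] == tag:
            messages.append(parts[2])
    return messages


fileio_messages = entries_with_tag(history, "FILEIO")
print(fileio_messages)
```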

Storage format

Internally, pymrio stores data in csv format, with the ‘economic core’ data in the root and each satellite account in a subfolder. Metadata as well as a file describing the data format (‘file_parameters.json’) are included in each folder.

[11]:

os.listdir(save_folder_full)

[11]:

['L.txt',
 'population.txt',
 'metadata.json',
 'factor_inputs',
 'Z.txt',
 'A.txt',
 'Y.txt',
 'unit.txt',
 'emissions',
 'x.txt',
 'file_parameters.json']
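
Since the format description is a JSON file, it can be inspected with the Python standard library. The sketch below only pretty-prints whatever the file contains; it makes no assumption about the keys pymrio writes. The helper read_json is a name chosen for this example, and the path assumes the folder from the save example above.

```python
import json
from pathlib import Path


def read_json(path):
    """Load a JSON file and return its content as a Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Assumption: the folder from the save_all example exists; adjust as needed.
params_file = Path("/tmp/testmrio/full") / "file_parameters.json"
if params_file.exists():
    print(json.dumps(read_json(params_file), indent=2))
else:
    print(f"{params_file} not found; run the save example first")
```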

The file format for storing the MRIO data can be switched to a binary pickle format with:

[12]:

save_folder_bin = "/tmp/testmrio/binary"
io.save_all(path=save_folder_bin, table_format="pkl")
os.listdir(save_folder_bin)

[12]:

['x.pkl',
 'metadata.json',
 'population.pkl',
 'factor_inputs',
 'Y.pkl',
 'L.pkl',
 'Z.pkl',
 'A.pkl',
 'emissions',
 'unit.pkl',
 'file_parameters.json']

This can be used to reduce the storage space required on the disk for large MRIO databases.
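
Whether the binary format actually saves space for a given database can be checked by comparing the size of the two folders. The following sketch uses only the standard library; folder_size is a helper written for this illustration and is not part of pymrio.

```python
import os


def folder_size(path):
    """Return the total size in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Folders from the examples above; missing folders are reported as 0 bytes
# because os.walk yields nothing for a non-existent path.
for folder in ("/tmp/testmrio/full", "/tmp/testmrio/binary"):
    print(f"{folder}: {folder_size(folder)} bytes")
```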

Archiving MRIO databases

To archive an MRIO system after saving, use pymrio.archive:

[13]:

mrio_arc = "/tmp/testmrio/archive.zip"

# Remove a potentially existing archive from before
try:
    os.remove(mrio_arc)
except FileNotFoundError:
    pass

pymrio.archive(source=save_folder_full, archive=mrio_arc)

Data can be read directly from such an archive by:

[14]:

tt = pymrio.load_all(mrio_arc)

Currently, data cannot be saved directly into a zip archive. It is, however, possible to remove the source files after archiving:

[15]:

tmp_save = "/tmp/testmrio/tmp"

# Remove a potentially existing archive from before
try:
    os.remove(mrio_arc)
except FileNotFoundError:
    pass

io.save_all(tmp_save)

print("Directories before archiving: {}".format(os.listdir("/tmp/testmrio")))
pymrio.archive(source=tmp_save, archive=mrio_arc, remove_source=True)
print("Directories after archiving: {}".format(os.listdir("/tmp/testmrio")))

Directories before archiving: ['full', 'binary', 'tmp']
Directories after archiving: ['full', 'archive.zip', 'binary']

Several MRIO databases can be stored in the same archive:

[16]:

# Remove a potentially existing archive from before
try:
    os.remove(mrio_arc)
except FileNotFoundError:
    pass

tmp_save = "/tmp/testmrio/tmp"

io.save_all(tmp_save)
pymrio.archive(
    source=tmp_save, archive=mrio_arc, path_in_arc="version1/", remove_source=True
)
io2 = io.copy()
del io2.emissions
io2.save_all(tmp_save)
pymrio.archive(
    source=tmp_save, archive=mrio_arc, path_in_arc="version2/", remove_source=True
)

When loading from an archive which includes multiple MRIO databases, specify one with the parameter ‘path_in_arc’:

[17]:

io1_load = pymrio.load_all(mrio_arc, path_in_arc="version1/")
io2_load = pymrio.load_all(mrio_arc, path_in_arc="version2/")

print(
    f"Extensions of the loaded io1 {sorted(io1_load.get_extensions())} and of io2: {sorted(io2_load.get_extensions())}"
)

Extensions of the loaded io1 ['emissions', 'factor_inputs'] and of io2: ['factor_inputs']
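
If it is not known which databases an archive contains, the first-level folder names can be listed with the zipfile module before choosing a value for ‘path_in_arc’. This is a sketch under the assumption that each database sits in its own top-level folder of the archive, as in the example above; top_level_folders is a hypothetical helper, and the demo archive is built in memory with placeholder files so that the example runs on its own.

```python
import zipfile
from io import BytesIO


def top_level_folders(archive):
    """Return the sorted names of the first-level folders in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            {name.split("/", 1)[0] for name in zf.namelist() if "/" in name}
        )


# Small in-memory stand-in for an archive with two stored databases.
demo = BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("version1/Z.txt", "placeholder")
    zf.writestr("version2/Z.txt", "placeholder")

print(top_level_folders(demo))
```

The function accepts a path as well, so it could be called with mrio_arc from the cells above.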

The pymrio.load function can be used directly to load only a specific satellite account of an MRIO database from a zip archive:

[18]:

emissions = pymrio.load(mrio_arc, path_in_arc="version1/emissions")
print(emissions)

Extension Emissions with parameters: name, F, F_Y, S, S_Y, M, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap

The archive function is a wrapper around Python's zipfile module. There are, however, some differences from the defaults chosen in the original:

In contrast to zipfile.write, pymrio.archive raises an error if an entry with the same path and filename already exists in the zip archive. Background: the zip standard allows files with the same name and path to be stored side by side in a zip file. This becomes an issue when unpacking these files, as they overwrite each other upon extraction.
The default for the parameter ‘compression’ is ZIP_DEFLATED. This differs from the zipfile default (ZIP_STORED), which does not compress the data. Depending on the value given for the parameter ‘compression’, additional modules might be necessary (e.g. zlib for ZIP_DEFLATED). Further information can be found in the zipfile section of the Python docs.
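
Both points can be reproduced with the zipfile module alone. The sketch below does not call pymrio; it writes the same entry name twice into an in-memory archive to show that plain zipfile keeps both copies, and then compares the archive size for ZIP_STORED and ZIP_DEFLATED using a made-up, highly repetitive payload.

```python
import warnings
import zipfile
from io import BytesIO

# 1) Plain zipfile accepts duplicate entry names (it only emits a warning).
buffer = BytesIO()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("data/Z.txt", "first version")
        zf.writestr("data/Z.txt", "second version")

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
print(names)  # the same path is listed twice

# 2) ZIP_STORED keeps the data uncompressed, ZIP_DEFLATED compresses it.
payload = "0.0\t" * 10_000  # artificial, highly repetitive table content


def zipped_size(compression):
    """Return the size in bytes of an archive holding the payload."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        zf.writestr("Z.txt", payload)
    return len(buf.getvalue())


stored = zipped_size(zipfile.ZIP_STORED)
deflated = zipped_size(zipfile.ZIP_DEFLATED)
print(f"stored: {stored} bytes, deflated: {deflated} bytes")
```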

Storing or exporting a specific table or extension

Each extension of the MRIO system can be stored separately with:

[19]:

save_folder_em = "/tmp/testmrio/emissions"

[20]:

io.emissions.save(path=save_folder_em)

[20]:

<pymrio.core.mriosystem.Extension at 0x7e77b95e49b0>

This can then be loaded again as a separate satellite account:

[21]:

emissions = pymrio.load(save_folder_em)

[22]:

emissions

[22]:

<pymrio.core.mriosystem.Extension at 0x7e77b94cf8f0>

[23]:

emissions.D_cba

[23]:

	region	reg1								reg2		...	reg5		reg6
	sector	food	mining	manufactoring	electricity	construction	trade	transport	other	food	mining	...	transport	other	food	mining	manufactoring	electricity	construction	trade	transport	other
stressor	compartment
emission_type1	air	2.056183e+06	179423.535893	9.749300e+07	1.188759e+07	3.342906e+06	3.885884e+06	1.075027e+07	1.582152e+07	1.793338e+06	19145.604911	...	4.209505e+07	1.138661e+07	1.517235e+07	1.345318e+06	7.145075e+07	3.683167e+07	1.836696e+06	4.241568e+07	4.805409e+07	3.602298e+07
emission_type2	water	2.423103e+05	25278.192086	1.671240e+07	1.371303e+05	3.468292e+05	7.766205e+05	4.999628e+05	8.480505e+06	2.136528e+05	3733.601474	...	4.243738e+06	7.307208e+06	4.420574e+06	5.372216e+05	1.068144e+07	5.728136e+05	9.069515e+05	5.449044e+07	8.836484e+06	4.634899e+07

2 rows × 48 columns

As all data in pymrio is stored as pandas DataFrames, the full pandas stack for exporting tables is available. For example, to export a table as an Excel sheet, use:

[24]:

io.emissions.D_cba.to_excel("/tmp/testmrio/emission_footprints.xlsx")

For further information see the pandas documentation on import/export.

Partial loading of MRIO data

Pymrio provides functionality to load only specific parts of a saved MRIO system, which can be useful for memory efficiency or when working with large databases. This is achieved using the subset parameter in the load_all function.

Loading specific matrices

You can load only specific matrices from a saved MRIO system:

[25]:

# Load only the Z matrix and D_cba data
io_partial = pymrio.load_all(save_folder_full, subset=["Z", "D_cba"])

This will load only the specified matrices. Other matrices like A, Y, L, etc. will not be loaded:

[36]:

print("Available matrices in partial load:")
print(io_partial)
print(io_partial.emissions)

Available matrices in partial load:
IO System with parameters: Z, meta, factor_inputs, emissions
Extension Emissions with parameters: name, D_cba

Loading specific extensions

You can also restrict loading to specific extensions using the subfolders parameter:

[38]:

# Load only from the emissions extension
io_emis_only = pymrio.load_all(save_folder_full, subfolders="emissions", subset=["Z", "D_cba"])
print(io_emis_only)

IO System with parameters: Z, meta, emissions

Multiple extensions can be specified as a list:

[39]:

# Load from multiple extensions (some may not exist)
io_multi_ext = pymrio.load_all(save_folder_full, subfolders=["emissions", "factor_inputs"], subset=["Z", "D_cba"])
print(io_multi_ext)

IO System with parameters: Z, meta, emissions, factor_inputs

Loading extensions without core data

To load only extension data without the core economic matrices, use include_core=False:

[43]:

# Load only extension data, no core matrices
io_ext_only = pymrio.load_all(save_folder_full, subfolders="emissions", include_core=False, subset=["D_cba"])

print("Available matrices (extensions only):")
print(io_ext_only)
print("Available extensions:")
print(list(io_ext_only.get_extensions()))
print("Available dataframes: ", list(io_ext_only.emissions.get_DataFrame()))

Available matrices (extensions only):
IO System with parameters: meta, emissions
Available extensions:
['emissions']
Available dataframes:  ['D_cba']