{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading, saving and exporting data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pymrio includes several functions for data reading and storing. This section presents the methods to use for saving and loading data already in a pymrio compatible format. For parsing raw MRIO data see the different tutorials for [working with available MRIO databases](../handling.rst)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we use the included small test MRIO system to highlight the different function. The same functions are available for any MRIO loaded into pymrio. Expect, however, significantly decreased performance due to the size of real MRIO system.\n", "\n", "import os" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import pymrio\n", "import os\n", "\n", "io = pymrio.load_test().calc_all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic save and read" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To save the full system, use:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "save_folder_full = \"/tmp/testmrio/full\"\n", "io.save_all(path=save_folder_full)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To read again from that folder do:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "io_read = pymrio.load_all(path=save_folder_full)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fileio activities are stored in the included meta data history field:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Description: test mrio for pymrio\n", "MRIO Name: testmrio\n", "System: pxp\n", "Version: v1\n", "File: /tmp/testmrio/full/metadata.json\n", "History:\n", "20250714 12:36:37 - FILEIO - Added satellite account from /tmp/testmrio/full/emissions\n", "20250714 12:36:37 - FILEIO - Added satellite account from /tmp/testmrio/full/factor_inputs\n", "20250714 12:36:37 - FILEIO - Loaded IO system from /tmp/testmrio/full\n", "20250714 12:36:37 - FILEIO - Saved testmrio to /tmp/testmrio/full\n", "20250714 12:36:36 - MODIFICATION - Calculating accounts for extension emissions\n", "20250714 12:36:36 - MODIFICATION - Calculating accounts for extension factor_inputs\n", "20250714 12:36:36 - MODIFICATION - Leontief matrix L calculated\n", "20250714 12:36:36 - MODIFICATION - Coefficient matrix A calculated\n", "20250714 12:36:36 - MODIFICATION - Industry output x calculated\n", "20250714 12:36:36 - FILEIO - Load test_mrio from /home/konstans/proj/pymrio/pymrio/core/../mrio_models/test_mrio/mrio_data\n", " ... (more lines in history)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "io_read.meta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Storage format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally, pymrio stores data in csv format, with the 'economic core' data in the root and each satellite account in a subfolder. Metadata as file as a file describing the data format ('file_parameters.json') are included in each folder." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['L.txt',\n", " 'population.txt',\n", " 'metadata.json',\n", " 'factor_inputs',\n", " 'Z.txt',\n", " 'A.txt',\n", " 'Y.txt',\n", " 'unit.txt',\n", " 'emissions',\n", " 'x.txt',\n", " 'file_parameters.json']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "os.listdir(save_folder_full)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file format for storing the MRIO data can be switched to a binary pickle format with:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['x.pkl',\n", " 'metadata.json',\n", " 'population.pkl',\n", " 'factor_inputs',\n", " 'Y.pkl',\n", " 'L.pkl',\n", " 'Z.pkl',\n", " 'A.pkl',\n", " 'emissions',\n", " 'unit.pkl',\n", " 'file_parameters.json']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "save_folder_bin = \"/tmp/testmrio/binary\"\n", "io.save_all(path=save_folder_bin, table_format=\"pkl\")\n", "os.listdir(save_folder_bin)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can be used to reduce the storage space required on the disk for large MRIO databases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Archiving MRIOs databases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To archive a MRIO system after saving use pymrio.archive:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "mrio_arc = \"/tmp/testmrio/archive.zip\"\n", "\n", "# Remove a potentially existing archive from before\n", "try:\n", " os.remove(mrio_arc)\n", "except FileNotFoundError:\n", " pass\n", "\n", "pymrio.archive(source=save_folder_full, archive=mrio_arc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data can be read directly from such an archive by:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "tt = pymrio.load_all(mrio_arc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently data can not be saved directly into a zip archive.\n", "It is, however, possible to remove the source files after archiving:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Directories before archiving: ['full', 'binary', 'tmp']\n", "Directories after archiving: ['full', 'archive.zip', 'binary']\n" ] } ], "source": [ "tmp_save = \"/tmp/testmrio/tmp\"\n", "\n", "# Remove a potentially existing archive from before\n", "try:\n", " os.remove(mrio_arc)\n", "except FileNotFoundError:\n", " pass\n", "\n", "io.save_all(tmp_save)\n", "\n", "print(\"Directories before archiving: {}\".format(os.listdir(\"/tmp/testmrio\")))\n", "pymrio.archive(source=tmp_save, archive=mrio_arc, remove_source=True)\n", "print(\"Directories after archiving: {}\".format(os.listdir(\"/tmp/testmrio\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Several MRIO databases can be stored in the same archive:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Remove a potentially existing archive from before\n", "try:\n", " os.remove(mrio_arc)\n", "except FileNotFoundError:\n", " pass\n", "\n", "tmp_save = \"/tmp/testmrio/tmp\"\n", "\n", "io.save_all(tmp_save)\n", "pymrio.archive(\n", " source=tmp_save, archive=mrio_arc, path_in_arc=\"version1/\", remove_source=True\n", ")\n", "io2 = io.copy()\n", "del io2.emissions\n", "io2.save_all(tmp_save)\n", "pymrio.archive(\n", " source=tmp_save, archive=mrio_arc, path_in_arc=\"version2/\", remove_source=True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When loading from an archive which includes multiple MRIO databases, specify\n", "one with the parameter 'path_in_arc':" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extensions of the loaded io1 ['emissions', 'factor_inputs'] and of io2: ['factor_inputs']\n" ] } ], "source": [ "io1_load = pymrio.load_all(mrio_arc, path_in_arc=\"version1/\")\n", "io2_load = pymrio.load_all(mrio_arc, path_in_arc=\"version2/\")\n", "\n", "print(\n", " f\"Extensions of the loaded io1 {sorted(io1_load.get_extensions())} and of io2: {sorted(io2_load.get_extensions())}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pymrio.load function can be used directly to only a specific satellite account\n", "of a MRIO database from a zip archive:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extension Emissions with parameters: name, F, F_Y, S, S_Y, M, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap\n" ] } ], "source": [ "emissions = pymrio.load(mrio_arc, path_in_arc=\"version1/emissions\")\n", "print(emissions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The archive function is a wrapper around python.zipfile module.\n", "There are, however, some differences to the defaults choosen in the original:\n", "\n", "- In contrast to [zipfile.write](https://docs.python.org/3/library/zipfile.html),\n", " pymrio.archive raises an\n", " error if the data (path + filename) are identical in the zip archive.\n", " Background: the zip standard allows that files with the same name and path\n", " are stored side by side in a zip file. This becomes an issue when unpacking\n", " this files as they overwrite each other upon extraction.\n", "\n", "- The standard for the parameter 'compression' is set to ZIP_DEFLATED\n", " This is different from the zipfile default (ZIP_STORED) which would\n", " not give any compression.\n", " See the [zipfile docs](https://docs.python.org/3/library/zipfile.html#zipfile-objects)\n", " for further information.\n", " Depending on the value given for the parameter 'compression'\n", " additional modules might be necessary (e.g. zlib for ZIP_DEFLATED).\n", " Futher information on this can also be found in the zipfile python docs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Storing or exporting a specific table or extension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each extension of the MRIO system can be stored separetly with:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "save_folder_em = \"/tmp/testmrio/emissions\"" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "io.emissions.save(path=save_folder_em)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can then be loaded again as separate satellite account:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "emissions = pymrio.load(save_folder_em)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "emissions" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
regionreg1reg2...reg5reg6
sectorfoodminingmanufactoringelectricityconstructiontradetransportotherfoodmining...transportotherfoodminingmanufactoringelectricityconstructiontradetransportother
stressorcompartment
emission_type1air2.056183e+06179423.5358939.749300e+071.188759e+073.342906e+063.885884e+061.075027e+071.582152e+071.793338e+0619145.604911...4.209505e+071.138661e+071.517235e+071.345318e+067.145075e+073.683167e+071.836696e+064.241568e+074.805409e+073.602298e+07
emission_type2water2.423103e+0525278.1920861.671240e+071.371303e+053.468292e+057.766205e+054.999628e+058.480505e+062.136528e+053733.601474...4.243738e+067.307208e+064.420574e+065.372216e+051.068144e+075.728136e+059.069515e+055.449044e+078.836484e+064.634899e+07
\n", "

2 rows × 48 columns

\n", "
" ], "text/plain": [ "region reg1 \\\n", "sector food mining manufactoring \n", "stressor compartment \n", "emission_type1 air 2.056183e+06 179423.535893 9.749300e+07 \n", "emission_type2 water 2.423103e+05 25278.192086 1.671240e+07 \n", "\n", "region \\\n", "sector electricity construction trade \n", "stressor compartment \n", "emission_type1 air 1.188759e+07 3.342906e+06 3.885884e+06 \n", "emission_type2 water 1.371303e+05 3.468292e+05 7.766205e+05 \n", "\n", "region reg2 \\\n", "sector transport other food \n", "stressor compartment \n", "emission_type1 air 1.075027e+07 1.582152e+07 1.793338e+06 \n", "emission_type2 water 4.999628e+05 8.480505e+06 2.136528e+05 \n", "\n", "region ... reg5 \\\n", "sector mining ... transport other \n", "stressor compartment ... \n", "emission_type1 air 19145.604911 ... 4.209505e+07 1.138661e+07 \n", "emission_type2 water 3733.601474 ... 4.243738e+06 7.307208e+06 \n", "\n", "region reg6 \\\n", "sector food mining manufactoring \n", "stressor compartment \n", "emission_type1 air 1.517235e+07 1.345318e+06 7.145075e+07 \n", "emission_type2 water 4.420574e+06 5.372216e+05 1.068144e+07 \n", "\n", "region \\\n", "sector electricity construction trade \n", "stressor compartment \n", "emission_type1 air 3.683167e+07 1.836696e+06 4.241568e+07 \n", "emission_type2 water 5.728136e+05 9.069515e+05 5.449044e+07 \n", "\n", "region \n", "sector transport other \n", "stressor compartment \n", "emission_type1 air 4.805409e+07 3.602298e+07 \n", "emission_type2 water 8.836484e+06 4.634899e+07 \n", "\n", "[2 rows x 48 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "emissions.D_cba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As all data in pymrio is stored as [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html), the full pandas stack for exporting tables is available. For example, to export a table as excel sheet use:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "io.emissions.D_cba.to_excel(\"/tmp/testmrio/emission_footprints.xlsx\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For further information see the pandas [documentation on import/export](https://pandas.pydata.org/pandas-docs/stable/io.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Partial loading of MRIO data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pymrio provides functionality to load only specific parts of a saved MRIO system, which can be useful for memory efficiency or when working with large databases. This is achieved using the `subset` parameter in the `load_all` function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading specific matrices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can load only specific matrices from a saved MRIO system:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Load only the Z matrix and D_cba data\n", "io_partial = pymrio.load_all(save_folder_full, subset=[\"Z\", \"D_cba\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This will load only the specified matrices. Other matrices like A, Y, L, etc. will not be loaded:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Available matrices in partial load:\n", "IO System with parameters: Z, meta, factor_inputs, emissions\n", "Extension Emissions with parameters: name, D_cba\n" ] } ], "source": [ "print(\"Available matrices in partial load:\")\n", "print(io_partial)\n", "print(io_partial.emissions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading specific extensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also restrict loading to specific extensions using the `subfolders` parameter:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "IO System with parameters: Z, meta, emissions\n" ] } ], "source": [ "# Load only from the emissions extension\n", "io_emis_only = pymrio.load_all(save_folder_full, subfolders=\"emissions\", subset=[\"Z\", \"D_cba\"])\n", "print(io_emis_only)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiple extensions can be specified as a list:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "IO System with parameters: Z, meta, emissions, factor_inputs\n" ] } ], "source": [ "# Load from multiple extensions (some may not exist)\n", "io_multi_ext = pymrio.load_all(save_folder_full, subfolders=[\"emissions\", \"factor_inputs\"], subset=[\"Z\", \"D_cba\"])\n", "print(io_multi_ext)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading extensions without core data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load only extension data without the core economic matrices, use `include_core=False`:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Available matrices (extensions only):\n", "IO System with parameters: meta, emissions\n", "Available extensions:\n", "['emissions']\n", "Available dataframes: ['D_cba']\n" ] } ], "source": [ "# Load only extension data, no core matrices\n", "io_ext_only = pymrio.load_all(save_folder_full, subfolders=\"emissions\", include_core=False, subset=[\"D_cba\"])\n", "\n", "print(\"Available matrices (extensions only):\")\n", "print(io_ext_only)\n", "print(\"Available extensions:\")\n", "print(list(io_ext_only.get_extensions()))\n", "print(\"Available dataframes: \", list(io_ext_only.emissions.get_DataFrame()))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 4 }