Convert MRIO satellite rows

Here we discuss the possibilities for converting entries in MRIO satellite accounts (Extensions). The term convert is used very broadly here, it includes the following tasks:

  • renaming the index names of results/extensions

  • adjusting the numerical values of the data, e.g. for unit conversion

  • finding and extracting data based on indices across a table or an mrio(-extension). This can be system based on name and potentially constrained by sector/region or any other specification.

  • Aggregation/Summation of satellite accounts

  • Characterization of stressors to impact categories (with low performance, see the Notebook on stressor characterization for a faster method).

We will cover each of these points in the examples below. First we will go through the general setup describing the structure of the bridging/mapping table. We will then cover the application of the conversion function to a single, standalone table. There we will show how to rename the index/stressor names, do unit conversions, and then do global and regional characterizations. For the case of converting a full pymrio Extension, see the converting pymrio extensions section.

For the connected topics of Aggregation of MRIOs see the Aggregation page and for Characterization of Stressors see the Notebook on stressor characterization.

Basic setup

All conversion relies on a mapping table that maps (bridges) the index/columns of the source data to the indices of the target data.

This tables requires headers (columns) corresponding to the index.names and columns.names of the source data (constraining data) as well as bridge data which specify the new target index. The later are indicated by “NewIndex__OldIndex” - the important part are the two underscore in the column name. Another (optional) column named “factor” specifies the multiplication factor for the conversion. Finally, additional columns can be used to indicate units and other information. However, for the basic conversion this information will not be used. This becomes important later for the converting Pymrio extensions.

Constraining data columns can either specify columns or index. However, any constraining data to be bridged/mapped to a new name need to be in the index of the original data.

The first example below shows the simplest case of renaming a single table. This will make the concept of the mapping table clear.

Converting standalone tables

Renaming the index entries of a single table

Assume we have a small MRIO result table with the following structure:

[1]:
import pandas as pd

import pymrio

ghg_result = pd.DataFrame(
    columns=["Region1", "Region2", "Region3"],
    index=pd.MultiIndex.from_tuples(
        [
            ("Carbon Dioxide", "Air"),
            ("Methane", "air"),
        ]
    ),
    data=[[5, 6, 7], [0.5, 0.6, 0.7]],
)
ghg_result.index.names = ["stressor", "compartment"]
ghg_result.columns.names = ["region"]

Our first task here is to rename to the chemical names of the stressors and fix the compartment spelling.

[2]:
ghg_map = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "chem_stressor__stressor",
        "compartment__compartment",
        "factor",
    ],
    data=[
        ["Carbon Dioxide", "[A|a]ir", "CO2", "Air", 1.0],
        ["Methane", "[A|a]ir", "CH4", "Air", 1.0],
    ],
)
ghg_map
[2]:
stressor compartment chem_stressor__stressor compartment__compartment factor
0 Carbon Dioxide [A|a]ir CO2 Air 1.0
1 Methane [A|a]ir CH4 Air 1.0
[3]:
ghg_new = pymrio.convert(ghg_result, ghg_map)
ghg_new
[3]:
region Region1 Region2 Region3
chem_stressor compartment
CH4 Air 0.5 0.6 0.7
CO2 Air 5.0 6.0 7.0

Explanation: The column headers indicates that the stressor index level should be renamed from “stressor” to “chem_stressor” and the compartment index level should stay the same (NewName__OldName). The factor column is not used in this case. All renaming columns consider regular expressions, so that the spelling of the compartment can be fixed in one go.

For simple rename (and aggregation cases, see below) we can omit the factor column. Thus we obtain the same result with the following mapping table:

[4]:
ghg_map_wo_factor = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "chem_stressor__stressor",
        "compartment__compartment",
    ],
    data=[
        ["Carbon Dioxide", "[A|a]ir", "CO2", "Air"],
        ["Methane", "[A|a]ir", "CH4", "Air"],
    ],
)
ghg_map_wo_factor
[4]:
stressor compartment chem_stressor__stressor compartment__compartment
0 Carbon Dioxide [A|a]ir CO2 Air
1 Methane [A|a]ir CH4 Air
[5]:
ghg_new_wo_factor = pymrio.convert(ghg_result, ghg_map_wo_factor)
ghg_new_wo_factor
[5]:
region Region1 Region2 Region3
chem_stressor compartment
CH4 Air 0.5 0.6 0.7
CO2 Air 5.0 6.0 7.0

Unit conversion

With the factor column it is easy to apply unit conversion to any result table. So, to start with the same table as above, we can apply a simple unit conversion. Assuming the data is in tonnes

[6]:
ghg_result_ton = pd.DataFrame(
    columns=["Region1", "Region2", "Region3"],
    index=pd.MultiIndex.from_tuples(
        [
            ("Carbon Dioxide", "Air"),
            ("Methane", "air"),
        ]
    ),
    data=[[5, 6, 7], [0.5, 0.6, 0.7]],
)
ghg_result_ton.index.names = ["stressor", "compartment"]
ghg_result_ton.columns.names = ["region"]
ghg_result_ton
[6]:
region Region1 Region2 Region3
stressor compartment
Carbon Dioxide Air 5.0 6.0 7.0
Methane air 0.5 0.6 0.7

We can get the data in kg by

[7]:
ghg_map_to_kg = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "chem_stressor__stressor",
        "compartment__compartment",
        "factor",
    ],
    data=[
        ["Carbon Dioxide", "[A|a]ir", "CO2", "Air", 1000],
        ["Methane", "[A|a]ir", "CH4", "Air", 1000],
    ],
)
ghg_map_to_kg
[7]:
stressor compartment chem_stressor__stressor compartment__compartment factor
0 Carbon Dioxide [A|a]ir CO2 Air 1000
1 Methane [A|a]ir CH4 Air 1000
[8]:
ghg_new_kg = pymrio.convert(ghg_result_ton, ghg_map_to_kg)
ghg_new_kg
[8]:
region Region1 Region2 Region3
chem_stressor compartment
CH4 Air 500.0 600.0 700.0
CO2 Air 5000.0 6000.0 7000.0

In case of unit conversion of pymrio satellite accounts, we can also check the unit before and set the unit after conversion: TODO: unit conversion extensions, link to extension beow

Characterization

The main power of the convert function is to aggregate and characterize satellite accounts. If needed, region and sector specific characterizations can be applied.

Global characterization factors

An simple example is a conversion/aggregation based on GWP100 characterization factors. Here, we continue with the unit converted and cleanup dataframe from above:

[9]:
ghg_new_kg
[9]:
region Region1 Region2 Region3
chem_stressor compartment
CH4 Air 500.0 600.0 700.0
CO2 Air 5000.0 6000.0 7000.0

We define a general purpose characterization map for GHG emissions (based on AR6 GWP100 and GWP20 factors ,with some simplifications):

[10]:
GWP_characterization = pd.DataFrame(
    columns=["chem_stressor", "GWP__chem_stressor", "factor"],
    data=[
        ["CO2", "GWP100", 1],
        ["CH4", "GWP100", 29],
        ["NHx", "GWP100", 273],
        ["CO2", "GWP20", 1],
        ["CH4", "GWP20", 80],
        ["NHx", "GWP20", 273],
        ["CO2", "GWP500", 1],
        ["CH4", "GWP500", 8],
        ["NHx", "GWP500", 130],
    ],
)
GWP_characterization
[10]:
chem_stressor GWP__chem_stressor factor
0 CO2 GWP100 1
1 CH4 GWP100 29
2 NHx GWP100 273
3 CO2 GWP20 1
4 CH4 GWP20 80
5 NHx GWP20 273
6 CO2 GWP500 1
7 CH4 GWP500 8
8 NHx GWP500 130
[11]:
GWP_result = pymrio.convert(ghg_new_kg, GWP_characterization)
GWP_result
[11]:
region Region1 Region2 Region3
GWP
GWP100 19500.0 23400.0 27300.0
GWP20 45000.0 54000.0 63000.0
GWP500 9000.0 10800.0 12600.0

As we can see, GWP_characterization can include factors for stressors not actually present in the data. These are silently ignored in the conversion process. We also did not specify the compartment and assumed the same factors apply independent of the compartment (we could pass through the compartment to the new result table via passing drop_not_bridge=False to the convert function).

[12]:
GWP_result_with_comp = pymrio.convert(
    ghg_new_kg, GWP_characterization, drop_not_bridged_index=False
)
GWP_result_with_comp
[12]:
region Region1 Region2 Region3
GWP compartment
GWP100 Air 19500.0 23400.0 27300.0
GWP20 Air 45000.0 54000.0 63000.0
GWP500 Air 9000.0 10800.0 12600.0

All stressors mapped to the same “impact” are first converted via the value given in the factor column and then summed up (the aggregation function can be changed via the agg_func parameter).

Regional specific characterization factors

A more complex example is the application of regional specific characterization factors (the same principle applies to sector specific factors.). For that, we assume some land use results for different regions:

[13]:
land_use_data = pd.DataFrame(
    columns=["Region1", "Region2", "Region3"],
    index=[
        "Wheat",
        "Maize",
        "Rice",
        "Pasture",
        "Forest extensive",
        "Forest intensive",
    ],
    data=[
        [3, 10, 1],
        [5, 20, 3],
        [0, 12, 34],
        [12, 34, 9],
        [32, 27, 11],
        [43, 17, 24],
    ],
)
land_use_data.index.names = ["stressor"]
land_use_data.columns.names = ["region"]
land_use_data
[13]:
region Region1 Region2 Region3
stressor
Wheat 3 10 1
Maize 5 20 3
Rice 0 12 34
Pasture 12 34 9
Forest extensive 32 27 11
Forest intensive 43 17 24

Now we setup a pseudo characterization table for converting the land use data into biodiversity impacts. We assume, that the characterization factors vary based on land use type and region. However, the “region” information is a pure constraining column (specifying the region for which the factor applies) without any bridge column mapping it to a new name. Thus, the “region” can either be in the index or in the columns of the source data - in the given case it is in the columns.

[14]:
landuse_characterization = pd.DataFrame(
    columns=["stressor", "BioDiv__stressor", "region", "factor"],
    data=[
        ["Wheat|Maize", "BioImpact", "Region1", 3],
        ["Wheat", "BioImpact", "Region[2,3]", 4],
        ["Maize", "BioImpact", "Region[2,3]", 7],
        ["Rice", "BioImpact", "Region1", 12],
        ["Rice", "BioImpact", "Region2", 12],
        ["Rice", "BioImpact", "Region3", 12],
        ["Pasture", "BioImpact", "Region[1,2,3]", 12],
        ["Forest.*", "BioImpact", "Region1", 2],
        ["Forest.*", "BioImpact", "Region2", 3],
        ["Forest ext.*", "BioImpact", "Region3", 1],
        ["Forest int.*", "BioImpact", "Region3", 3],
    ],
)
landuse_characterization
[14]:
stressor BioDiv__stressor region factor
0 Wheat|Maize BioImpact Region1 3
1 Wheat BioImpact Region[2,3] 4
2 Maize BioImpact Region[2,3] 7
3 Rice BioImpact Region1 12
4 Rice BioImpact Region2 12
5 Rice BioImpact Region3 12
6 Pasture BioImpact Region[1,2,3] 12
7 Forest.* BioImpact Region1 2
8 Forest.* BioImpact Region2 3
9 Forest ext.* BioImpact Region3 1
10 Forest int.* BioImpact Region3 3

The table shows several possibilities to specify factors which apply to several regions/stressors. All of them are based on the regular expression:

  • In the first data line we use the “or” operator “|” to specify that the same factor applies to Wheat and Maize.

  • On the next line we use the grouping capabilities of regular expressions to indicate the same factor for Region 2 and 3.

  • At the last four lines .* matches any number of characters. This allows to specify the same factor for both forest types or to abbreviate the naming of the stressor (last 2 lines).

The use of regular expression is optional, one can also use one line per factor. In the example above, we indicate the factor for Rice in 3 subsequent entries. This would be equivalent to ["Rice", "BioImpact", "Region[1,2,3]", 12].

With that setup we can now characterize the land use data in land_use_result.

[15]:
biodiv_characterised = pymrio.convert(land_use_data, landuse_characterization)
biodiv_characterised
[15]:
region Region1 Region2 Region3
BioDiv
BioImpact 318 864 624

Note, that in this example the region is not in the index but in the columns. The convert function can handle both cases. The only difference is that constraints which are in the columns will never be aggregated but keep the column resolution at the output. Thus the result is equivalent to

[16]:
land_use_data_stacked = land_use_data.stack(level="region")
land_use_data_stacked
[16]:
stressor          region
Wheat             Region1     3
                  Region2    10
                  Region3     1
Maize             Region1     5
                  Region2    20
                  Region3     3
Rice              Region1     0
                  Region2    12
                  Region3    34
Pasture           Region1    12
                  Region2    34
                  Region3     9
Forest extensive  Region1    32
                  Region2    27
                  Region3    11
Forest intensive  Region1    43
                  Region2    17
                  Region3    24
dtype: int64
[17]:
biodiv_characterised_stacked = pymrio.convert(
    land_use_data_stacked, landuse_characterization, drop_not_bridged_index=False
)
biodiv_characterised_stacked.unstack(level="region")[0]
[17]:
region Region1 Region2 Region3
BioDiv
BioImpact 318 864 624

In this case we have to specify to not drop the not bridged “region” index. We then unstack the result again, and have to select the first element ([0]), since there where not other columns left after stacking them before the characterization.

TODO: section perhaps needed somewhere? Irrespectively of the table or the mrio system, the convert function always follows the same pattern. It requires a bridge table, which contains the mapping of the indices of the source data to the indices of the target data. This bridge table has to follow a specific format, depending on the table to be converted.

Converting pymrio Extensions

The same principles as for individual tables can be used for converting full pymrio type Extensions (aka satellite accounts). In difference to the single tables, pymrio Extensions consist of several pandas DataFrames which can be converted in one go. Almost the same bridge table structure as for single tables can be used. The main additional information needed is in regard to units. Since pymrio Extensions include a unit dataframe, information about the unit names need to be included.

Extensions can be converted either one at a time, but the main power of the method lies in collecting stressor data across different extensions and converting them in one go.

We start with a simple example for converting a single extension of a pymrio MRIO system. To do so, we load the test MRIO system from pymrio.

[18]:
mrio = pymrio.load_test()

Among others, this system has an extension “emissions” with industry and final demand emissions.

[19]:
mrio.emissions.F
[19]:
region reg1 reg2 ... reg5 reg6
sector food mining manufactoring electricity construction trade transport other food mining ... transport other food mining manufactoring electricity construction trade transport other
stressor compartment
emission_type1 air 1848064.80 986448.090 23613787.00 28139100.00 2584141.80 4132656.3 21766987.0 7842090.6 1697937.30 347378.150 ... 42299319 10773826.0 15777996.0 6420955.5 113172450.0 56022534.0 4861838.5 18195621 47046542.0 21632868
emission_type2 water 139250.47 22343.295 763569.18 273981.55 317396.51 1254477.8 1012999.1 2449178.0 204835.44 29463.944 ... 4199841 7191006.3 4826108.1 1865625.1 12700193.0 753213.7 2699288.3 13892313 8765784.3 16782553

2 rows × 48 columns

[20]:
mrio.emissions.F_Y
[20]:
region reg1 reg2 ... reg5 reg6
category Final consumption expenditure by households Final consumption expenditure by non-profit organisations serving households (NPISH) Final consumption expenditure by government Gross fixed capital formation Changes in inventories Changes in valuables Export Final consumption expenditure by households Final consumption expenditure by non-profit organisations serving households (NPISH) Final consumption expenditure by government ... Changes in inventories Changes in valuables Export Final consumption expenditure by households Final consumption expenditure by non-profit organisations serving households (NPISH) Final consumption expenditure by government Gross fixed capital formation Changes in inventories Changes in valuables Export
stressor compartment
emission_type1 air 62335321 0 0 0 0 0 0 38566929 0 0 ... 0 0 0 571278300.0 0 0 0 0 0 0
emission_type2 water 59206405 0 0 0 0 0 0 40214002 0 0 ... 0 0 0 163362050.0 0 0 0 0 0 0

2 rows × 42 columns

[21]:
mrio.emissions.unit
[21]:
unit
stressor compartment
emission_type1 air kg
emission_type2 water kg

We now setup a bridge table for converting/characterizing these emission data to several other accounts.

[22]:
emis_bridge = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "total__stressor",
        "factor",
        "unit_orig",
        "unit_new",
    ],
    data=[
        ["emis.*", "air|water", "total_sum_tonnes", 1e-3, "kg", "t"],
        ["emission_type[1|2]", ".*", "total_sum", 1, "kg", "kg"],
        ["emission_type1", ".*", "air_emissions", 1e-3, "kg", "t"],
        ["emission_type2", ".*", "water_emissions", 1000, "kg", "g"],
        ["emission_type1", ".*", "char_emissions", 2, "kg", "kg_eq"],
        ["emission_type2", ".*", "char_emissions", 10, "kg", "kg_eq"],
    ],
)
emis_bridge
[22]:
stressor compartment total__stressor factor unit_orig unit_new
0 emis.* air|water total_sum_tonnes 0.001 kg t
1 emission_type[1|2] .* total_sum 1.000 kg kg
2 emission_type1 .* air_emissions 0.001 kg t
3 emission_type2 .* water_emissions 1000.000 kg g
4 emission_type1 .* char_emissions 2.000 kg kg_eq
5 emission_type2 .* char_emissions 10.000 kg kg_eq

This is a fully made up example showing various capabilities of the method. In line - 0: find all stressors with emissions (emis.*) in either air or water (air|water) compartment, rename it to “total_sum_tonnes” (total__stressor) by multiplying with a factor 0.0001 which converts the original unit “kg” to tonnes. - 1: find emission_type1 and 2, over all compartments and sum them together without any multiplication - 2: convert emissions of type 1 to air emissions in tons - 3: convert emissions of type 2 to water emissions in g - 4 and 5: two different characterization factors of 2 (type1) and 10 (type2) to convert to kg equivalent (kg_eq) of some kind and then add them together to a summary impact “char_emissions”

The new extensino can then be calculated with:

[23]:
mrio.new_ext = mrio.emissions.convert(emis_bridge, new_extension_name="new_ext")
print(mrio.new_ext)
Extension new_ext with parameters: name, F, F_Y, unit

And then the new accounts can be calculated with

[24]:
mrio.calc_all()
print(mrio.new_ext)
mrio.new_ext.unit
Extension new_ext with parameters: name, F, F_Y, S, S_Y, M, M_down, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap
[24]:
unit
total
air_emissions t
char_emissions kg_eq
total_sum kg
total_sum_tonnes t
water_emissions g
[25]:
mrio.new_ext.D_cba


# CONT: test/explain characterization across different extensions
[25]:
region reg1 reg2 ... reg5 reg6
sector food mining manufactoring electricity construction trade transport other food mining ... transport other food mining manufactoring electricity construction trade transport other
total
air_emissions 2.056183e+03 1.794235e+02 9.749300e+04 1.188759e+04 3.342906e+03 3.885884e+03 1.075027e+04 1.582152e+04 1.793338e+03 1.914560e+01 ... 4.209505e+04 1.138661e+04 1.517235e+04 1.345318e+03 7.145075e+04 3.683167e+04 1.836696e+03 4.241568e+04 4.805409e+04 3.602298e+04
char_emissions 6.535470e+06 6.116290e+05 3.621100e+08 2.514649e+07 1.015410e+07 1.553797e+07 2.650016e+07 1.164481e+08 5.723203e+06 7.562722e+04 ... 1.266275e+08 9.584531e+07 7.455045e+07 8.062852e+06 2.497159e+08 7.939148e+07 1.274291e+07 6.297358e+08 1.844730e+08 5.355358e+08
total_sum 2.298494e+06 2.047017e+05 1.142054e+08 1.202472e+07 3.689735e+06 4.662504e+06 1.125023e+07 2.430203e+07 2.006991e+06 2.287921e+04 ... 4.633879e+07 1.869382e+07 1.959293e+07 1.882540e+06 8.213219e+07 3.740449e+07 2.743647e+06 9.690613e+07 5.689057e+07 8.237196e+07
total_sum_tonnes 2.298494e+03 2.047017e+02 1.142054e+05 1.202472e+04 3.689735e+03 4.662504e+03 1.125023e+04 2.430203e+04 2.006991e+03 2.287921e+01 ... 4.633879e+04 1.869382e+04 1.959293e+04 1.882540e+03 8.213219e+04 3.740449e+04 2.743647e+03 9.690613e+04 5.689057e+04 8.237196e+04
water_emissions 2.423103e+08 2.527819e+07 1.671240e+10 1.371303e+08 3.468292e+08 7.766205e+08 4.999628e+08 8.480505e+09 2.136528e+08 3.733601e+06 ... 4.243738e+09 7.307208e+09 4.420574e+09 5.372216e+08 1.068144e+10 5.728136e+08 9.069515e+08 5.449044e+10 8.836484e+09 4.634899e+10

5 rows × 48 columns

[ ]: