Convert MRIO satellite rows

Here we discuss the possibilities for converting entries in MRIO satellite accounts (Extensions). The term convert is used very broadly here, it includes the following tasks:

renaming the index names of results/extensions
adjusting the numerical values of the data, e.g. for unit conversion
finding and extracting data based on indices across a table or an mrio(-extension). This can be system based on name and potentially constrained by sector/region or any other specification.
Aggregation/Summation of satellite accounts
Characterization of stressors to impact categories (with low performance, see the Notebook on stressor characterization for a faster method).

We will cover each of these points in the examples below. First we will go through the general setup describing the structure of the bridging/mapping table. We will then cover the application of the conversion function to a single, standalone table. There we will show how to rename the index/stressor names, do unit conversions, and then do global and regional characterizations. For the case of converting a full pymrio Extension, see the converting pymrio extensions section.

For the connected topics of Aggregation of MRIOs see the Aggregation page and for Characterization of Stressors see the Notebook on stressor characterization.

Basic setup

All conversion relies on a mapping table that maps (bridges) the index/columns of the source data to the indices of the target data.

This tables requires headers (columns) corresponding to the index.names and columns.names of the source data (constraining data) as well as bridge data which specify the new target index. The later are indicated by “NewIndex__OldIndex” - the important part are the two underscore in the column name. Another (optional) column named “factor” specifies the multiplication factor for the conversion. Finally, additional columns can be used to indicate units and other information. However, for the basic conversion this information will not be used. This becomes important later for the converting Pymrio extensions.

Constraining data columns can either specify columns or index. However, any constraining data to be bridged/mapped to a new name need to be in the index of the original data.

The first example below shows the simplest case of renaming a single table. This will make the concept of the mapping table clear.

Converting standalone tables

Renaming the index entries of a single table

Assume we have a small MRIO result table with the following structure:

[1]:

import pandas as pd

import pymrio

ghg_result = pd.DataFrame(
    columns=["Region1", "Region2", "Region3"],
    index=pd.MultiIndex.from_tuples(
        [
            ("Carbon Dioxide", "Air"),
            ("Methane", "air"),
        ]
    ),
    data=[[5, 6, 7], [0.5, 0.6, 0.7]],
)
ghg_result.index.names = ["stressor", "compartment"]
ghg_result.columns.names = ["region"]

Our first task here is to rename to the chemical names of the stressors and fix the compartment spelling.

[2]:

ghg_map = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "chem_stressor__stressor",
        "compartment__compartment",
        "factor",
    ],
    data=[
        ["Carbon Dioxide", "[A|a]ir", "CO2", "Air", 1.0],
        ["Methane", "[A|a]ir", "CH4", "Air", 1.0],
    ],
)
ghg_map

[2]:

	stressor	compartment	chem_stressor__stressor	compartment__compartment	factor
0	Carbon Dioxide	[A\|a]ir	CO2	Air	1.0
1	Methane	[A\|a]ir	CH4	Air	1.0

[3]:

ghg_new = pymrio.convert(ghg_result, ghg_map)
ghg_new

[3]:

	region	Region1	Region2	Region3
chem_stressor	compartment
CH4	Air	0.5	0.6	0.7
CO2	Air	5.0	6.0	7.0

Explanation: The column headers indicates that the stressor index level should be renamed from “stressor” to “chem_stressor” and the compartment index level should stay the same (NewName__OldName). The factor column is not used in this case. All renaming columns consider regular expressions, so that the spelling of the compartment can be fixed in one go.

For simple rename (and aggregation cases, see below) we can omit the factor column. Thus we obtain the same result with the following mapping table:

[4]:

ghg_map_wo_factor = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "chem_stressor__stressor",
        "compartment__compartment",
    ],
    data=[
        ["Carbon Dioxide", "[A|a]ir", "CO2", "Air"],
        ["Methane", "[A|a]ir", "CH4", "Air"],
    ],
)
ghg_map_wo_factor

[4]:

	stressor	compartment	chem_stressor__stressor	compartment__compartment
0	Carbon Dioxide	[A\|a]ir	CO2	Air
1	Methane	[A\|a]ir	CH4	Air

[5]:

ghg_new_wo_factor = pymrio.convert(ghg_result, ghg_map_wo_factor)
ghg_new_wo_factor

[5]:

	region	Region1	Region2	Region3
chem_stressor	compartment
CH4	Air	0.5	0.6	0.7
CO2	Air	5.0	6.0	7.0

Unit conversion

With the factor column it is easy to apply unit conversion to any result table. So, to start with the same table as above, we can apply a simple unit conversion. Assuming the data is in tonnes

[6]:

ghg_result_ton = pd.DataFrame(
    columns=["Region1", "Region2", "Region3"],
    index=pd.MultiIndex.from_tuples(
        [
            ("Carbon Dioxide", "Air"),
            ("Methane", "air"),
        ]
    ),
    data=[[5, 6, 7], [0.5, 0.6, 0.7]],
)
ghg_result_ton.index.names = ["stressor", "compartment"]
ghg_result_ton.columns.names = ["region"]
ghg_result_ton

[6]:

	region	Region1	Region2	Region3
stressor	compartment
Carbon Dioxide	Air	5.0	6.0	7.0
Methane	air	0.5	0.6	0.7

We can get the data in kg by

[7]:

ghg_map_to_kg = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "chem_stressor__stressor",
        "compartment__compartment",
        "factor",
    ],
    data=[
        ["Carbon Dioxide", "[A|a]ir", "CO2", "Air", 1000],
        ["Methane", "[A|a]ir", "CH4", "Air", 1000],
    ],
)
ghg_map_to_kg

[7]:

	stressor	compartment	chem_stressor__stressor	compartment__compartment	factor
0	Carbon Dioxide	[A\|a]ir	CO2	Air	1000
1	Methane	[A\|a]ir	CH4	Air	1000

[8]:

ghg_new_kg = pymrio.convert(ghg_result_ton, ghg_map_to_kg)
ghg_new_kg

[8]:

	region	Region1	Region2	Region3
chem_stressor	compartment
CH4	Air	500.0	600.0	700.0
CO2	Air	5000.0	6000.0	7000.0

In case of unit conversion of pymrio satellite accounts, we can also check the unit before and set the unit after conversion: TODO: unit conversion extensions, link to extension beow

Characterization

The main power of the convert function is to aggregate and characterize satellite accounts. If needed, region and sector specific characterizations can be applied.

Global characterization factors

An simple example is a conversion/aggregation based on GWP100 characterization factors. Here, we continue with the unit converted and cleanup dataframe from above:

[9]:

ghg_new_kg

[9]:

	region	Region1	Region2	Region3
chem_stressor	compartment
CH4	Air	500.0	600.0	700.0
CO2	Air	5000.0	6000.0	7000.0

We define a general purpose characterization map for GHG emissions (based on AR6 GWP100 and GWP20 factors ,with some simplifications):

[10]:

GWP_characterization = pd.DataFrame(
    columns=["chem_stressor", "GWP__chem_stressor", "factor"],
    data=[
        ["CO2", "GWP100", 1],
        ["CH4", "GWP100", 29],
        ["NHx", "GWP100", 273],
        ["CO2", "GWP20", 1],
        ["CH4", "GWP20", 80],
        ["NHx", "GWP20", 273],
        ["CO2", "GWP500", 1],
        ["CH4", "GWP500", 8],
        ["NHx", "GWP500", 130],
    ],
)
GWP_characterization

[10]:

	chem_stressor	GWP__chem_stressor	factor
0	CO2	GWP100	1
1	CH4	GWP100	29
2	NHx	GWP100	273
3	CO2	GWP20	1
4	CH4	GWP20	80
5	NHx	GWP20	273
6	CO2	GWP500	1
7	CH4	GWP500	8
8	NHx	GWP500	130

[11]:

GWP_result = pymrio.convert(ghg_new_kg, GWP_characterization)
GWP_result

[11]:

region	Region1	Region2	Region3
GWP
GWP100	19500.0	23400.0	27300.0
GWP20	45000.0	54000.0	63000.0
GWP500	9000.0	10800.0	12600.0

As we can see, GWP_characterization can include factors for stressors not actually present in the data. These are silently ignored in the conversion process. We also did not specify the compartment and assumed the same factors apply independent of the compartment (we could pass through the compartment to the new result table via passing drop_not_bridge=False to the convert function).

[12]:

GWP_result_with_comp = pymrio.convert(
    ghg_new_kg, GWP_characterization, drop_not_bridged_index=False
)
GWP_result_with_comp

[12]:

	region	Region1	Region2	Region3
GWP	compartment
GWP100	Air	19500.0	23400.0	27300.0
GWP20	Air	45000.0	54000.0	63000.0
GWP500	Air	9000.0	10800.0	12600.0

All stressors mapped to the same “impact” are first converted via the value given in the factor column and then summed up (the aggregation function can be changed via the agg_func parameter).

Regional specific characterization factors

A more complex example is the application of regional specific characterization factors (the same principle applies to sector specific factors.). For that, we assume some land use results for different regions:

[13]:

land_use_data = pd.DataFrame(
    columns=["Region1", "Region2", "Region3"],
    index=[
        "Wheat",
        "Maize",
        "Rice",
        "Pasture",
        "Forest extensive",
        "Forest intensive",
    ],
    data=[
        [3, 10, 1],
        [5, 20, 3],
        [0, 12, 34],
        [12, 34, 9],
        [32, 27, 11],
        [43, 17, 24],
    ],
)
land_use_data.index.names = ["stressor"]
land_use_data.columns.names = ["region"]
land_use_data

[13]:

region	Region1	Region2	Region3
stressor
Wheat	3	10	1
Maize	5	20	3
Rice	0	12	34
Pasture	12	34	9
Forest extensive	32	27	11
Forest intensive	43	17	24

Now we setup a pseudo characterization table for converting the land use data into biodiversity impacts. We assume, that the characterization factors vary based on land use type and region. However, the “region” information is a pure constraining column (specifying the region for which the factor applies) without any bridge column mapping it to a new name. Thus, the “region” can either be in the index or in the columns of the source data - in the given case it is in the columns.

[14]:

landuse_characterization = pd.DataFrame(
    columns=["stressor", "BioDiv__stressor", "region", "factor"],
    data=[
        ["Wheat|Maize", "BioImpact", "Region1", 3],
        ["Wheat", "BioImpact", "Region[2,3]", 4],
        ["Maize", "BioImpact", "Region[2,3]", 7],
        ["Rice", "BioImpact", "Region1", 12],
        ["Rice", "BioImpact", "Region2", 12],
        ["Rice", "BioImpact", "Region3", 12],
        ["Pasture", "BioImpact", "Region[1,2,3]", 12],
        ["Forest.*", "BioImpact", "Region1", 2],
        ["Forest.*", "BioImpact", "Region2", 3],
        ["Forest ext.*", "BioImpact", "Region3", 1],
        ["Forest int.*", "BioImpact", "Region3", 3],
    ],
)
landuse_characterization

[14]:

	stressor	BioDiv__stressor	region	factor
0	Wheat\|Maize	BioImpact	Region1	3
1	Wheat	BioImpact	Region[2,3]	4
2	Maize	BioImpact	Region[2,3]	7
3	Rice	BioImpact	Region1	12
4	Rice	BioImpact	Region2	12
5	Rice	BioImpact	Region3	12
6	Pasture	BioImpact	Region[1,2,3]	12
7	Forest.*	BioImpact	Region1	2
8	Forest.*	BioImpact	Region2	3
9	Forest ext.*	BioImpact	Region3	1
10	Forest int.*	BioImpact	Region3	3

The table shows several possibilities to specify factors which apply to several regions/stressors. All of them are based on the regular expression:

In the first data line we use the “or” operator “|” to specify that the same factor applies to Wheat and Maize.
On the next line we use the grouping capabilities of regular expressions to indicate the same factor for Region 2 and 3.
At the last four lines .* matches any number of characters. This allows to specify the same factor for both forest types or to abbreviate the naming of the stressor (last 2 lines).

The use of regular expression is optional, one can also use one line per factor. In the example above, we indicate the factor for Rice in 3 subsequent entries. This would be equivalent to ["Rice", "BioImpact", "Region[1,2,3]", 12].

With that setup we can now characterize the land use data in land_use_result.

[15]:

biodiv_characterised = pymrio.convert(land_use_data, landuse_characterization)
biodiv_characterised

[15]:

region	Region1	Region2	Region3
BioDiv
BioImpact	318	864	624

Note, that in this example the region is not in the index but in the columns. The convert function can handle both cases. The only difference is that constraints which are in the columns will never be aggregated but keep the column resolution at the output. Thus the result is equivalent to

[16]:

land_use_data_stacked = land_use_data.stack(level="region")
land_use_data_stacked

[16]:

stressor          region
Wheat             Region1     3
                  Region2    10
                  Region3     1
Maize             Region1     5
                  Region2    20
                  Region3     3
Rice              Region1     0
                  Region2    12
                  Region3    34
Pasture           Region1    12
                  Region2    34
                  Region3     9
Forest extensive  Region1    32
                  Region2    27
                  Region3    11
Forest intensive  Region1    43
                  Region2    17
                  Region3    24
dtype: int64

[17]:

biodiv_characterised_stacked = pymrio.convert(
    land_use_data_stacked, landuse_characterization, drop_not_bridged_index=False
)
biodiv_characterised_stacked.unstack(level="region")[0]

[17]:

region	Region1	Region2	Region3
BioDiv
BioImpact	318	864	624

In this case we have to specify to not drop the not bridged “region” index. We then unstack the result again, and have to select the first element ([0]), since there where not other columns left after stacking them before the characterization.

TODO: section perhaps needed somewhere? Irrespectively of the table or the mrio system, the convert function always follows the same pattern. It requires a bridge table, which contains the mapping of the indices of the source data to the indices of the target data. This bridge table has to follow a specific format, depending on the table to be converted.

Converting pymrio Extensions

The same principles as for individual tables can be used for converting full pymrio type Extensions (aka satellite accounts). In difference to the single tables, pymrio Extensions consist of several pandas DataFrames which can be converted in one go. Almost the same bridge table structure as for single tables can be used. The main additional information needed is in regard to units. Since pymrio Extensions include a unit dataframe, information about the unit names need to be included.

Extensions can be converted either one at a time, but the main power of the method lies in collecting stressor data across different extensions and converting them in one go.

We start with a simple example for converting a single extension of a pymrio MRIO system. To do so, we load the test MRIO system from pymrio.

[18]:

mrio = pymrio.load_test()

Among others, this system has an extension “emissions” with industry and final demand emissions.

[19]:

mrio.emissions.F

[19]:

	region	reg1								reg2		...	reg5		reg6
	sector	food	mining	manufactoring	electricity	construction	trade	transport	other	food	mining	...	transport	other	food	mining	manufactoring	electricity	construction	trade	transport	other
stressor	compartment
emission_type1	air	1848064.80	986448.090	23613787.00	28139100.00	2584141.80	4132656.3	21766987.0	7842090.6	1697937.30	347378.150	...	42299319	10773826.0	15777996.0	6420955.5	113172450.0	56022534.0	4861838.5	18195621	47046542.0	21632868
emission_type2	water	139250.47	22343.295	763569.18	273981.55	317396.51	1254477.8	1012999.1	2449178.0	204835.44	29463.944	...	4199841	7191006.3	4826108.1	1865625.1	12700193.0	753213.7	2699288.3	13892313	8765784.3	16782553

2 rows × 48 columns

[20]:

mrio.emissions.F_Y

[20]:

	region	reg1							reg2			...	reg5			reg6
	category	Final consumption expenditure by households	Final consumption expenditure by non-profit organisations serving households (NPISH)	Final consumption expenditure by government	Gross fixed capital formation	Changes in inventories	Changes in valuables	Export	Final consumption expenditure by households	Final consumption expenditure by non-profit organisations serving households (NPISH)	Final consumption expenditure by government	...	Changes in inventories	Changes in valuables	Export	Final consumption expenditure by households	Final consumption expenditure by non-profit organisations serving households (NPISH)	Final consumption expenditure by government	Gross fixed capital formation	Changes in inventories	Changes in valuables	Export
stressor	compartment
emission_type1	air	62335321	0	0	0	0	0	0	38566929	0	0	...	0	0	0	571278300.0	0	0	0	0	0	0
emission_type2	water	59206405	0	0	0	0	0	0	40214002	0	0	...	0	0	0	163362050.0	0	0	0	0	0	0

2 rows × 42 columns

[21]:

mrio.emissions.unit

[21]:

		unit
stressor	compartment
emission_type1	air	kg
emission_type2	water	kg

We now setup a bridge table for converting/characterizing these emission data to several other accounts.

[22]:

emis_bridge = pd.DataFrame(
    columns=[
        "stressor",
        "compartment",
        "total__stressor",
        "factor",
        "unit_orig",
        "unit_new",
    ],
    data=[
        ["emis.*", "air|water", "total_sum_tonnes", 1e-3, "kg", "t"],
        ["emission_type[1|2]", ".*", "total_sum", 1, "kg", "kg"],
        ["emission_type1", ".*", "air_emissions", 1e-3, "kg", "t"],
        ["emission_type2", ".*", "water_emissions", 1000, "kg", "g"],
        ["emission_type1", ".*", "char_emissions", 2, "kg", "kg_eq"],
        ["emission_type2", ".*", "char_emissions", 10, "kg", "kg_eq"],
    ],
)
emis_bridge

[22]:

	stressor	compartment	total__stressor	factor	unit_orig	unit_new
0	emis.*	air\|water	total_sum_tonnes	0.001	kg	t
1	emission_type[1\|2]	.*	total_sum	1.000	kg	kg
2	emission_type1	.*	air_emissions	0.001	kg	t
3	emission_type2	.*	water_emissions	1000.000	kg	g
4	emission_type1	.*	char_emissions	2.000	kg	kg_eq
5	emission_type2	.*	char_emissions	10.000	kg	kg_eq

This is a fully made up example showing various capabilities of the method. In line - 0: find all stressors with emissions (emis.*) in either air or water (air|water) compartment, rename it to “total_sum_tonnes” (total__stressor) by multiplying with a factor 0.0001 which converts the original unit “kg” to tonnes. - 1: find emission_type1 and 2, over all compartments and sum them together without any multiplication - 2: convert emissions of type 1 to air emissions in tons - 3: convert emissions of type 2 to water emissions in g - 4 and 5: two different characterization factors of 2 (type1) and 10 (type2) to convert to kg equivalent (kg_eq) of some kind and then add them together to a summary impact “char_emissions”

The new extensino can then be calculated with:

[23]:

mrio.new_ext = mrio.emissions.convert(emis_bridge, new_extension_name="new_ext")
print(mrio.new_ext)

Extension new_ext with parameters: name, F, F_Y, unit

And then the new accounts can be calculated with

[24]:

mrio.calc_all()
print(mrio.new_ext)
mrio.new_ext.unit

Extension new_ext with parameters: name, F, F_Y, S, S_Y, M, M_down, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap

[24]:

	unit
total
air_emissions	t
char_emissions	kg_eq
total_sum	kg
total_sum_tonnes	t
water_emissions	g

[25]:

mrio.new_ext.D_cba

# CONT: test/explain characterization across different extensions

[25]:

region	reg1								reg2		...	reg5		reg6
sector	food	mining	manufactoring	electricity	construction	trade	transport	other	food	mining	...	transport	other	food	mining	manufactoring	electricity	construction	trade	transport	other
total
air_emissions	2.056183e+03	1.794235e+02	9.749300e+04	1.188759e+04	3.342906e+03	3.885884e+03	1.075027e+04	1.582152e+04	1.793338e+03	1.914560e+01	...	4.209505e+04	1.138661e+04	1.517235e+04	1.345318e+03	7.145075e+04	3.683167e+04	1.836696e+03	4.241568e+04	4.805409e+04	3.602298e+04
char_emissions	6.535470e+06	6.116290e+05	3.621100e+08	2.514649e+07	1.015410e+07	1.553797e+07	2.650016e+07	1.164481e+08	5.723203e+06	7.562722e+04	...	1.266275e+08	9.584531e+07	7.455045e+07	8.062852e+06	2.497159e+08	7.939148e+07	1.274291e+07	6.297358e+08	1.844730e+08	5.355358e+08
total_sum	2.298494e+06	2.047017e+05	1.142054e+08	1.202472e+07	3.689735e+06	4.662504e+06	1.125023e+07	2.430203e+07	2.006991e+06	2.287921e+04	...	4.633879e+07	1.869382e+07	1.959293e+07	1.882540e+06	8.213219e+07	3.740449e+07	2.743647e+06	9.690613e+07	5.689057e+07	8.237196e+07
total_sum_tonnes	2.298494e+03	2.047017e+02	1.142054e+05	1.202472e+04	3.689735e+03	4.662504e+03	1.125023e+04	2.430203e+04	2.006991e+03	2.287921e+01	...	4.633879e+04	1.869382e+04	1.959293e+04	1.882540e+03	8.213219e+04	3.740449e+04	2.743647e+03	9.690613e+04	5.689057e+04	8.237196e+04
water_emissions	2.423103e+08	2.527819e+07	1.671240e+10	1.371303e+08	3.468292e+08	7.766205e+08	4.999628e+08	8.480505e+09	2.136528e+08	3.733601e+06	...	4.243738e+09	7.307208e+09	4.420574e+09	5.372216e+08	1.068144e+10	5.728136e+08	9.069515e+08	5.449044e+10	8.836484e+09	4.634899e+10

5 rows × 48 columns

[ ]: