Exploring MRIOs with Pymrio

The first step when working with a new MRIO data set is to familiarize yourself with the data. This notebook shows how to use the pymrio package to explore the data. We use the test data set that is included in the pymrio package. This is a completely artificial, very small MRIO. It is not meant to be realistic, but it is useful for developing, testing and learning.

First we import the required packages:

[1]:
import pymrio

We can now load the test data set with the load_test function. The resulting MRIO object can be given any name; here we use mrio.

[2]:
mrio = pymrio.load_test()

We can get some first information about the MRIO by printing it.

[3]:
print(mrio)
IO System with parameters: Z, Y, unit, population, meta, factor_inputs, emissions

This tells us what the MRIO data we just loaded contains: the Z and Y matrices, unit and population information, metadata, and two satellite accounts, factor_inputs and emissions.

To get more specific data we can ask pymrio for regions, sectors, products, etc.

[4]:
mrio.name
[4]:
'testmrio'
[5]:
mrio.get_regions()
[5]:
Index(['reg1', 'reg2', 'reg3', 'reg4', 'reg5', 'reg6'], dtype='object', name='region')
[6]:
mrio.get_sectors()
[6]:
Index(['food', 'mining', 'manufactoring', 'electricity', 'construction',
       'trade', 'transport', 'other'],
      dtype='object', name='sector')
[7]:
mrio.get_Y_categories()
[7]:
Index(['Final consumption expenditure by households',
       'Final consumption expenditure by non-profit organisations serving households (NPISH)',
       'Final consumption expenditure by government',
       'Gross fixed capital formation', 'Changes in inventories',
       'Changes in valuables', 'Export'],
      dtype='object', name='category')

The same methods can be used to explore one of the satellite accounts.

[8]:
print(mrio.emissions)
Extension Emissions with parameters: name, F, F_Y, unit
[9]:
mrio.emissions.name
[9]:
'Emissions'
[10]:
mrio.emissions.get_regions()
[10]:
Index(['reg1', 'reg2', 'reg3', 'reg4', 'reg5', 'reg6'], dtype='object', name='region')

The satellite accounts also have a special method to get the index (rows) of the accounts.

[11]:
mrio.emissions.get_rows()
[11]:
MultiIndex([('emission_type1',   'air'),
            ('emission_type2', 'water')],
           names=['stressor', 'compartment'])

Searching through the MRIO

Several methods are available to search through the whole MRIO. These generally accept regular expressions as search terms.

The most general method is ‘find’. It gives a quick overview of where a specific term appears in the MRIO.

[12]:
mrio.find('air')
[12]:
{'emissions_index': MultiIndex([('emission_type1', 'air')],
            names=['stressor', 'compartment'])}
[13]:
mrio.find("trade")
[13]:
{'index': MultiIndex([('reg1', 'trade'),
             ('reg2', 'trade'),
             ('reg3', 'trade'),
             ('reg4', 'trade'),
             ('reg5', 'trade'),
             ('reg6', 'trade')],
            names=['region', 'sector']),
 'sectors': Index(['trade'], dtype='object', name='sector')}

Note that ‘find’ (and all other search methods) is case sensitive. To make a case-insensitive search, add the regular expression flag (?i) to the search term.

[14]:
mrio.find('value')
[14]:
{}
[15]:
mrio.find('(?i)value')
[15]:
{'factor_inputs_index': Index(['Value Added'], dtype='object', name='inputtype')}
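
The effect of the (?i) flag can be reproduced with Python's built-in re module. The following is a minimal sketch of the regular-expression behaviour only, not of pymrio's implementation; the label list contains just the one entry shown in the output above.

```python
import re

# Index label copied from the factor_inputs output above.
labels = ["Value Added"]

# Case-sensitive search: lowercase 'value' does not occur in 'Value Added'.
case_sensitive = [lab for lab in labels if re.search("value", lab)]

# The inline flag (?i) makes the whole pattern case insensitive.
case_insensitive = [lab for lab in labels if re.search("(?i)value", lab)]

print(case_sensitive)    # []
print(case_insensitive)  # ['Value Added']
```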

Specific search methods: contains, match, fullmatch

The MRIO class also contains a set of specific regular expression search methods, mirroring the ‘contains’, ‘match’ and ‘fullmatch’ methods of the pandas string accessor (‘.str’). See the pandas documentation for details; in short:

- ‘contains’ looks for a match anywhere in the string
- ‘match’ looks for a match at the beginning of the string
- ‘fullmatch’ looks for a match of the whole string

These methods are available for all index columns of the MRIO and have a similar signature:

  1. As for ‘find’, the search term is case sensitive. To make it case insensitive, add the regular expression flag (?i) to the search term.

  2. The search term can be passed to the keyword argument ‘find_all’ or as the first positional argument to search in all index levels.

  3. Alternatively, the search term can be passed to the keyword argument with the level name to search only in that index level.

The following examples show how to use these methods.

[16]:
mrio.contains(find_all='ad')  # search term passed as keyword argument
mrio.contains('ad')  # equivalent call with a positional argument
[16]:
MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])
[17]:
mrio.match('ad')
[17]:
MultiIndex([], names=['region', 'sector'])
[18]:
mrio.match('trad')
[18]:
MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])
[19]:
mrio.fullmatch('trad')
[19]:
MultiIndex([], names=['region', 'sector'])
[20]:
mrio.fullmatch('trade')
[20]:
MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])
[21]:
mrio.fullmatch('(?i).*AD.*')
[21]:
MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])
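
The differences between the three methods seen above correspond to the functions re.search, re.match and re.fullmatch of Python's standard library. The sketch below applies them to the sector names listed earlier; the helper ‘select’ is a hypothetical name used for illustration only and is not part of pymrio.

```python
import re

# Sector names as returned by mrio.get_sectors() above.
sectors = ["food", "mining", "manufactoring", "electricity",
           "construction", "trade", "transport", "other"]


def select(regex_func, pattern):
    """Return the sector names for which regex_func finds a match."""
    return [s for s in sectors if regex_func(pattern, s)]


contains_ad = select(re.search, "ad")            # match anywhere in the string
match_ad = select(re.match, "ad")                # match at the beginning only
match_trad = select(re.match, "trad")
fullmatch_trad = select(re.fullmatch, "trad")    # whole string must match
fullmatch_trade = select(re.fullmatch, "trade")

print(contains_ad, match_ad, match_trad, fullmatch_trad, fullmatch_trade)
```

Only ‘trade’ contains ‘ad’, nothing starts with ‘ad’, and ‘trad’ matches the beginning of ‘trade’ but not the whole string, which is the same pattern of results as in the cells above.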

For the rest of the notebook, the examples use the ‘contains’ method, but the same applies to the other methods.

To search only at one specific level, pass the search term to the keyword argument with the level name.

[22]:
mrio.contains(region='trade')
[22]:
MultiIndex([], names=['region', 'sector'])
[23]:
mrio.contains(sector='trade')
[23]:
MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])

And of course, the methods are also available for the satellite accounts.

[24]:
mrio.emissions.contains(compartment='air')
[24]:
MultiIndex([('emission_type1', 'air')],
           names=['stressor', 'compartment'])

A keyword argument that names a non-existing level is silently ignored.

[25]:
mrio.factor_inputs.contains(compartment='trade')
[25]:
Index([], dtype='object')

This makes it possible to search for terms that occur only in some index levels. Logically, this is an ‘or’ search.

[26]:
mrio.factor_inputs.contains(compartment='air', inputtype="Value")
[26]:
Index(['Value Added'], dtype='object', name='inputtype')

But note that if both levels exist, both must match (so it becomes a logical ‘and’).

[27]:
mrio.emissions.contains(stressor='emission', compartment='air')
[27]:
MultiIndex([('emission_type1', 'air')],
           names=['stressor', 'compartment'])
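
The level-keyword behaviour shown in the last four cells can be summarised in a few lines of plain Python. This is a hypothetical sketch written to reproduce the outputs above, not pymrio's actual code: the function ‘contains_sketch’ and its arguments are made up for illustration, and the factor input list holds only the one label shown in this notebook.

```python
import re


def contains_sketch(level_names, rows, **level_patterns):
    """Filter index rows (tuples) by one regex per index level name.

    Keywords naming a level that does not exist are skipped. All
    existing levels must match. If no keyword names an existing
    level, nothing is selected.
    """
    selected = list(rows)
    any_known_level = False
    for level, pattern in level_patterns.items():
        if level not in level_names:
            continue  # non-existing level: silently ignored
        any_known_level = True
        pos = level_names.index(level)
        selected = [row for row in selected if re.search(pattern, row[pos])]
    return selected if any_known_level else []


# Rows of the emissions account as shown by get_rows() above.
emissions = [("emission_type1", "air"), ("emission_type2", "water")]
# Only the factor input label that appears in this notebook.
factor_inputs = [("Value Added",)]

only_missing = contains_sketch(["inputtype"], factor_inputs, compartment="trade")
one_missing = contains_sketch(["inputtype"], factor_inputs,
                              compartment="air", inputtype="Value")
both_exist = contains_sketch(["stressor", "compartment"], emissions,
                             stressor="emission", compartment="air")
print(only_missing, one_missing, both_exist)
```

Skipping unknown levels is what makes a single call usable across accounts with different index structures, while the requirement that all existing levels match keeps the result specific within one account.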

Search through all extensions

All three search methods also come in a variant that loops through all extensions of the MRIO, such as ‘extension_contains’.

[28]:
mrio.extension_contains(stressor='emission', compartment='air')
[28]:
{'Factor Inputs': Index([], dtype='object'),
 'Emissions': MultiIndex([('emission_type1', 'air')],
            names=['stressor', 'compartment'])}

If only a subset of extensions should be searched, pass the extension names to the keyword argument ‘extensions’.

Generic search method for any dataframe index

Internally, the class methods ‘contains’, ‘match’ and ‘fullmatch’ all call the ‘index_contains’, ‘index_match’ and ‘index_fullmatch’ functions of the ioutil module. These functions can be used to search through the index of any pandas DataFrame.

[29]:
df = mrio.Y

Depending on whether a DataFrame or an index is passed, the function returns either the matching rows of the DataFrame or the matching index entries.

[30]:
pymrio.index_contains(df, 'trade')
[30]:
region | sector | reg1 / Final consumption expenditure by households | reg1 / Final consumption expenditure by non-profit organisations serving households (NPISH) | reg1 / Final consumption expenditure by government | reg1 / Gross fixed capital formation | reg1 / Changes in inventories | reg1 / Changes in valuables | reg1 / Export | reg2 / Final consumption expenditure by households | reg2 / Final consumption expenditure by non-profit organisations serving households (NPISH) | reg2 / Final consumption expenditure by government | ... | reg5 / Changes in inventories | reg5 / Changes in valuables | reg5 / Export | reg6 / Final consumption expenditure by households | reg6 / Final consumption expenditure by non-profit organisations serving households (NPISH) | reg6 / Final consumption expenditure by government | reg6 / Gross fixed capital formation | reg6 / Changes in inventories | reg6 / Changes in valuables | reg6 / Export
reg1 | trade | 769535.93000 | 16.638920 | 2.408807e+07 | 6.727345e+07 | 1230.218200 | 216.211080 | 0 | 8063.52380 | 12.738233 | 163.205380 | ... | 204.383170 | 1.684295e+00 | 0 | 49782414.00 | 0.224933 | 14.445660 | 16739029.00 | 12.145465 | 0.013888 | 0
reg2 | trade | 5678.26740 | 0.075424 | 2.312962e+02 | 6.339521e+02 | 35.607157 | 3.192694 | 0 | 385664.01000 | 178.501140 | 8160.881200 | ... | 358.249320 | 2.271962e+01 | 0 | 26592464.00 | 0.139745 | 10.623962 | 11775351.00 | 20.572534 | 0.005433 | 0
reg3 | trade | 2753.86080 | 0.111540 | 1.956911e+00 | 3.598675e+02 | 23.391120 | 0.000455 | 0 | 2072.24890 | 0.044811 | 1.242613 | ... | 309.984270 | 6.283278e+00 | 0 | 114505.89 | 0.630098 | 37.095549 | 31317361.00 | 212.707510 | 0.014929 | 0
reg4 | trade | 373.28393 | 0.009382 | 3.585011e-01 | 2.514957e-02 | 0.002016 | 0.000144 | 0 | 192.21539 | 0.019666 | 0.537107 | ... | 73.859706 | 7.199126e-02 | 0 | 40152651.00 | 0.255523 | 17.253634 | 14011134.00 | 4.052444 | 0.001935 | 0
reg5 | trade | 4287.40670 | 0.038941 | 7.014679e+00 | 1.955479e+02 | 6.675656 | 0.524015 | 0 | 3633.68750 | 2.536312 | 50.624916 | ... | 9177.081800 | 1.330591e+06 | 0 | 60992225.00 | 0.823823 | 34.208026 | 27870911.00 | 85.191511 | 0.008929 | 0
reg6 | trade | 4772.75750 | 0.113112 | 2.321101e+01 | 2.417571e+02 | 16.267049 | 1.488818 | 0 | 2031.49640 | 1.864492 | 18.787893 | ... | 91.040319 | 2.122217e+00 | 0 | 851864.04 | 23.371306 | 1966.030900 | 131182.13 | 1549.410400 | 0.266033 | 0

6 rows × 42 columns

[31]:
pymrio.index_contains(df.index, 'trade')
[31]:
MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])
[32]:
pymrio.index_fullmatch(df, region='reg[2,4]', sector='m.*')
[32]:
region | sector | reg1 / Final consumption expenditure by households | reg1 / Final consumption expenditure by non-profit organisations serving households (NPISH) | reg1 / Final consumption expenditure by government | reg1 / Gross fixed capital formation | reg1 / Changes in inventories | reg1 / Changes in valuables | reg1 / Export | reg2 / Final consumption expenditure by households | reg2 / Final consumption expenditure by non-profit organisations serving households (NPISH) | reg2 / Final consumption expenditure by government | ... | reg5 / Changes in inventories | reg5 / Changes in valuables | reg5 / Export | reg6 / Final consumption expenditure by households | reg6 / Final consumption expenditure by non-profit organisations serving households (NPISH) | reg6 / Final consumption expenditure by government | reg6 / Gross fixed capital formation | reg6 / Changes in inventories | reg6 / Changes in valuables | reg6 / Export
reg2 | mining | 1.653997e+02 | 1.817989e-05 | 0.334824 | 3.283238e+01 | 2.910648e+01 | 3.970468e-05 | 0 | 1.091126e+03 | 2.751312 | 1.544777e+01 | ... | 0.040217 | 0.000420 | 0 | 6.127299e-01 | 0.008212 | 0.013061 | 1.167170e+01 | 166.539580 | 0.000002 | 0
reg2 | manufactoring | 9.928459e+07 | 4.187143e+00 | 1373.370200 | 4.237878e+07 | 4.415752e+03 | 1.637658e+02 | 0 | 3.210316e+05 | 125.729110 | 1.603833e+07 | ... | 951.809210 | 21.641280 | 0 | 1.074192e+07 | 62.832488 | 6363.192800 | 1.170497e+07 | 1060.321500 | 0.145903 | 0
reg4 | mining | 1.072728e+02 | 9.421644e-09 | 0.209851 | 1.055704e+00 | 2.697312e+01 | 3.112643e-09 | 0 | 2.940734e+02 | 0.000001 | 1.061796e-01 | ... | 45.388457 | 0.000015 | 0 | 7.705800e+04 | 0.348367 | 2.326858 | 9.917552e-02 | 68.266815 | 0.000789 | 0
reg4 | manufactoring | 4.086352e+07 | 1.170611e+00 | 377.322810 | 3.032655e+07 | 2.263532e+06 | 4.696135e+01 | 0 | 1.510445e+07 | 7.598440 | 1.285316e+02 | ... | 1004.074200 | 1.476200 | 0 | 2.190604e+07 | 94.282910 | 6712.972800 | 3.167496e+07 | 825.256940 | 0.036510 | 0

4 rows × 42 columns
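
A note on the patterns used in the last call: ‘reg[2,4]’ is a regular-expression character class, which matches exactly one of the characters listed between the brackets. The comma is therefore a literal member of the class rather than a separator, so ‘reg[24]’ selects the same regions here. The sketch below checks this with Python's re module on the region and sector names listed earlier; it illustrates the regex behaviour only and does not call pymrio.

```python
import re

# Region and sector names as returned by get_regions() and get_sectors().
regions = ["reg1", "reg2", "reg3", "reg4", "reg5", "reg6"]
sectors = ["food", "mining", "manufactoring", "electricity",
           "construction", "trade", "transport", "other"]

# Character class with and without the comma: same result for these names.
with_comma = [r for r in regions if re.fullmatch("reg[2,4]", r)]
without_comma = [r for r in regions if re.fullmatch("reg[24]", r)]

# The comma is itself part of the class, so this artificial label matches too.
comma_also_matches = re.fullmatch("reg[2,4]", "reg,") is not None

# 'm.*' as a full match selects every sector starting with 'm'.
m_sectors = [s for s in sectors if re.fullmatch("m.*", s)]

print(with_comma, without_comma, comma_also_matches, m_sectors)
```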