Extract data from Pymrio
This notebook shows how to extract specific data from the pymrio object for further processing in Python. For exporting/saving the data to another file format see the notebook on saving/loading/exporting data.
[39]:
import pymrio
[40]:
mrio = pymrio.load_test().calc_all()
### Basic pandas indexing of pymrio tables
Since pymrio is built on top of pandas, we can use the pandas functions to extract data from the pymrio object. For example, to access the part of the A matrix from the region 2 we can use:
[41]:
A_reg2 = mrio.A.loc["reg2", "reg2"]
A_reg2
[41]:
| sector | food | mining | manufactoring | electricity | construction | trade | transport | other |
|---|---|---|---|---|---|---|---|---|
| sector | ||||||||
| food | 0.000486 | 0.000638 | 0.000194 | 0.000005 | 0.000019 | 0.000092 | 0.000027 | 0.000026 |
| mining | 0.000006 | 0.050904 | 0.000047 | 0.000218 | 0.000203 | 0.000011 | 0.000010 | 0.000013 |
| manufactoring | 0.000488 | 0.069862 | 0.001529 | 0.000196 | 0.005915 | 0.001191 | 0.002294 | 0.000844 |
| electricity | 0.000089 | 0.050427 | 0.000137 | 0.000604 | 0.000146 | 0.000177 | 0.000280 | 0.000248 |
| construction | 0.000025 | 0.007375 | 0.000032 | 0.000109 | 0.004615 | 0.000088 | 0.000515 | 0.000422 |
| trade | 0.000251 | 0.028770 | 0.000531 | 0.000095 | 0.001640 | 0.000772 | 0.001372 | 0.000487 |
| transport | 0.000071 | 0.031839 | 0.000212 | 0.000069 | 0.000714 | 0.000579 | 0.004747 | 0.000494 |
| other | 0.000171 | 0.064935 | 0.000595 | 0.000291 | 0.002844 | 0.001897 | 0.003800 | 0.003936 |
Most tables are indexed via a multiindex, in case of the A matrix the index is a tuple of the region and the sector. To access all technical coefficients (column) data for mining from all regions we can use:
[42]:
A_mining = mrio.A.loc[:, (slice(None), "mining")]
A_mining
[42]:
| region | reg1 | reg2 | reg3 | reg4 | reg5 | reg6 | |
|---|---|---|---|---|---|---|---|
| sector | mining | mining | mining | mining | mining | mining | |
| region | sector | ||||||
| reg1 | food | 0.001179 | 0.000010 | 3.652734e-09 | 1.626677e-06 | 3.767567e-07 | 1.481621e-05 |
| mining | 0.048022 | 0.000268 | 7.486558e-07 | 6.899387e-05 | 3.628651e-05 | 8.801658e-05 | |
| manufactoring | 0.124366 | 0.017417 | 2.799765e-05 | 9.161688e-03 | 4.792741e-03 | 2.087445e-03 | |
| electricity | 0.037991 | 0.001099 | 2.170169e-07 | 1.150382e-08 | 1.444660e-05 | 8.892062e-06 | |
| construction | 0.017324 | 0.000022 | 8.884210e-08 | 1.331990e-07 | 1.129186e-05 | 7.641830e-07 | |
| trade | 0.035429 | 0.001836 | 6.170323e-07 | 3.835340e-04 | 1.071471e-03 | 5.575273e-04 | |
| transport | 0.060324 | 0.001544 | 4.060594e-06 | 5.820972e-04 | 1.278089e-03 | 6.555200e-03 | |
| other | 0.092059 | 0.005024 | 1.788858e-05 | 3.664017e-04 | 7.664473e-04 | 3.287131e-04 | |
| reg2 | food | 0.000084 | 0.000638 | 3.772203e-09 | 2.165695e-07 | 9.237430e-08 | 3.105702e-05 |
| mining | 0.000523 | 0.050904 | 5.883755e-06 | 5.472492e-05 | 2.212937e-05 | 3.108304e-05 | |
| manufactoring | 0.014563 | 0.069862 | 1.223889e-05 | 4.170428e-03 | 2.164157e-03 | 1.355740e-03 | |
| electricity | 0.000430 | 0.050427 | 1.143580e-08 | 2.061508e-08 | 3.266417e-06 | 3.535380e-07 | |
| construction | 0.000050 | 0.007375 | 1.666888e-07 | 1.910407e-07 | 1.702833e-05 | 1.421644e-06 | |
| trade | 0.000769 | 0.028770 | 1.092766e-06 | 1.792846e-04 | 1.000066e-03 | 3.103879e-04 | |
| transport | 0.003095 | 0.031839 | 1.993361e-06 | 2.990600e-05 | 1.294761e-03 | 1.368746e-03 | |
| other | 0.001584 | 0.064935 | 2.418065e-05 | 2.702272e-04 | 4.571674e-04 | 3.392091e-04 | |
| reg3 | food | 0.000037 | 0.000005 | 1.309056e-05 | 6.378397e-06 | 4.673894e-07 | 3.461944e-06 |
| mining | 0.000396 | 0.000151 | 4.337922e-04 | 3.495268e-04 | 1.564949e-04 | 4.882839e-05 | |
| manufactoring | 0.007782 | 0.003842 | 6.572771e-04 | 5.043959e-03 | 3.621657e-03 | 1.629835e-03 | |
| electricity | 0.000045 | 0.000107 | 2.495370e-04 | 1.401577e-04 | 5.464145e-07 | 1.165971e-05 | |
| construction | 0.000006 | 0.000003 | 2.786913e-04 | 2.624508e-07 | 6.239448e-06 | 1.237150e-08 | |
| trade | 0.000150 | 0.000060 | 2.485529e-04 | 4.141811e-05 | 1.517855e-03 | 2.062017e-03 | |
| transport | 0.001768 | 0.000618 | 2.091958e-04 | 4.965269e-04 | 3.777028e-04 | 2.449723e-03 | |
| other | 0.002269 | 0.007583 | 1.861799e-03 | 2.275626e-03 | 2.430304e-03 | 1.329935e-03 | |
| reg4 | food | 0.000171 | 0.000578 | 6.340547e-10 | 1.413307e-03 | 1.378513e-07 | 2.636256e-05 |
| mining | 0.000311 | 0.002467 | 2.719214e-05 | 6.138652e-02 | 1.552800e-04 | 2.699482e-04 | |
| manufactoring | 0.005207 | 0.003579 | 4.405268e-05 | 1.493263e-01 | 2.139451e-03 | 2.578051e-03 | |
| electricity | 0.000081 | 0.000013 | 1.295279e-06 | 4.967797e-02 | 1.543713e-08 | 1.669269e-05 | |
| construction | 0.000021 | 0.000008 | 6.055807e-07 | 2.337226e-03 | 7.889622e-06 | 2.033545e-06 | |
| trade | 0.000005 | 0.000016 | 1.595023e-07 | 2.826377e-02 | 9.701854e-05 | 7.025049e-04 | |
| transport | 0.000356 | 0.000048 | 9.595997e-07 | 1.586496e-02 | 3.012550e-04 | 1.991747e-03 | |
| other | 0.000873 | 0.000958 | 6.314447e-06 | 4.648443e-02 | 1.117863e-04 | 6.261181e-04 | |
| reg5 | food | 0.000062 | 0.000028 | 1.108149e-07 | 1.524618e-05 | 1.996382e-04 | 2.327498e-05 |
| mining | 0.003143 | 0.001784 | 2.499451e-06 | 1.013051e-03 | 5.149154e-02 | 9.996927e-04 | |
| manufactoring | 0.004680 | 0.002365 | 1.612636e-05 | 5.466745e-04 | 8.061924e-02 | 1.405121e-03 | |
| electricity | 0.000956 | 0.000018 | 5.676661e-08 | 2.164907e-06 | 2.545719e-02 | 1.096431e-05 | |
| construction | 0.000071 | 0.000034 | 5.621276e-07 | 3.027990e-08 | 1.765549e-02 | 3.883278e-06 | |
| trade | 0.000149 | 0.000639 | 1.895713e-06 | 5.016841e-05 | 4.981458e-02 | 1.014394e-03 | |
| transport | 0.001154 | 0.000820 | 2.183364e-06 | 2.255475e-05 | 3.801613e-02 | 2.591267e-03 | |
| other | 0.000347 | 0.000512 | 4.026853e-06 | 9.871058e-05 | 6.984674e-02 | 5.676656e-04 | |
| reg6 | food | 0.000143 | 0.000093 | 1.099390e-07 | 6.355405e-05 | 1.119791e-06 | 6.959613e-04 |
| mining | 0.007456 | 0.005567 | 3.220657e-05 | 2.052196e-03 | 9.791972e-04 | 1.512044e-01 | |
| manufactoring | 0.005841 | 0.005784 | 2.046172e-05 | 2.154067e-03 | 6.448726e-03 | 5.043697e-02 | |
| electricity | 0.000197 | 0.000168 | 1.470729e-07 | 1.102498e-05 | 3.010193e-05 | 6.787911e-03 | |
| construction | 0.000038 | 0.000023 | 5.264129e-07 | 5.950288e-07 | 3.829687e-05 | 2.041605e-03 | |
| trade | 0.000209 | 0.000184 | 6.734943e-07 | 1.334155e-04 | 1.324385e-04 | 4.701872e-03 | |
| transport | 0.002636 | 0.002308 | 4.652734e-06 | 4.705530e-05 | 2.978642e-04 | 4.032717e-02 | |
| other | 0.000505 | 0.000804 | 3.262233e-06 | 2.790342e-04 | 2.084267e-04 | 2.052462e-02 |
For further information on the pandas multiindex see the pandas documentation on advanced indexing.
Extracting data across extension tables
Pymrio includes methods for bulk extraction of data across extension tables. These can either work on a specific extension or across all extensions of the system.
Extracting from a specific extension
Here we use use the extract method available in the extension object. This expect a list of rows (index) to extract. Here we extract some rows from the emission extension table. To do so, we first define the rows (index) to extract:
[43]:
rows_to_extract = [("emission_type1", "air"), ("emission_type2", "water")]
We can now use the extract method to extract the data, either as a pandas DataFrame
[44]:
df_extract = mrio.emissions.extract(rows_to_extract, return_type="dataframe")
df_extract.keys()
[44]:
dict_keys(['F', 'F_Y', 'S', 'S_Y', 'M', 'M_down', 'D_cba', 'D_pba', 'D_imp', 'D_exp', 'unit', 'D_cba_reg', 'D_pba_reg', 'D_imp_reg', 'D_exp_reg', 'D_cba_cap', 'D_pba_cap', 'D_imp_cap', 'D_exp_cap'])
Or we extract into a new extension object:
[45]:
ext_extract = mrio.emissions.extract(rows_to_extract, return_type="extension")
str(ext_extract)
[45]:
'Extension Emissions_extracted with parameters: name, F, F_Y, S, S_Y, M, M_down, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap'
Note that the name of the extension object is now Emissions_extracted, based on the name of the original extension object. To use another name, just pass the name as the return_type method.
[46]:
new_extension = mrio.emissions.extract(rows_to_extract, return_type="new_extension")
str(new_extension)
[46]:
'Extension new_extension with parameters: name, F, F_Y, S, S_Y, M, M_down, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap'
Extracting to dataframes is also a convienient way to convert an extension object to a dictionary:
[47]:
df_all = mrio.emissions.extract(mrio.emissions.get_rows(), return_type="dfs")
df_all.keys()
# The method also allows to only extract some of the accounts:
df_some = mrio.emissions.extract(
mrio.emissions.get_rows(), dataframes=["D_cba", "D_pba"], return_type="dfs"
)
df_some.keys()
[47]:
dict_keys(['F', 'F_Y', 'S', 'S_Y', 'M', 'M_down', 'D_cba', 'D_pba', 'D_imp', 'D_exp', 'unit', 'D_cba_reg', 'D_pba_reg', 'D_imp_reg', 'D_exp_reg', 'D_cba_cap', 'D_pba_cap', 'D_imp_cap', 'D_exp_cap'])
Extracting from all extensions
We can also extract data from all extensions at once. This is done using the extension_extract method from the pymrio object. This expect a dict with keys based on the extension names and values as a list of rows (index) to extract.
Lets assume we want to extract value added and all emissions. We first define the rows (index) to extract:
[ ]:
to_extract = {
"Factor Inputs": "Value Added",
"Emissions": [("emission_type1", "air"), ("emission_type2", "water")],
}
And can then use the extension_extract method to extract the data, either as a pandas DataFrame, which returns a dictionary with the extension names as keys
[ ]:
df_extract_all = mrio.extension_extract(to_extract, return_type="dataframe")
df_extract_all.keys()
[ ]:
df_extract_all["Factor Inputs"].keys()
We can also extract into a dictionary of extension objects:
[ ]:
ext_extract_all = mrio.extension_extract(to_extract, return_type="extensions")
ext_extract_all.keys()
[ ]:
str(ext_extract_all["Factor Inputs"])
Or merge the extracted data into a new pymrio Extension object (when passing a new name as return_type):
[ ]:
ext_new = mrio.extension_extract(to_extract, return_type="new_merged_extension")
str(ext_new)
CONT: Continue with explaining, mention the work with find_all etc
Search and extract
The extract methods can also be used in combination with the search/explore methods of pymrio. This allows to search for specific rows and then extract the data.
For example, to extract all emissions from the air compartment we can use:
[ ]:
match_air = mrio.extension_match(find_all="air")
And then make a new extension object with the extracted data:
[ ]:
air_emissions = mrio.emissions.extract(match_air, return_type="extracted_air_emissions")
print(air_emissions)
For more information on the search methods see the explore notebook.
Extract data from a time series of MRIOs
[ ]: