
How Does EarthEmission Handle The Data Versioning Of Emission Factors?

Emission Factor Data Versioning

In the latest major update (Beta4), earthemission is introducing a new approach to versioning our emission factors. We are moving to separate API (software) and data versioning, so that users of the API and dataset can build their implementations with the mix of API stability, data recency and accuracy that suits them.

The approach to API versioning will remain as it is today, with a single major version of the API, such as beta3, v1, etc. The major version will only change when the API makes backwards-incompatible changes, such as changing the required values in an endpoint.

Calculation Endpoints

For endpoints that do not explicitly use selectors, such as “cloud” or “intermodal”, nothing will change. earthemission will continue to update these endpoints regularly when we judge that newer, better emission factors have become available.

There is currently no way to lock these endpoints into a specific set of emission factors. This means that you are not guaranteed a calculation will use the same emission factor month over month.

This does not mean you will not be able to reproduce specific calculations if required. To enable the reproduction of calculations, earthemission can make available the ID of the emission factor used, the final activity value (e.g. of energy or distance) which was used to calculate the emissions reported, and any transformations applied to the factor during final calculation (e.g. applying a Radiative Forcing Index to a flight leg in our intermodal endpoint). This allows manual recalculation of the estimation for audit or other retrospective purposes.
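As a rough illustration of such a manual recalculation, the sketch below multiplies a factor value by the activity value and by any transformations. The record layout, the field names and the assumption that every transformation is a simple multiplier (such as a Radiative Forcing Index) are hypothetical and are not taken from the earthemission API.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class AuditRecord:
    """Hypothetical record of one estimate; all field names are illustrative."""
    factor_id: str                 # ID of the emission factor that was used
    factor_value: float            # e.g. kg CO2e per unit of activity
    activity_value: float          # final activity value, e.g. km or kWh
    multipliers: List[float] = field(default_factory=list)  # e.g. an RFI


def recalculate(record: AuditRecord) -> float:
    """Recompute emissions, assuming every transformation is multiplicative."""
    # prod() of an empty list is 1, so records without transformations work too.
    return record.factor_value * record.activity_value * prod(record.multipliers)


# Made-up numbers: a flight leg with an RFI of 1.9 applied.
flight_leg = AuditRecord("1234", factor_value=0.5, activity_value=100.0, multipliers=[1.9])
print(recalculate(flight_leg))
```

Transformations that are not multiplicative would need their own handling, so treat this only as a starting point for an audit script.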

Note on new alpha endpoints: The information described above is currently absent from the new (alpha) energy feature and other features we are working on. The calculations performed by our new emissions calculators are too complex to explain adequately using the existing model, so we are working on a new method to explain them. If you have requirements for calculation transparency, please get in contact.

If we make any breaking changes to these endpoints, such as adding another required field, it will happen in a major API release.

earthemission will inform you when we have plans to change the emission factors used in these endpoints, via our data changelog (coming soon).

Now that we've talked about what won't be affected, let's see what will be:

earthemission's Data Changes

The database underpinning earthemission is often updated. Emission factors are changed for many reasons, such as when a source publishes errata, new data quality flags are added that apply to existing emission factors, or when a change is made to a factor's metadata such as activity_id or source_lca_activity value.

As an illustration of the kind of data change that makes a data version necessary, consider an application that calls the earthemission estimation endpoint using an activity_id, a source and a year, and specifies that it will not accept factors with a data quality issue of any kind (allowed_data_quality_flags []). If earthemission later notices a data quality issue with the factor selected by these criteria, we may add a suspicious_homogeneity flag to that emission factor, in which case your estimations would start failing because no emission factor meets your search criteria any longer.

Updates like these often require you to make changes in your application. We are introducing the concept of a data version, to ensure that you can choose when to opt-in to changes like the above.
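The failure scenario described above can be simulated locally. This is a minimal sketch: the select_factor helper and the in-memory factor lists are assumptions made for illustration and do not reproduce earthemission's real selection logic.

```python
def select_factor(factors, activity_id, allowed_flags):
    """Return the first factor whose flags are all allowed, or None."""
    allowed = set(allowed_flags)
    for factor in factors:
        flags = set(factor["data_quality_flags"])
        # A factor matches only if every flag it carries is explicitly allowed.
        if factor["activity_id"] == activity_id and flags <= allowed:
            return factor
    return None


# Dataset before the data quality issue was noticed.
before = [{"id": "1234", "activity_id": "power", "data_quality_flags": []}]
# Same factor after a flag was added (the ID changes as well).
after = [{"id": "9876", "activity_id": "power",
          "data_quality_flags": ["suspicious_homogeneity"]}]

print(select_factor(before, "power", allowed_flags=[]))  # finds factor 1234
print(select_factor(after, "power", allowed_flags=[]))   # None: the query now fails
```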

The ways data changes

Data in the earthemission database can change in three ways.

  • New emission factors can be added
  • Existing emission factors can be modified, for example when the source provides errata, or when earthemission introduces a new data quality flag that applies to an existing factor.
  • (Rarely) an emission factor is deemed to be of such poor quality that we need to mark it as deprecated.

When a factor is modified, it is not deleted outright; rather, it is replaced with an emission factor that is identical apart from the changes. This replacement factor also has a new ID. So whenever an emission factor is changed in any way, its ID also changes.

Conceptually, you can think of the modification of an emission factor as two discrete steps:

  • The addition of a new (almost identical) emission factor, with a new ID
  • The removal of the old emission factor.
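The two steps above can be sketched on a small in-memory dataset. The modify_factor helper and the dictionary layout are hypothetical and only illustrate the "add a copy with a new ID, then remove the old one" idea.

```python
def modify_factor(dataset, old_id, new_id, **changes):
    """Return a new dataset in which old_id is replaced by a changed copy."""
    if new_id in dataset:
        raise ValueError(f"id {new_id!r} is already in use")
    updated = dict(dataset)  # leave the caller's dataset untouched
    # Step 1: add an almost identical factor under a new ID.
    updated[new_id] = {**dataset[old_id], **changes, "id": new_id}
    # Step 2: remove the old factor.
    del updated[old_id]
    return updated


dataset = {"1234": {"id": "1234", "activity_id": "power", "data_quality_flags": []}}
changed = modify_factor(dataset, "1234", "9876", data_quality_flags=["flag"])
print(sorted(changed))  # only the new ID remains
```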

Data Versions

A data version consists of two numbers, such as 3.3 or 4.6. This mirrors the major.minor scheme that software libraries often use. From now on, we refer to the leftmost number as the major point and the rightmost number as the minor point.
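If you store or compare data versions in your own code, it may help to split them into integers first, since plain string comparison misorders versions such as "4.10" and "4.6". The helper name below is an assumption, not part of any earthemission client.

```python
def parse_data_version(version: str) -> tuple:
    """Split a full 'major.minor' data version into a tuple of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


# String comparison orders "4.10" before "4.6"; tuple comparison does not.
assert "4.10" < "4.6"
assert parse_data_version("4.10") > parse_data_version("4.6")
```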

Versions

earthemission periodically makes data releases. These generally contain both modifications and additions.

When earthemission releases new data, we will:

  • Create a new minor release for every existing major release. This minor release will include all emission factor additions and modifications. A modification keeps the old emission factor and adds the new, corrected emission factor alongside it.
  • Create a new major release. This will include all emission factor additions and modifications. It will also remove older versions of modified emission factors, so only the most up-to-date emission factors are available.

This means that minor versions only ever get additions and corrections, while removals only happen in major versions.

An example

When the emission factor {activity_id: "power", id: "1234", data_quality_flags: []} has been modified, e.g. to add a data quality flag, the next minor version will contain two emission factors:

  • {activity_id: "power", id: "1234", data_quality_flags: []} (the original)
  • {activity_id: "power", id: "9876", data_quality_flags: ["flag"]} (the new addition)

If you upgrade to a newer minor version, and your query matches both emission factors, earthemission will pick the one from the newest data version. If your query does not match the new addition, you will continue to use the old emission factor.

The major version will only contain the newly added emission factor, and not the original:

  • {activity_id: "power", id: "9876", data_quality_flags: ["flag"]}

This means that when upgrading minor versions, you will always be able to find an emission factor that you previously found, but you might be upgraded to a newer version of the emission factor if that also matches your query.

It also means that if you want to be certain that you're not using any emission factors that are not up-to-date, you should update your major version (and application code) occasionally.
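The example above can be replayed in a few lines. In this sketch each release is a list ordered from oldest to newest factor, and the pick helper returns the newest match; both the ordering convention and the helper are assumptions made for illustration.

```python
ORIGINAL = {"activity_id": "power", "id": "1234", "data_quality_flags": []}
REPLACEMENT = {"activity_id": "power", "id": "9876", "data_quality_flags": ["flag"]}

# The next minor release keeps the original and adds the replacement.
minor_release = [ORIGINAL, REPLACEMENT]
# The next major release contains only the replacement.
major_release = [REPLACEMENT]


def pick(release, activity_id, allowed_flags):
    """Return the newest factor in the release that matches the query, or None."""
    allowed = set(allowed_flags)
    matches = [
        f for f in release
        if f["activity_id"] == activity_id and set(f["data_quality_flags"]) <= allowed
    ]
    # Releases are ordered oldest first, so the last match is the newest one.
    return matches[-1] if matches else None


# Query that matches only the original: the minor upgrade keeps working,
# while the major upgrade finds nothing until the query is changed.
print(pick(minor_release, "power", []))
print(pick(major_release, "power", []))
# Query that matches both: the newer factor wins.
print(pick(minor_release, "power", ["flag"]))
```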

Alright, now let's see how you can use this concept of data versions to decide how you want your app to behave on data changes.

Selectors

In some situations we need to be able to tell which data version you are using. This is when:

  • You are using the /search endpoint
  • You are using the /estimate endpoint, or using a Selector to override emission factor selection inside a calculation endpoint, such as the cloud endpoints.

In these cases, you must specify either an id or a data_version parameter. A data_version must be provided as a string value like "1.1" or "^3"; we'll talk about what these mean right after this.

If you provide neither an id nor a data_version, you will get an error that looks like this:

{
    "error": "bad_request",
    "error_code": "invalid_input",
    "message": "Selector should either provide an 'id', OR a 'data_version' and an 'activity_id'. It must not provide both. The latest 'data_version' is '3.3'"
}
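To catch this error before sending a request, a client could check its selector locally. The sketch below encodes only the rule stated in the error message (an id, or a data_version together with an activity_id, but not both); the function name and the plain-dictionary selector are assumptions.

```python
def validate_selector(selector: dict) -> None:
    """Raise ValueError if the selector breaks the id / data_version rule."""
    has_id = "id" in selector
    has_version = "data_version" in selector
    if has_id and has_version:
        raise ValueError("provide either 'id' or 'data_version', not both")
    if has_id:
        return  # an id alone pins one specific emission factor
    if has_version and "activity_id" in selector:
        return  # a data_version must come with an activity_id
    raise ValueError("provide an 'id', or a 'data_version' and an 'activity_id'")


validate_selector({"id": "1234"})
validate_selector({"activity_id": "power", "data_version": "^3"})
```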

Specifying Data Version

You can specify the data version in one of two ways:

1. Specifying a Full Version

If a data_version contains both major and minor (e.g., "8.12"), you are always looking at the same immutable view of the underlying data. This ensures consistency, and making the same request to the same version of the API will yield the same result.

This form is crucial for producing accounting reports, ensuring that calculations are made using the same underlying data.

2. Specifying a Major-Version Compatible Version

If a data_version preceded by a caret (e.g., ^8) is provided, earthemission's estimate API will provide a “version 8 compatible” set of data. This is the same as the latest minor version that belongs to the major version 8. Selecting a major-compatible version means that new emission factor updates will be received, but no emission factors will be removed until a manual major version upgrade.

  • The emission factors selected for estimates might change based on newer, more accurate factors.
  • Queries or estimates will never stop working with the same input, as no emission factor is ever removed within a major version.

Use this form when the app should have the most up-to-date and correct data, and you are okay with the result changing over time.
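The difference between the two forms can be sketched as a small resolver that maps a requested data_version to a concrete release. The list of available versions is made up, and the resolution happens on the earthemission side in practice; this only mirrors the behaviour described above.

```python
def resolve_data_version(requested: str, available: list) -> str:
    """Map '8.12' to itself and '^8' to the latest minor of major version 8."""
    if requested.startswith("^"):
        major = int(requested[1:])
        # Collect the minor points of every release that belongs to this major.
        minors = [
            int(v.split(".")[1]) for v in available if int(v.split(".")[0]) == major
        ]
        if not minors:
            raise ValueError(f"no release for major version {major}")
        return f"{major}.{max(minors)}"
    if requested not in available:
        raise ValueError(f"unknown data version {requested!r}")
    return requested  # a full version is an immutable view and maps to itself


releases = ["7.3", "8.2", "8.12", "9.0"]  # hypothetical release history
print(resolve_data_version("^8", releases))   # latest minor of major 8
print(resolve_data_version("8.2", releases))  # pinned, never changes
```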

Upgrading Minor Versions

If specifying a full version, upgrading minor versions is low-risk:

  • Access to newer emission factors is gained.
  • No queries that used to work will stop working.
  • If you only ever upgrade minor versions, you risk continuing to use deprecated or less precise emission factors that have been removed in a later major version.
  • We recommend upgrading to the latest minor version.

Every time a major version is released, backward-compatible changes are applied as minor releases to every previous major version.

If reproducible calculations are not required, specifying only a major version (e.g., "^8") means you will automatically receive minor updates.

For concrete guidance on how to upgrade major versions, see here.

Uniquely Identifying Emission Factors

Each emission factor in the earthemission database has a unique identifier (id). Whenever an emission factor is changed, this id also changes.

  • For emission factors that have not changed, the id will not change between data sets.
  • For each estimate earthemission performs, we will return an id, allowing you to uniquely identify the emission factor used.

When selecting an emission factor, you must use either an id or a data_version with an activity_id. Using data_version ensures selection from the exact same set of emission factors or from a set including additions but without removals. Using id ensures consistent selection of the same emission factor, even if newer factors are available or the factor has been deprecated.

Data Change Logs

When releasing a new data version, a changelog will be provided listing which factors were changed, whether they were deprecated and replaced, or simply added to or removed from the data_version in question. You can then identify if you are using any of these emission factors and determine what needs to change for migration to the latest data version.
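Once a changelog is available, checking your own usage against it could look roughly like the sketch below. The changelog format is not described here, so the entry keys ("id", "change", "replaced_by") are purely hypothetical placeholders.

```python
def affected_factors(used_ids, changelog):
    """Return the changelog entries that concern factors the app has used."""
    by_id = {entry["id"]: entry for entry in changelog}
    return {fid: by_id[fid] for fid in used_ids if fid in by_id}


# IDs returned by earlier estimates and stored by the application.
used_ids = ["1234", "5555"]
# Hypothetical changelog entries for a new data version.
changelog = [
    {"id": "1234", "change": "deprecated", "replaced_by": "9876"},
    {"id": "4321", "change": "added", "replaced_by": None},
]

for factor_id, entry in affected_factors(used_ids, changelog).items():
    print(factor_id, entry["change"], entry["replaced_by"])
```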

Copyright © earthemission.com. All rights reserved.