Nova V3 API

There has been a lot of discussion on the openstack-dev mailing list around the future of the Nova API. This document tries to cover the problems with the v2 API, what has been developed in the v3 API to address them, and how we can resolve some of the issues with long term maintenance of two APIs, as well as strategies for coping with backwards incompatible changes in the future whilst still minimising maintenance overhead.

Note that this document does not attempt to compare the proposal to keep only the v2 API against proceeding with the v3 API.

  1. Problems with the V2 API
    1. User facing problems
      1. Incorrect success return codes
      2. Poor input validation
      3. Poor error handling
      4. API inconsistencies
    2. Developer and Operator issues
      1. Immature plugin loader
      2. Inconsistent way of implementing API features
      3. Lack of versioning
      4. Poor input validation
  2. Status of the v3 API
  3. v3 API improvements
    1. User facing improvements and changes
      1. JSON schema input validation
      2. Fixes incorrect success return codes
      3. API plugins are explicitly versioned
      4. Removes project id from the url
      5. Fixes API inconsistencies
      6. Reduction in API extensions
    2. Developers and Operators
      1. Improved framework for API plugins
      2. Impact of JSON Schema input validation
      3. Explicit declaration of what errors may be returned from a method
      4. Ability to whitelist/blacklist plugins
      5. Policy checks at API level
      6. More fine-grained extensions
      7. Improved unittests
  4. Long term dual maintenance costs
    1. Measuring the dual maintenance cost and short term solutions
    2. Long term option 1 (v2.1 based on the v3 codebase)
    3. Long term option 2 (Proxy)
    4. Handling API changes in the future

Problems with the v2 API

The Nova v2 API is essentially the first version of the Nova API and has grown fairly organically over time. Nova has grown very quickly over a short period, and although we have recently started to raise the bar when reviewing changes to the API, historically adding features quickly has taken priority. Both the API itself, and especially how to add features to the API, have been inadequately documented. API features have often been written by cutting and pasting existing code and modifying it to suit, and as a result many bugs and inconsistencies have arisen, depending on exactly which existing API feature the new code was based on.

Work first started in Grizzly to address some of these problems, but it was quickly realised that many of the problems could not be fixed without backwards incompatible changes. A larger range of fixes was proposed and discussed at a couple of the Havana design summit sessions, and patches started merging soon after. Some of the issues are:

User facing problems

Developer and Operator issues

There are a number of issues with the v2 API implementation which lead to higher development and maintenance costs for the Nova API. Together with the primarily user facing issues, this results in a code base that is considerably more fragile than we would like.

Status of the v3 API

Development of the v3 API started with some prototype code for the new API plugin framework at the Havana summit. Most of the API code was ported and merged during Havana, with the focus in Icehouse on testing, API input validation and more generic cleanups. Most of the work agreed at the Icehouse summit has been completed and submitted for review, though some of it has not yet merged.

The tempest tests have been adjusted for the cleaned up v3 API and the v3 API tempest tests are part of the gate that all changes have to pass.

python-novaclient support has been implemented and merged.

As a very rough measure of the effort required to get to where we are, counting only changes in the Nova repository (so excluding tempest and python-novaclient development), 400 or more v3 API related patches were merged over Havana and Icehouse. This figure does not include the extra "part 1" patches, which were created to make the initial v3 code easier to review, but it will include some unrelated bug fixes. Approximately 30 further patches were in the review queue until they were frozen pending the Nova API discussions.

There is no XML support in the v3 API, as XML support in the v2 API has been marked deprecated. There is also no proxying of information from neutron, glance or cinder; that information can instead be queried from those services directly, and client libraries are expected to handle this. python-novaclient does this where necessary and leaves the rest to the various service clients. Eventually the openstack client is expected to handle it as a unified interface, further reducing duplicated code. Retaining proxying support in the Nova API over the very long term would mean duplicating code between the Nova API and the openstack clients, with the extra maintenance overhead that entails.
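As an illustration of moving this aggregation into the client, the following minimal sketch assumes two hypothetical in-memory clients, `NovaClient` and `CinderClient`, with invented method names. They are not the python-novaclient or python-cinderclient APIs; the point is only that volume details are fetched from the volume service by the client rather than proxied through Nova.

```python
from dataclasses import dataclass, field


@dataclass
class NovaClient:
    """Hypothetical compute client (not the python-novaclient API)."""
    servers: dict = field(default_factory=dict)

    def get_server(self, server_id):
        # The compute service returns only compute data, e.g. volume ids.
        return self.servers[server_id]


@dataclass
class CinderClient:
    """Hypothetical volume client (not the python-cinderclient API)."""
    volumes: dict = field(default_factory=dict)

    def get_volume(self, volume_id):
        return self.volumes[volume_id]


def server_with_volumes(nova, cinder, server_id):
    """Build a combined view on the client side instead of proxying in Nova."""
    server = dict(nova.get_server(server_id))  # copy so the source is untouched
    volume_ids = server.pop("volume_ids", [])
    # Volume details come straight from the volume service.
    server["volumes"] = [cinder.get_volume(v) for v in volume_ids]
    return server


nova = NovaClient({"s1": {"id": "s1", "name": "web", "volume_ids": ["v1"]}})
cinder = CinderClient({"v1": {"id": "v1", "size": 10}})
print(server_with_volumes(nova, cinder, "s1"))
# {'id': 's1', 'name': 'web', 'volumes': [{'id': 'v1', 'size': 10}]}
```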

There were two reasons for delaying the release of the v3 API in Icehouse. The first was that development of nova-network was unfrozen early in I-3. nova-network had originally been deprecated, so nova-network support was explicitly being removed from the v3 API. Now that nova-network is required again, the nova-network related code will also have to be ported to the v3 API.

The second reason was that the new tasks API work was not completed in time. As discussed at the mid-cycle meetup, we did not want to compromise on the design of the tasks API, which requires backwards incompatible changes to several API areas. To reduce the risk of getting these very last minute API changes wrong and having to live with the resulting bugs for a very long time, it was decided to defer marking the v3 API as supported rather than experimental.

v3 API improvements

User facing improvements

Operators and Developers

Long term dual maintenance costs

Measuring the dual maintenance cost and short term solutions

One consequence of creating the v3 API while the v2 API must be supported for much longer than the standard one cycle deprecation period is the cost of dual maintenance. That is, when Nova internal APIs change, it may in some cases be necessary to make corresponding changes in the v2 API code, possibly the EC2 API code, and now the v3 API code as well.

The recent objects work is an example where this has occurred. The API layer is in most cases a very thin layer on top of Nova internals and follows the general layout of:

  1. Assemble client supplied data and validate input.
  2. Call Nova internal methods.
  3. Parse the returned data and format it for return to the client, or handle exceptions.
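The three steps above can be sketched as follows. This is a simplified, framework-free illustration rather than Nova's actual controller code: `FakeComputeAPI`, `InstanceInvalidState`, `HTTPBadRequest` and `HTTPConflict` are hypothetical stand-ins for Nova internals and for HTTP error responses, and the request and response formats are invented for the example.

```python
class InstanceInvalidState(Exception):
    """Hypothetical stand-in for a Nova internal exception."""


class HTTPBadRequest(Exception):
    """Hypothetical stand-in for a 400 Bad Request response."""


class HTTPConflict(Exception):
    """Hypothetical stand-in for a 409 Conflict response."""


class FakeComputeAPI:
    """Hypothetical Nova internal API holding instance states in memory."""

    def __init__(self):
        self.states = {"abc": "rescued"}

    def unrescue(self, instance_id):
        if self.states.get(instance_id) != "rescued":
            raise InstanceInvalidState(instance_id)
        self.states[instance_id] = "active"
        return {"id": instance_id, "state": "active"}


def unrescue_action(compute_api, instance_id, body):
    # 1. Assemble client supplied data and validate input.
    if body != {"unrescue": None}:
        raise HTTPBadRequest("request body must be {'unrescue': null}")
    # 2. Call the Nova internal method.
    try:
        result = compute_api.unrescue(instance_id)
    except InstanceInvalidState:
        # 3b. Translate internal exceptions into HTTP errors.
        raise HTTPConflict(f"instance {instance_id} is not in rescue state") from None
    # 3a. Format the returned data for the client.
    return {"server": {"id": result["id"], "status": result["state"].upper()}}
```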

There are cases where it is more complex than this, but they all follow the same general process. One recent objects patch which required changes to both the v2 and v3 API code converted unrescue to support objects. The diffstat for the patch is:


git diff ec78b42d7b7e9da99ba063cccc8a4f6d0aa7c8e5^..ec78b42d7b7e9da99ba063cccc8a4f6d0aa7c8e5 | diffstat
 api/openstack/compute/contrib/rescue.py    |    2 +-
 api/openstack/compute/plugins/v3/rescue.py |    3 ++-
 compute/manager.py                         |   14 +++++++-------
 compute/rpcapi.py                          |   12 ++++++++----
 tests/compute/test_compute.py              |    8 +++++---
 tests/compute/test_rpcapi.py               |    2 +-
 6 files changed, 24 insertions(+), 17 deletions(-)

Changes were therefore required to both the v2 and v3 versions of the rescue plugin. The relevant parts of the patch are:


diff --git a/nova/api/openstack/compute/contrib/rescue.py b/nova/api/openstack/compute/con
index fe31f2c..0233be2 100644
--- a/nova/api/openstack/compute/contrib/rescue.py
+++ b/nova/api/openstack/compute/contrib/rescue.py
@@ -75,7 +75,7 @@ class RescueController(wsgi.Controller):
         """Unrescue an instance."""
         context = req.environ["nova.context"]
         authorize(context)
-        instance = self._get_instance(context, id)
+        instance = self._get_instance(context, id, want_objects=True)
         try:
             self.compute_api.unrescue(context, instance)
         except exception.InstanceInvalidState as state_error:
diff --git a/nova/api/openstack/compute/plugins/v3/rescue.py b/nova/api/openstack/compute/
index 5ae876b..66b4c17 100644
--- a/nova/api/openstack/compute/plugins/v3/rescue.py
+++ b/nova/api/openstack/compute/plugins/v3/rescue.py
@@ -77,7 +77,8 @@ class RescueController(wsgi.Controller):
         """Unrescue an instance."""
         context = req.environ["nova.context"]
         authorize(context)
-        instance = common.get_instance(self.compute_api, context, id)
+        instance = common.get_instance(self.compute_api, context, id,
+                                       want_objects=True)
         try:
             self.compute_api.unrescue(context, instance)
         except exception.InstanceInvalidState as state_error:

In this case the extra dual maintenance burden is a trivial one-line change within a larger patch that changes the underlying infrastructure, although sometimes corresponding test changes are needed as well. Many of the objects changes have been similar, but some are more complicated. In those cases we can remove the dual maintenance burden for Nova internal API changes by refactoring the v2 and v3 API code to call into a common method.
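A minimal sketch of that refactoring, assuming a hypothetical `FakeComputeAPI` and illustrative request key names (`adminPass` for v2, `admin_password` for v3): both controllers keep their own input and output formats, but only the hypothetical `rescue_common` helper touches Nova internals, so a change such as switching to objects (`want_objects=True`) is made in one place.

```python
class FakeComputeAPI:
    """Hypothetical internal API that records calls, for illustration only."""

    def __init__(self):
        self.calls = []

    def get(self, context, instance_id, want_objects=False):
        return {"id": instance_id, "is_object": want_objects}

    def rescue(self, context, instance, rescue_password=None):
        self.calls.append((instance["id"], instance["is_object"], rescue_password))


def rescue_common(compute_api, context, instance_id, password):
    """Shared helper: the only code that depends on Nova internal APIs."""
    instance = compute_api.get(context, instance_id, want_objects=True)
    compute_api.rescue(context, instance, rescue_password=password)
    return password


def v2_rescue(compute_api, context, instance_id, body):
    # v2-specific input/output format (key name is illustrative).
    password = body["rescue"].get("adminPass", "generated-pass")
    return {"adminPass": rescue_common(compute_api, context, instance_id, password)}


def v3_rescue(compute_api, context, instance_id, body):
    # v3-specific input/output format (key name is illustrative).
    password = body["rescue"].get("admin_password", "generated-pass")
    return {"admin_password": rescue_common(compute_api, context, instance_id, password)}
```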

Long term option 1 (v2.1 based on the v3 codebase)

In order to reduce the dual maintenance costs in the long term and reduce LOC in Nova, one approach would be to implement the v2 API on top of the v3 codebase, primarily as decorators. This is much easier to achieve if we allow this implementation to have strong input validation, as a lot of the translation can then be done using just JSON schema. See a simple proof of concept patch here https://review.openstack.org/#/c/77105/. Essentially we would have a v2.1 API which is the same as v2 except for strong input validation.
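The decorator approach might look roughly like the sketch below. It is not the code from the proof of concept patch: `validate` is a tiny hand-rolled stand-in for JSON schema validation (it checks Python types and rejects unknown keys, which JSON schema would express with `additionalProperties: false`), `v2_compat` is a hypothetical decorator, and the `adminPass`/`admin_password` key mapping is illustrative. The canonical v3 handler stays unaware of v2; the decorator validates and translates the v2 input, calls the v3 code, and translates the output back.

```python
import functools


class ValidationError(Exception):
    pass


def validate(body, schema):
    """Tiny stand-in for JSON schema: required keys, types, no unknown keys."""
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in body:
            raise ValidationError(f"missing required key {key!r}")
    for key, value in body.items():
        if key not in props:
            raise ValidationError(f"unexpected key {key!r}")
        if not isinstance(value, props[key]):
            raise ValidationError(f"wrong type for {key!r}")


def v2_compat(schema, key_map):
    """Hypothetical decorator exposing a v3 handler with a v2 body format."""
    reverse_map = {v3: v2 for v2, v3 in key_map.items()}

    def decorator(v3_handler):
        @functools.wraps(v3_handler)
        def v2_handler(body):
            validate(body, schema)  # strict validation is what makes this v2.1
            v3_body = {key_map.get(k, k): v for k, v in body.items()}
            v3_result = v3_handler(v3_body)
            return {reverse_map.get(k, k): v for k, v in v3_result.items()}
        return v2_handler
    return decorator


def v3_rescue(body):
    """Canonical v3 handler; it knows nothing about the v2 format."""
    return {"admin_password": body.get("admin_password", "generated-pass")}


V2_RESCUE_SCHEMA = {"properties": {"adminPass": str}, "required": []}
v2_rescue = v2_compat(V2_RESCUE_SCHEMA, {"adminPass": "admin_password"})(v3_rescue)
```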

This technique allows us to eventually have one code base for v2 and v3 while preserving backwards compatibility for v2, as we can translate both the input and the output. It has the significant maintenance advantage of keeping the v2 and v3 input and output handling separate from each other. Validation testing for this API is fairly straightforward, as we have existing tempest and unit tests for the v2 API and only need to verify that the behaviour for correct input remains the same.

It also provides a good transition strategy from v2 to v2.1. As the original v2 API code remains untouched, we do not risk accidentally changing the semantics of the v2 API. The only applications which could break in the transition from v2 to v2.1 would be those which are currently misusing the API. Client applications would have a reasonable time period in which they can easily test and verify against the v2.1 API before v2 is deprecated, and they have a strong incentive to do so, because misusing the API is a sign that they may have a bug in their program.
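To make the "misusing the API" case concrete, the sketch below contrasts lenient v2-style parsing with strict v2.1-style validation. The field names and both parsers are hypothetical. A correctly formed request produces the same result from both, while a request with a misspelled key, which v2 silently ignores, is rejected by v2.1.

```python
class ValidationError(Exception):
    pass


def parse_v2(body):
    """Lenient v2-style parsing: unknown or misspelled keys are ignored."""
    return {"name": body.get("name"), "min_count": body.get("min_count", 1)}


def parse_v21(body):
    """Strict v2.1-style parsing: same accepted keys, but misuse is rejected."""
    allowed = {"name", "min_count"}
    unknown = set(body) - allowed
    if unknown:
        raise ValidationError(f"unexpected keys: {sorted(unknown)}")
    if not isinstance(body.get("name"), str):
        raise ValidationError("'name' must be a string")
    if not isinstance(body.get("min_count", 1), int):
        raise ValidationError("'min_count' must be an integer")
    # Valid input is then handled exactly as v2 would handle it.
    return parse_v2(body)


good = {"name": "web", "min_count": 2}
typo = {"name": "web", "min_cout": 2}  # misspelled key: v2 silently uses 1
```

A client that only ever sends requests like `good` sees no difference between the two; a client sending `typo` has a latent bug that v2.1 surfaces instead of hiding.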

Long term option 2 (Proxy)

This would involve implementing a separate service which translates v2 REST API requests into v3 ones, ensuring that only valid requests are passed to the v3 API, and does the inverse when returning data to the caller. It would also have to implement proxying to neutron/cinder/glance where appropriate. As with option 1, once this is proven to be stable the legacy v2 API code could be removed. This would be one way to retain a v2 API with poor input validation while reducing maintenance costs.
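The core request and response translation in such a proxy could be sketched as below. The URL layout (`/v2/<project_id>/...` mapped to `/v3/...`, reflecting the removal of the project id from v3 URLs) and the `adminPass`/`admin_password` key mapping are assumptions for illustration. A real proxy would also need to validate requests, forward them over HTTP and proxy to neutron/cinder/glance, all of which are omitted here.

```python
import re

# Assumed v2 URL layout: /v2/<project_id>/<resource path>
V2_PATH = re.compile(r"^/v2/(?P<project_id>[^/]+)(?P<rest>/.*)$")

# Illustrative v2 -> v3 key renames.
KEY_MAP = {"adminPass": "admin_password"}


def rename_keys(data, mapping):
    """Recursively rename dict keys; lists are walked, scalars pass through."""
    if isinstance(data, dict):
        return {mapping.get(k, k): rename_keys(v, mapping) for k, v in data.items()}
    if isinstance(data, list):
        return [rename_keys(item, mapping) for item in data]
    return data


def translate_request(path, body, key_map=KEY_MAP):
    """Turn a v2 request into a v3 one; the project id is returned separately."""
    match = V2_PATH.match(path)
    if match is None:
        raise ValueError(f"not a v2 path: {path}")
    v3_path = "/v3" + match.group("rest")
    return match.group("project_id"), v3_path, rename_keys(body, key_map)


def translate_response(body, key_map=KEY_MAP):
    """Do the inverse for data returned to the v2 caller."""
    inverse = {v3: v2 for v2, v3 in key_map.items()}
    return rename_keys(body, inverse)
```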

Handling API changes in the future

We need to be a lot more careful about API changes in the future, and consider not just a change in isolation but its overall impact on the Nova API. However, I think we have to accept that we will still make mistakes, and we need a strategy for handling them.

Whether we use version headers or URL path differences, both amount to major version revisions. I think one of the most important priorities should be to keep our code base clean in the long term, rather than ending up with a growing number of interleaved version checks in the API code. Wherever possible, I think we should aim to keep the canonical latest version of the API code clean and separate from the code needed to support legacy API versions, as this lowers our long term maintenance costs.
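One way to keep the canonical code free of interleaved version checks is to wrap it with per-version adapters chosen at dispatch time, instead of writing `if version == "2": ...` branches inside each handler. In the sketch below, `show_server`, `legacy_v2_adapter`, `dispatch` and the `host_id`/`hostId` rename are all hypothetical; the point is that the latest handler has no knowledge of older versions, and dropping a legacy version means deleting its adapter and dispatch entry.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def show_server(server: dict) -> dict:
    """Canonical latest-version handler with no version checks inside."""
    return {"id": server["id"], "host_id": server["host_id"]}


def legacy_v2_adapter(handler: Handler) -> Handler:
    """Legacy layer kept separate: reshapes canonical output for v2 clients."""
    def adapted(server: dict) -> dict:
        result = dict(handler(server))
        # Hypothetical legacy difference: the older version used a camelCase key.
        result["hostId"] = result.pop("host_id")
        return result
    return adapted


# Each supported version maps to the canonical handler or an adapted one.
HANDLERS: Dict[str, Handler] = {
    "3": show_server,
    "2": legacy_v2_adapter(show_server),
}


def dispatch(version: str, server: dict) -> dict:
    return HANDLERS[version](server)
```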

Last edited 2014/03/04 05:52:43 UTC