Introduction to the Xen Share Code

Goal
----
To produce a simple interdomain transport which has device-like
characteristics.

Method
------
A device needs mapped I/O, interrupts and DMA, so the share code provides
an equivalent for each.

For mapped I/O, dom0 explicitly creates "sharable" pages for the devices
to use, via these dom0 ops:

DOM0_CREATESHAREDPAGES(num-pages)
    - Returns a share ref (on x86, an mfn) for num-pages machine-contiguous
      pages set up as a sharable region.

DOM0_DESTROYSHAREDPAGES(share_ref)
    - Frees the shared pages once no domain is referencing them any more
      (see XEN_SHARE_drop below).

DOM0_GRANTSHAREDPAGES(domain, share_ref)
    - Allows this domain to access these shared pages.

A domain that has been granted access maps the pages using the
multiplexed "share_op" hypercall:

XEN_SHARE_get(share_ref, evtchn)
    - Returns the lowest available non-negative "peer id". A domain can
      get the same share multiple times, so the peer id distinguishes
      them. The event channel evtchn is used to notify the caller of all
      events which occur on this region. After this call the pages can be
      mapped.

XEN_SHARE_drop(share_ref, peerid)
    - Drops this peer's reference to the shared pages.

For interrupts, we have a method of sending and receiving notifications,
based on addresses within the shared region:

XEN_SHARE_watch(share_ref, peerid, triggernum, u32 *decaddr)
    - When someone fires this trigger number (see XEN_SHARE_trigger),
      atomically decrement *decaddr; if it reaches 0, raise the event
      channel.

XEN_SHARE_trigger(share_ref, triggernum)
    - Fires any watches registered on this trigger number (see
      XEN_SHARE_watch above).

Finally, for DMA, there is a method for registering scatter-gather (sg)
lists for input or output:

XEN_SHARE_sg_register(share_ref, peerid, read/write, #sgs, sg[], u32 *lenaddr)
    - Registers an array of #sgs machine address/length pairs associated
      with this share and peer id. When the list is used (see
      XEN_SHARE_sg_xfer below), the length transferred is written to
      *lenaddr, the sg list is unregistered, and the event channel is
      raised.

XEN_SHARE_sg_unregister(share_ref, peerid, sg_addr)
    - Unregisters the scatter-gather list whose first address is sg_addr.
XEN_SHARE_sg_xfer(share_ref, peerid, read/write, peerid, #sgs, sg[])
    - Transfers data between these #sgs virtual address/length pairs and
      the sg list that the peer identified by the second peerid argument
      has registered on this share. Returns the number of bytes actually
      transferred.

These mechanisms make it simple to write efficient virtual drivers which
behave like normal device drivers.

Special Triggers
----------------
Currently trigger 0 is fired whenever someone registers an sg list while
previously having none registered. This is a useful indicator to domains
that sg operations can be retried. In future this trigger may also fire on
other meta-operations.

Examples
--------
There is a simple test in drivers/xen/share_test.c which exercises the
various share features: boot domU with share_test.addr=. There is also an
implementation of a network driver and a block driver.

Future optimizations / improvements
-----------------------------------
(1) The hypervisor code uses dumb linked lists, which could be replaced by
    arrays and hash tables.

(2) The hypervisor currently always copies, but could page-flip. This
    would require guest awareness in the non-shadow-pagetable model
    (although the hypervisor *could* reach in and update the pfn array).

(3) The hypervisor should probably use the smallest possible sg entry.

(4) The entire protocol could be transparently remoted by the hypervisor.

(5) A trusted partition could ask the hypervisor for a domain's sg list
    details, which it could then program directly into a device.

(6) The code isn't PAE friendly, and actually passes raw addresses around
    in some places. These should be fixed.

(7) The decaddr trick is a cute idea for interrupt mitigation, but it is
    currently unused and probably unnecessary. It could be turned into a
    simple "set this to 1" flag.

(8) Perhaps the hypervisor should also fire trigger 0 (waking watchers at
    offset 0) whenever someone drops the share.
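The watch/trigger notification scheme described under Method, together with the "set this to 1" alternative from item (7), can be modelled in a few lines. This is a minimal user-space C11 sketch, not the hypervisor implementation: struct share_watch, struct share, share_add_watch(), share_trigger(), raise_evtchn(), MAX_WATCHES and the use_flag mode are hypothetical names chosen for illustration, and raising an event channel is modelled as incrementing a per-watch counter. Reading item (7) as "the hypervisor stores 1 and always notifies" is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WATCHES 8 /* hypothetical fixed table size */

/* Hypothetical model of one XEN_SHARE_watch registration. */
struct share_watch {
    unsigned int triggernum;    /* trigger number being watched */
    _Atomic uint32_t *decaddr;  /* word inside the shared region */
    int use_flag;               /* 0: decrement mode, 1: "set to 1" mode */
    unsigned int events_raised; /* stands in for the event channel */
};

/* Hypothetical per-share watch table. */
struct share {
    struct share_watch watches[MAX_WATCHES];
    size_t nwatches;
};

/* Stand-in for raising the watcher's event channel. */
static void raise_evtchn(struct share_watch *w)
{
    w->events_raised++;
}

/* Model of XEN_SHARE_watch: returns 0 on success, -1 if the table is full. */
static int share_add_watch(struct share *s, unsigned int triggernum,
                           _Atomic uint32_t *decaddr, int use_flag)
{
    assert(decaddr != NULL);
    if (s->nwatches == MAX_WATCHES)
        return -1;
    struct share_watch *w = &s->watches[s->nwatches++];
    w->triggernum = triggernum;
    w->decaddr = decaddr;
    w->use_flag = use_flag;
    w->events_raised = 0;
    return 0;
}

/* Model of XEN_SHARE_trigger: fire every watch on triggernum. */
static void share_trigger(struct share *s, unsigned int triggernum)
{
    for (size_t i = 0; i < s->nwatches; i++) {
        struct share_watch *w = &s->watches[i];
        if (w->triggernum != triggernum)
            continue;
        if (w->use_flag) {
            /* Item (7) variant (assumed): mark the word, always notify. */
            atomic_store(w->decaddr, 1);
            raise_evtchn(w);
        } else if (atomic_fetch_sub(w->decaddr, 1) == 1) {
            /* Old value 1 means the decrement just reached 0.
             * Decrementing an already-zero word wraps and does not notify. */
            raise_evtchn(w);
        }
    }
}
```

For example, a receiver that stores 4 in its decaddr word is notified only on the fourth trigger, which is the interrupt-mitigation effect item (7) refers to. In the flag variant every trigger notifies, and the word merely records which watch fired, since one event channel covers all events on the share.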
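Item (2) notes that the hypervisor currently always copies. The following minimal sketch models that copy for one XEN_SHARE_sg_xfer call against a list registered with XEN_SHARE_sg_register: the source pairs are scattered across the destination entries, the length transferred is written to *lenaddr, the list is unregistered and the event channel is raised. It assumes a single address space (plain pointers instead of machine and virtual addresses) and handles only one transfer direction; struct sg_entry, struct sg_registration and sg_xfer_copy() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sg entry; the real interface uses machine addresses on the
 * registered side and virtual addresses on the transferring side. */
struct sg_entry {
    unsigned char *addr;
    size_t len;
};

/* Hypothetical model of one XEN_SHARE_sg_register call. */
struct sg_registration {
    struct sg_entry *sg;
    size_t nsg;
    uint32_t *lenaddr;   /* where the transferred length is written */
    int registered;      /* cleared once the list has been used */
    unsigned int events; /* stands in for the peer's event channel */
};

/* Copy src[] into reg's list, crossing entry boundaries on both sides.
 * Returns the bytes copied, or 0 if reg is no longer registered. */
static size_t sg_xfer_copy(struct sg_registration *reg,
                           const struct sg_entry *src, size_t nsrc)
{
    size_t di = 0, doff = 0, total = 0;

    if (!reg->registered)
        return 0;
    assert(reg->lenaddr != NULL);

    for (size_t si = 0; si < nsrc; si++) {
        size_t soff = 0;
        while (soff < src[si].len && di < reg->nsg) {
            size_t room = reg->sg[di].len - doff;
            size_t left = src[si].len - soff;
            size_t n = room < left ? room : left;

            memcpy(reg->sg[di].addr + doff, src[si].addr + soff, n);
            soff += n;
            doff += n;
            total += n;
            if (doff == reg->sg[di].len) { /* destination entry full */
                di++;
                doff = 0;
            }
        }
    }

    /* On use: write the length, unregister, raise the event channel. */
    *reg->lenaddr = (uint32_t)total;
    reg->registered = 0;
    reg->events++;
    return total;
}
```

Because each destination entry is filled before moving to the next, a destination list shorter than the source simply truncates the transfer, and the returned count reports how much was actually copied.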