[Cbe-oss-dev] spufs: kernel hangs by polling 'mfc' w/o proxy DMA request

Arnd Bergmann arnd at arndb.de
Wed Sep 12 23:00:44 EST 2007


On Friday 07 September 2007, Kazunori Asayama wrote:
> I found that the kernel hangs if programs poll 'mfc' node of SPUFS
> without proxy DMA requests. The reason why this problem occurs is
> that:
> 
>   - If spufs_mfc_poll, which is the 'poll' operator of 'mfc', is
>     called without proxy DMA requests, spufs_mfc_poll issues a proxy
>     tag group query with query mask = 0 and query type = 2 (all):
> 
> 	ctx->ops->set_mfc_query(ctx, ctx->tagwait, 2);
> 
>     The processor immediately raises a 'tag-group completion
>     interrupt' corresponding to this query, because there is no
>     outstanding proxy DMA at all.
> 
>   - The spufs_mfc_poll never (regardless of other conditions) returns
>     POLLIN event when tagwait (a set of issued proxy DMA) is zero:
> 
> 	if (tagstatus & ctx->tagwait)
> 		mask |= POLLIN | POLLRDNORM;
> 
>   - As a result of the above, spufs_mfc_poll endlessly issues proxy
>     tag group queries with query mask = 0 and query type = 2, if once
>     spufs_mfc_poll is called without proxy DMA request.

Oh, you mean it gets into a busy-loop? That should really not
happen.

I suppose we should immediately return from the loop when a program
accidentally calls poll() without having entered any requests into
the queue first, like

	if (!ctx->tagwait)
		mask |= POLLERR;

and have the read function on the file return -EINVAL.
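
A minimal user-space model of that check, assuming the read side of the poll mask depends only on tagwait and the tag status. `struct mfc_state` and `mfc_poll_mask` are made-up names for illustration, not spufs code, and the write side of the real poll operator is left out:

```c
#define _XOPEN_SOURCE 700	/* for POLLRDNORM under strict -std= modes */
#include <poll.h>
#include <stdint.h>

/* Hypothetical stand-in for the two values the poll operator looks at. */
struct mfc_state {
	uint32_t tagwait;	/* tag groups with proxy DMA outstanding */
	uint32_t tagstatus;	/* tag groups reported complete */
};

/* Compute the read-side poll mask for the given state. */
static unsigned int mfc_poll_mask(const struct mfc_state *s)
{
	unsigned int mask = 0;

	/* Nothing was queued: report an error instead of querying again. */
	if (!s->tagwait)
		return POLLERR;

	/* Same test as the existing code: any awaited tag group is done. */
	if (s->tagstatus & s->tagwait)
		mask |= POLLIN | POLLRDNORM;

	return mask;
}
```

With tagwait = 0 the caller gets POLLERR right away, so there is no reason to issue a tag group query that completes immediately and can never produce POLLIN.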

> My questions are:
> 
>   - Why does spufs_mfc_poll issue queries with query type = 2 ? It
>     seems strange that this condition (all) is different from
>     spufs_mfc_read's one (any).
> 
>     I guess that this is just a workaround to implement the
>     'SPE_TAG_ALL' behavior of spe_mfcio_tag_status_read without using
>     incomplete fsync implementation.

I don't remember why it was done, but it seems strange now. User space
could still call poll()/read() repeatedly until all DMAs are
done to implement SPE_TAG_ALL from user space, even if we use
query type 1 (any).
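
Such a user-space loop could look roughly like the sketch below. The `wait_any_fn` callback is a hypothetical stand-in for a poll() or blocking read() on 'mfc' that returns the completed tag groups; `fake_wait_any` only simulates it so the loop can be run without an SPE:

```c
#include <stdint.h>

/*
 * Hypothetical callback: waits until at least one tag group in 'pending'
 * has completed and returns the bitmask of completed groups.
 */
typedef uint32_t (*wait_any_fn)(uint32_t pending, void *arg);

/* Build "wait for all" on top of "wait for any". Returns 0 or -1. */
static int wait_all(uint32_t tagmask, wait_any_fn wait_any, void *arg)
{
	uint32_t pending = tagmask;

	while (pending) {
		uint32_t done = wait_any(pending, arg);

		/* No progress on an awaited group: bail out, do not spin. */
		if (!(done & pending))
			return -1;
		pending &= ~done;
	}
	return 0;
}

/* Simulation only: completes the lowest pending group and counts calls. */
static uint32_t fake_wait_any(uint32_t pending, void *arg)
{
	int *calls = arg;

	(*calls)++;
	return pending & (~pending + 1);	/* isolate lowest set bit */
}
```

Note that an empty tag mask returns at once without calling the callback, which matches the "nothing queued" case discussed above.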
 
>     I remember that when we discussed how the behavior flags of
>     spe_mfcio_tag_status_read should be implemented, we reached the
>     conclusion as:
> 
>     * behaviors of operations on 'mfc' node:
> 
>       - blocking read on 'mfc'
> 
>       	blocks until at least one of the DMAs completes and then
>       	returns all currently complete tag groups.
> 
>       - non-blocking read on 'mfc'
> 
>       	reads the current Prxy_TagStatus and masks it with tagwait.
> 
>       - fsync on 'mfc'
> 
>       	blocks until all DMAs (tagwait) are complete.

Actually, the fsync implementation was disabled at some point and never put back:

#if 0
/* this currently hangs */
	ret = spufs_wait(ctx->mfc_wq,
			 ctx->ops->set_mfc_query(ctx, ctx->tagwait, 2));
	if (ret)
		goto out;
	ret = spufs_wait(ctx->mfc_wq,
			 ctx->ops->read_mfc_tagstatus(ctx) == ctx->tagwait);
out:
#else
	ret = 0;
#endif

I have no idea why it would hang, though.

>       - poll on 'mfc'
> 
>       	returns whenever any one of the DMAs completes.
> 
>     * implementations of libspe
> 
>       - spe_mfcio_tag_status_read with SPE_TAG_ALL: fsync then read.
> 
>       - spe_mfcio_tag_status_read with SPE_TAG_ANY: blocking read.
> 
>       - spe_mfcio_tag_status_read with SPE_TAG_IMMEDIATE: non-blocking read.
> 
>       - events: poll

yes, sounds right.
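
The libspe side of that mapping might be sketched as follows. This is not the libspe2 source: `enum tag_behavior` and `tag_status_read` are hypothetical names, the descriptor is assumed to be an already opened 'mfc' node, and toggling O_NONBLOCK with fcntl() is just one way to get a non-blocking read:

```c
#define _XOPEN_SOURCE 700	/* for fsync() under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-ins for the SPE_TAG_* behavior flags. */
enum tag_behavior { TAG_ALL, TAG_ANY, TAG_IMMEDIATE };

/* Returns 0 and stores the tag status, or -1 with errno set. */
static int tag_status_read(int mfc_fd, enum tag_behavior how, uint32_t *status)
{
	int flags, want;
	ssize_t n;

	/* ALL: fsync blocks until every awaited DMA is complete. */
	if (how == TAG_ALL && fsync(mfc_fd) < 0)
		return -1;

	flags = fcntl(mfc_fd, F_GETFL);
	if (flags < 0)
		return -1;

	/* IMMEDIATE: non-blocking read. ALL and ANY: blocking read. */
	want = (how == TAG_IMMEDIATE) ? (flags | O_NONBLOCK)
				      : (flags & ~O_NONBLOCK);
	if (want != flags && fcntl(mfc_fd, F_SETFL, want) < 0)
		return -1;

	n = read(mfc_fd, status, sizeof(*status));
	if (n < 0)
		return -1;
	if ((size_t)n != sizeof(*status)) {
		errno = EIO;	/* short read: treat as an I/O error */
		return -1;
	}
	return 0;
}
```

Event handling would use poll() on the same descriptor and is not shown here.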

>   - If my understanding is correct, is it OK if we fix the fsync
>     implementation and then change the poll behavior so that it
>     waits for the 'any' condition?
>     
>     I think that there is no problem with doing so, because the
>     libspe2's spe_mfcio_tag_status_read already has code to call fsync
>     on 'mfc' for SPE_TAG_ALL and the libspe2 will still work correctly
>     after this change.

yes.

>   - How should the SPUFS behave with tagwait = 0?
> 
>     I think that the following are reasonable and useful for
>     application programs:
> 
>     - poll: waits until new proxy DMAs are issued and any of them
>       	    completes.

As mentioned above, I think returning POLLERR would be more appropriate.

>     - fsync: returns immediately.

yes, that would be good

	Arnd <><


