Many application areas, ranging from serious games for health to learning by demonstration in robotics, could benefit from large datasets of body movements extracted from textual instructions accompanied by images. Interpreting such instructions to automatically generate the corresponding motions (e.g., exercises) and validating the resulting movements are difficult tasks. In this article, we describe a first step towards automating this extraction. Using a Kinect, we recorded seven amateur performers carrying out five different exercises in random order. During the recordings, we found that each performer interpreted the same exercise differently, even though all were given identical textual instructions. Based on these data, we conducted a crowdsourced quality assessment study and tested inter-rater agreement for different types of visualizations; the RGB-based visualization yielded the highest agreement among the annotators.