8.7 Conclusions and Future Work

As the importance of sign languages is recognized throughout the world, the inclusion of the signing community in the digital economy is fundamental. To this end, computational representations of sign languages are necessary, both for end users and researchers alike. In the case of SignWriting, much of the available data exist in image formats, understandable by humans but not machines. Converting SignWriting images into a computational representation is a necessary first step to automatically process them, but requires state-of-the-art applications of artificial intelligence, since images are not easy to process using hand-crafted rules or ad-hoc procedures.

We have presented a careful analysis of the data underlying the problem, establishing a categorization of the different meanings of SignWriting symbols into hierarchical features. This formalization is itself one of our contributions: a computationally valid representation which captures the intended meaning of SignWriting transcriptions as a numerical encoding of the positions of graphemes within a logogram, together with additional key-value pairs of features for each of them.
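As an illustration, the representation described above can be sketched as a simple data structure: a logogram holding a set of graphemes, each with a numerical position and a dictionary of key-value features. The class names, field names, and feature labels below are hypothetical and chosen only to convey the shape of the formalization, not the actual schema used in the corpus.

```python
from dataclasses import dataclass, field

@dataclass
class Grapheme:
    """One SignWriting symbol within a logogram (hypothetical sketch)."""
    cx: float  # normalized horizontal position of the grapheme's center
    cy: float  # normalized vertical position of the grapheme's center
    features: dict = field(default_factory=dict)  # hierarchical key-value features

@dataclass
class Logogram:
    """A full SignWriting transcription: positioned graphemes plus their features."""
    graphemes: list = field(default_factory=list)

# Illustrative instance: a head grapheme above a hand grapheme,
# with invented feature keys standing in for the real hierarchy.
sign = Logogram(graphemes=[
    Grapheme(cx=0.5, cy=0.2, features={"CLASS": "HEAD"}),
    Grapheme(cx=0.6, cy=0.6, features={"CLASS": "HAND", "SHAPE": "fist", "ROT": "NE"}),
])
```

The point of the sketch is that positions and features are kept separate: the geometry of the logogram is numeric, while the meaning of each grapheme is a nested set of discrete features, which is what makes the representation machine-processable.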

To automatically reproduce this representation for new instances of SignWriting, the best approach is deep learning, which can capture complex relationships in the data and generalize from the patterns present in a training corpus to the general case. However, deep learning approaches need large amounts of data to work reliably, and for our problem there is neither a reference corpus nor a similar problem from which to transfer learned neural network weights.

We have collected the necessary samples and created a corpus (Chapter 6, or Sevilla, Lahoz-Bengoechea, and Díaz (2022)) with which to train the algorithms. Still, the amount of data available is small and costly to annotate. However, we have shown that the annotation pays off if done carefully. Our use of many features, chosen both for their semantic value and for the needs of the visual processing required by the problem, has allowed us to build an expert solution able to automatically recognize SignWriting. Compared to the simple, direct approach using a single YOLO network, our proposed system combines several deep learning networks in an intelligent pipeline which can extract additional information and make decisions based on previous processing steps, achieving a 17% improvement in recognition accuracy while also being able to extract partial information even when recognition fails.
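The pipeline behavior described above, where later stages can condition on earlier outputs and partial information survives a failed step, can be sketched minimally as follows. The stage functions and result keys here are toy stand-ins for the actual networks, shown only to illustrate the control flow, not the real system.

```python
def run_pipeline(image, stages):
    """Run processing stages in order, accumulating results.

    Each stage is a callable (image, results) -> dict of new fields,
    so later stages can make decisions based on earlier outputs.
    If a stage fails, the partial results gathered so far are still
    returned instead of losing the whole recognition.
    """
    results = {}
    for stage in stages:
        try:
            results.update(stage(image, results))
        except Exception:
            break  # keep whatever partial information was extracted
    return results

# Toy stages standing in for the real detection/classification networks.
detect = lambda img, res: {"boxes": [(10, 10, 32, 32)]}
classify = lambda img, res: {"classes": ["HAND"] * len(res["boxes"])}

out = run_pipeline(None, [detect, classify])
```

Even in this toy form, the design choice is visible: because each stage reads the accumulated results, grapheme classification can exploit detection output, and a failure in a late stage does not discard the boxes already found.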

On the whole, domain knowledge about the problem has let us create a system which utilizes deep learning approaches even in a situation where no existing data can be found, by collecting the corpus ourselves, defining a formal schema for its annotation, and exploiting it to get the best performance from the neural networks employed. Our ideas and approach may be useful not only to process SignWriting instances, but may be applicable to other problems where the data available are less numerous than the expert knowledge that can be collected.

8.7.1 Future work

There are three straightforward directions in which this research can be improved. First, collecting more, and more varied, data will likely improve the performance of the system and solve some of the limitations outlined above. A second direction is downstream, putting the recognized SignWriting representation to use in consumer applications. The needs of these applications will tell us what the strong points of our approach are, and where it needs to improve to support their use cases. Finally, the components used in the system may themselves be improved. We have used readily available neural network architectures, as found in the literature and with implementations we could use directly. Fine-tuning the network parameters, or swapping some networks for others better suited to each particular sub-task, will likely improve overall performance.

There is also room to try alternative approaches. An ensemble of neural networks, where results are weighted and combined, could improve the detection and classification of rarer graphemes, or correct frequent errors in certain common or uncommon situations. A custom neural architecture that embeds all the steps of our pipeline may also be possible; it would enable feedback between steps, or some other technique for improving the results of earlier stages by taking into account the confidence of later ones.


The development of the expert system described in this work is part of the project “Visualizando la SignoEscritura” (Visualizing SignWriting, https://www.ucm.es/visse), reference number PR2014_19/01, developed in the Faculty of Computer Science of Universidad Complutense de Madrid, and funded by Indra and Fundación Universia in the IV call for funding aid for research projects with application to the development of accessible technologies. We want to acknowledge the collaboration of the signing community, especially the Spanish Sign Language teachers at Idiomas Complutense and Fundación CNSE.