This Open Science practice is a summary of open results in the form of available datasets, software, and models developed by the LangNet research group since 2012. The majority of InfoCoV and LangNet results and publications are freely available to the research community. Recently, the focus has been on the availability and visualization of data and results from the InfoCoV project.
InfoCoV (Multilayer Framework for the Information Spreading Characterization in Social Media during the COVID-19 Crisis) is a scientific project funded by the Croatian Science Foundation that will run until February 2022. The aim of the project is to monitor the online media and social networking landscape in Croatia during the pandemic. Since 2020, the InfoCoV project has collected textual data on COVID-19 published in Croatian online news media, as well as comments and posts on social networks. In short, the project aims to monitor the infodemic during the pandemic in Croatia using artificial intelligence methods: natural language processing, social network analysis, and machine and deep learning. The project has resulted in several categories of datasets and language resources that are (or will be) made available to the research community.
This represents the broader (LangNet group) and narrower (InfoCoV project) context in which we have implemented our open science initiatives through both resource opening and open access publishing.
What has the initiative achieved so far?
Currently, the LangNet group has several open resources in the form of datasets, software, and trained models. In terms of tools, the applicants have developed software for language network construction (LanCoA) and multilingual keyword extraction, as well as applications for retrieving and visualising COVID-19 data. They have also opened the dataset and algorithm for automatic syllabification of the Croatian language, which the company OmoLab used to develop a free mobile application to support dyslexic students. This collaboration led to the awarding of the 1st Technology Transfer Prize by the Foundation of the University of Rijeka in 2019. The collaboration with OmoReader directly contributed to the inclusion of a vulnerable social group in school and society.
How FAIR (Findable, Accessible, Interoperable and Reusable) is the practice?
The availability of NLP resources and datasets is a prerequisite for the reproducibility of all scientific results we have published in 15 open access journal papers. Opening the COVID-19 dataset can also help in understanding communication patterns during the pandemic and in detecting misinformation, fake news and conspiracy theory news. The dataset could be used to understand and prevent infodemics, which can be very dangerous, especially in times of crisis. In addition, scientifically sound characterization of information during a crisis is crucial. By understanding the patterns of communication, it is possible to raise awareness of the problems and the importance of scientific research. Understanding communication related to restriction policy can also provide information about public opinion and reactions to different types of restrictions, and about how perceptions of restrictions change over time. The open resources could be valuable for the post-Corona era, providing insights for better managing potential future crises. With this in mind, the applicants hope to move beyond the mere replicability of research and create the necessary information for the systematic study of phenomena and possible interventions for the benefit of citizens and society in general. Finally, with the OmoLab collaboration on the dyslexia application, the applicants already have confirmation that opening up scientific results can benefit vulnerable categories in society, and hope that similar initiatives can take place in the future. The opening up of resources and scientific results can serve as an example of good practice to other newly established laboratories in the Faculty of Informatics and Digital Technology, as well as to the AIRI Center laboratories.
About the applicants
This proposal is part of the research initiatives and projects of the Language Networks (LangNet) group, founded in 2012 and led since then by Prof. Dr. Sanda Martinčić-Ipšić. It also builds on the results of the InfoCoV research project led by Prof. Dr. Ana Meštrović. In January 2022, the LangNet group and its corresponding laboratory will be renamed, with the group becoming the Semantic Technologies Research Group. The reason for the renaming is that the Department of Informatics at the University of Rijeka is growing into the Faculty of Informatics and Digital Technology. With this proposal, we would therefore like to unify all of our open datasets and resources, in an organized form, under a single access point on the new Semantic Technologies Group website. The group currently consists of three senior and four junior researchers and is part of the Laboratory for Natural Speech and Language Processing at the Center for Artificial Intelligence and Cybersecurity (AIRI) at the University of Rijeka. Since 2015, the LangNet group has published 15 open access journal publications (5 in Q1 and 5 in Q2 journals as ranked by WoS) and over 15 open conference papers. Both Sanda Martinčić-Ipšić and Ana Meštrović have more than 15 years of research experience in natural language processing, social network analysis, machine and deep learning, knowledge representation and semantic technologies.