POSTERS

Participants at the poster session will present their research projects, current developments and ideas, and will receive feedback from renowned scientists at leading US institutions and from other workshop participants.

Poster template

Presented posters are based on published or ongoing studies, and are divided into two categories:

  1. Student competition (the best poster will be awarded) – the first author and presenter must be a student (including undergraduates and high school students)
  2. General category (non-competitive) – open to all participants, including early career and senior investigators from industry, government labs and academia

 

The session includes posters on the following topics:

Track A. Data Science Foundations

Track B. Data Science in Critical Infrastructures

Track C. Biomedical Informatics

Track D. Digital Archeology

Track A - Data Science Foundations

Learning Embeddings on Enhanced Call Graphs for Churn Prediction
Sandra Mitrovic, Jochen De Weerdt
Network-structured data arise in various real-life situations and domains. With the uptake of social network analytics, a predominant strategy is to transform the original (implicitly networked) data into (explicitly) networked data and operate on these, which has proven beneficial for many predictive tasks. One such example is transforming Call Detail Records (CDRs) into call graphs for solving various telco-related data mining problems, including churn prediction.
However, the way of featurizing these networks, that is, deriving informative features from them, remains highly varied and non-systematic. The reason for this is two-fold: (1) the complexity of call networks, which is reflected both in structural connections based on graph topology (referred to as structural features) and in customer interactions/calls (referred to as interaction features), and (2) the absence of an encompassing methodology for feature extraction. Current works typically featurize call networks using ad-hoc hand-crafted features based on the available data, relying on different versions of RFM (Recency-Frequency-Monetary) features to capture interactions and mostly only on node degree as structural information (other centrality measures are computationally expensive on large networks).
In this work, we aim to integrate both structural and interaction information while circumventing extensive feature hand-engineering. We achieve this by proposing a novel approach based on node representation learning in RFM-enriched call networks. To this end, we first devise different operationalizations of RFM variables and, based on these, design novel network architectures to exploit the full potential of the network data. For representation learning, we perform the necessary adaptations of the existing node2vec method to make it scalable for our RFM-enriched networks.
Results obtained on one postpaid and one prepaid dataset demonstrate that our method outperforms the classical approach based on RFM features, both in terms of AUC and lift.
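To make the pipeline concrete, here is a minimal, hedged sketch (not the authors' code) of building an RFM-weighted call graph from CDRs and learning node2vec-style embeddings via weighted random walks and a skip-gram model; the CDR column names and the particular RFM weighting are illustrative assumptions:

```python
# Hedged sketch: RFM-weighted call graph from CDRs + node2vec-style embeddings.
# Column names (caller, callee, recency, frequency, monetary) are illustrative assumptions.
import random
import networkx as nx
import pandas as pd
from gensim.models import Word2Vec

cdr = pd.read_csv("cdr.csv")  # hypothetical CDR export

G = nx.Graph()
for row in cdr.itertuples():
    # One possible RFM-based edge weight; the poster devises several operationalizations.
    weight = row.frequency * row.monetary / (1.0 + row.recency)
    G.add_edge(row.caller, row.callee, weight=weight)

def weighted_walk(graph, start, length=20):
    """Random walk biased by RFM edge weights."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = list(graph[walk[-1]])
        if not nbrs:
            break
        w = [graph[walk[-1]][n]["weight"] for n in nbrs]
        walk.append(random.choices(nbrs, weights=w, k=1)[0])
    return [str(n) for n in walk]

walks = [weighted_walk(G, n) for n in G.nodes() for _ in range(10)]
emb = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, workers=4)
# emb.wv[str(customer_id)] can then feed a downstream churn classifier (evaluated by AUC / lift).
```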
Gaussian conditional random fields extended for directed graphs
Tijana Vujičić
For many real-world applications, structured regression is commonly used for predicting output variables that have some internal structure. Gaussian conditional random fields (GCRF) are a widely used type of structured regression model that incorporates the outputs of unstructured predictors and the correlation between objects in order to achieve higher accuracy. However, applications of this model are limited to objects that are symmetrically correlated, while in many cases the interaction between objects is asymmetric. We propose a new model, called Directed Gaussian conditional random fields (DirGCRF), which extends GCRF to allow modeling of asymmetric relationships (e.g. friendship, influence, love, solidarity, etc.). DirGCRF models the response variable as a function of both the outputs of unstructured predictors and the asymmetric structure. The effectiveness of the proposed model is demonstrated on six types of synthetic datasets and four real-world applications, where DirGCRF was consistently more accurate than the standard GCRF model and baseline unstructured models.
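For reference, a standard GCRF models the conditional density below (the notation is ours, not taken from the poster); the quadratic coupling term requires a symmetric similarity matrix S, and DirGCRF is designed to relax exactly this symmetry requirement:

```latex
P(\mathbf{y}\mid\mathbf{x}) \;\propto\; \exp\!\Big(
  -\sum_{i}\sum_{k}\alpha_k\,\big(y_i - R_k(\mathbf{x}_i)\big)^2
  \;-\;\sum_{i,j}\sum_{l}\beta_l\, S^{(l)}_{ij}\,\big(y_i - y_j\big)^2
\Big),
```

where the R_k are unstructured predictors and the S^(l) encode (in standard GCRF, symmetric) object similarities.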
Generalization-Aware Structured Regression towards Balancing Bias and Variance
Martin Pavlovski, Fang Zhou, Nino Arsov, Ljupco Kocarev, Zoran Obradovic
Attaining the proper balance between underfitting and overfitting is one of the central challenges in machine learning. It has been approached mostly by deriving bounds on generalization risks of learning algorithms. Such bounds are, however, rarely controllable. In this study, a novel bias-variance balancing objective function is introduced in order to improve generalization performance. By utilizing distance correlation, this objective function is able to indirectly control a stability-based upper bound on a model's expected true risk. In addition, the Generalization-Aware Collaborative Ensemble Regressor (GLACER) is
developed, a model that bags a crowd of structured regression models, while allowing them to collaborate in a fashion that minimizes the proposed objective function. The experimental results on both synthetic and real-world data indicate that such an objective enhances the overall model's predictive performance. For the tasks of predicting housing prices and hospital readmissions, GLACER outperforms a broad range of traditional and structured regression models by significant margins, while sustaining stable predictions.
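Since the objective hinges on distance correlation, a generic numpy implementation of the sample distance correlation (Székely et al.) is sketched below for illustration; it is not the authors' code, and the GLACER objective itself is not reproduced here:

```python
# Generic sample distance correlation; the statistic GLACER's objective uses to
# indirectly control a stability-based upper bound on expected true risk.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_correlation(x, y):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    a = squareform(pdist(x))                              # pairwise distance matrices
    b = squareform(pdist(y))
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()     # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(distance_correlation(x, x ** 2))   # detects non-linear dependence
```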
Fast Learning of Scale-Free Networks Based on Cholesky Factorization
Vladisav Jelisavcic, Ivan Stojkovic, Veljko Milutinovic, Zoran Obradovic
In this talk we will present our recent results on recovering network connectivity structure from high-dimensional observations. A common approach to finding the connectivity pattern from high-dimensional data is to learn a Sparse Gaussian Markov Random Field. By optimizing the regularized maximum likelihood, sparsity is induced by imposing an L1 norm on the entries of the precision matrix. In our recent work, published in the International Journal of Intelligent Systems, we exploited favourable properties of L1-penalized Cholesky factors that allowed us to develop a very fast, highly parallelizable SNETCH optimization algorithm based on coordinate descent and an active set approach. The presented model is particularly suited for problems with structures that admit a sparse Cholesky factor, an assumption that commonly holds in scale-free networks.
Empirical evaluation on synthetically generated examples and on high-impact applications from a biomedical domain with up to 900,000 variables provides evidence that the SNETCH algorithm can be an order of magnitude faster than state-of-the-art approaches based on the L1-penalized precision matrix.
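As a point of reference, the underlying penalized maximum-likelihood problem can be written over Cholesky factors roughly as follows (our notation and a simplified form, not necessarily the exact SNETCH formulation):

```latex
\hat{L} \;=\; \arg\min_{L \,\text{lower triangular}}
  \;\operatorname{tr}\!\big(S\,LL^{\top}\big) \;-\; 2\sum_{i}\log L_{ii} \;+\; \lambda\,\lVert L\rVert_{1},
\qquad \hat{\Theta} \;=\; \hat{L}\hat{L}^{\top},
```

where S is the empirical covariance matrix and the precision matrix is parameterized as \Theta = LL^\top, using \log\det\Theta = 2\sum_i \log L_{ii}.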
Results reported in this talk are published at:
Jelisavcic, V., Stojkovic, I., Milutinovic, V., Obradovic, Z. “Learning of Scale-Free Networks based on Cholesky Factorization,” International Journal of Intelligent Systems, 2018; 1-18, https://onlinelibrary.wiley.com/doi/abs/10.1002/int.21984.
Application of the neural network in inverse solving of the photoacoustic problem
Slobodanka Galovic, Mioljub Nesic, Marica Popovic, Dragan Markushev, Drasko Furundzic
In this paper a neural network has been developed for thermal and mechanical characterization of optically opaque samples by photoacoustics. A simple perceptron neural network with forward signal propagation was used to simultaneously estimate the thermal diffusivity, thermal expansion coefficient and thickness of the investigated sample. Theoretically described amplitudes and phases of the transmission open-cell photoacoustic signals are used to train and test the neural network in a wide range of the mentioned parameters within the 50 Hz – 20 kHz modulation frequency domain. The advantages and disadvantages of applying neural networks in photoacoustic characterization are analysed. Network reliability, precision and real-time operation are verified on an independent set of signals, establishing photoacoustics as a competitive and powerful technique for material characterization.
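A minimal sketch of the inverse mapping is shown below, assuming precomputed amplitude/phase spectra and target parameters stored in hypothetical files; it uses a generic scikit-learn multi-output MLP rather than the authors' exact network:

```python
# Hedged sketch: amplitude/phase of simulated photoacoustic signals (50 Hz - 20 kHz)
# mapped to sample parameters. Feature/target layout and file names are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: [n_samples, 2 * n_frequencies] stacked amplitude and phase spectra (simulated)
# Y: [n_samples, 3] thermal diffusivity, thermal expansion coefficient, thickness
X = np.load("simulated_spectra.npy")       # hypothetical outputs of the theoretical model
Y = np.load("simulated_parameters.npy")

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
)
model.fit(X, Y)                            # MLPRegressor supports multi-output regression
estimated = model.predict(X[:5])           # in practice, verify on an independent signal set
```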
Topic models for Domain Independent Self-learning in Serbian Language
Daniel Polimac
Topic modeling methods are generally used for automatically organizing, indexing, searching, and summarizing large digital textual corpora, but have also proven to be a great tool in ontology extraction and generation. Usually, a strong ontology parser is applicable to small-scale corpora; however, an unsupervised model like topic modelling is beneficial for learning new entities and their relations from new data sources and is likely to perform better on larger corpora.
Current ontology construction methods still rely heavily on manual parsing and existing knowledge bases, even more so in a morphologically rich language like Serbian, with a high degree of inflection and syntactically free word order, which becomes an exceptionally challenging task when only smaller chunks of textual data are available. Additionally, there is the issue of overlapping topics, which allows texts to be related to more than one unique topic. Therefore, it is necessary to create a comprehensive model by combining hierarchical and structural approaches to topic modeling in order to better generalize on the relevance and validation of the retrieved information. In this paper I combine the results of Hierarchical Dirichlet Allocation, a non-parametric topic modelling approach that extends Latent Dirichlet Allocation to allow a dynamically assigned number of topics during the model estimation phase for a given corpus, and Structural Topic Modeling based on document-level covariate information. The covariates can improve inference and qualitative interpretability and can affect topical prevalence, topical content or both. To evaluate the topic-generated ontologies, I have used standard accuracy measures such as precision, recall and the F-measure, by comparing the extracted ontology with synsets from the Serbian WordNet and associated ontologies from Wikidata and DBpedia.
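As an illustration of the non-parametric component, a minimal sketch using gensim's HDP implementation is shown below (toy tokens only; Serbian tokenisation and lemmatisation, as well as the covariate-aware structural topic model, typically fitted with the R stm package, are outside this snippet):

```python
# Hedged sketch: non-parametric topic modelling with gensim's HDP implementation.
# Tokenisation/lemmatisation of the Serbian corpus is assumed to be done upstream.
from gensim.corpora import Dictionary
from gensim.models import HdpModel

docs = [["tema", "model", "korpus"], ["ontologija", "relacija", "entitet"]]  # toy tokens
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

hdp = HdpModel(corpus, id2word=dictionary)   # number of topics inferred from the data
for topic in hdp.show_topics(num_topics=5):
    print(topic)
```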
Data mining with privacy guarantees: a survey of differential privacy mechanisms and architectures
Jelena Novakovic, Dragana Bajovic, Dusan Jakovetic, Dejan Vukobratovic
With the rise of the Internet of Things, where various devices, sensors, software, etc. collect all kinds of data about our physical world, including humans, privacy preservation is becoming increasingly challenging. The European Union’s General Data Protection Regulation (GDPR), which recently came into force, aims to give citizens control over the use of their personal data by businesses and enterprises by defining requirements for processing this kind of data. Specifically, the GDPR requires that any business process must implement data protection by design and by default. One principled approach to achieving this in the context of data mining is the mechanism of differential privacy. With differential privacy, there is a guarantee that individual records of a dataset cannot be learned even when arbitrary outside information is provided. This is achieved by the data curator introducing randomness into the responses to queries posed by machine learning algorithms to the database. The amount of randomness is controlled by the so-called epsilon factor, or privacy cost, which trades off the level of achieved privacy against the performance of the machine learning algorithm in question. Depending on a given machine learning algorithm, there are many ways in which one can implement differential privacy. In practice, this translates into the question of at what level of the algorithm and what kind of queries will be perturbed by noise. This master thesis of student Jelena Novakovic, at the Data Science master programme of the Faculty of Sciences, University of Novi Sad, aims at surveying different differential privacy schemes and the resulting system architectures. Specific examples used to illustrate the general principles of query perturbation will include decision trees and neural networks.
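For illustration, a minimal sketch of the Laplace mechanism, the textbook way a curator perturbs a counting query before releasing it, is shown below; the dataset and query are toy examples:

```python
# Minimal sketch of the Laplace mechanism: the curator perturbs a counting query
# with noise scaled by sensitivity/epsilon before releasing the answer.
import numpy as np

def laplace_count(data, predicate, epsilon, rng=np.random.default_rng()):
    true_count = sum(1 for x in data if predicate(x))
    sensitivity = 1.0   # adding or removing one record changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 47]
print(laplace_count(ages, lambda a: a > 40, epsilon=0.5))  # smaller epsilon -> more noise, more privacy
```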
Emotion recognition in a virtual environment based on classification of physiological features
Boris Milićević, Dalibor Veljković, Predrag Tadić, Milica Janković
The detection of emotions is becoming an increasingly important field for human-computer interaction. Accurate emotion recognition would allow computers to recognize human emotions and react accordingly. Such machines could find use in various fields such as medicine, rehabilitation, marketing, advertising, smartphones, home appliances, cars and many others. In this paper we report the development of an intelligent system capable of recognizing three emotional states that highly determine people's decision making: fear ("flight"), focus ("fight") and the neutral ("peaceful") state. We have used Virtual Reality (VR) as an effective medium for inducing specific emotional responses, using different types of content such as a roller coaster, a relaxing environment, a shooting range and an interactive horror scene. Many studies have confirmed that features of physiological signals can reflect emotional states. Our multimodal data acquisition system records the following signals: electroencephalography (EEG), electrocardiography (ECG), the respiration curve and galvanic skin response (GSR). We will examine the correlation between physiological features and the subjects' emotional states and take these findings a step further by implementing a predictor using state-of-the-art classification techniques. The development of our system includes the following steps: feature ranking, transformation and selection, hypothesis testing and predictor model training. ECG, respiration and GSR signals will be used as inputs to our model. The true emotional state of the subjects, which is the output of our predictor, will be formed by combining their self-evaluation after each experiment stage with the EEG findings. Finally, we will train a model able to predict a subject's mood in offline and online operating modes. We will compare the results of softmax regression, Support Vector Machines, ensemble methods (such as boosted regression trees) and neural networks.
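A hedged sketch of the final classification stage is given below: multinomial logistic (softmax) regression over precomputed physiological features, with hypothetical file names standing in for the actual feature extraction and labelling pipeline:

```python
# Hedged sketch of the classification stage: softmax regression over ECG/respiration/GSR
# features, three target states. Feature extraction and labelling are assumed done upstream.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.load("physio_features.npy")     # hypothetical [n_trials, n_features] matrix
y = np.load("labels.npy")              # 0 = neutral, 1 = fear, 2 = focus

clf = LogisticRegression(multi_class="multinomial", max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())
```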
Drowsiness detection based on cardiac and respiratory rhythm using machine learning
Anita Lupšić, Veljko Mihajlović, Predrag Tadić, Milica Janković
Drowsiness is one of the main causes of traffic accidents, accounting for about 20% of them. According to data collected by the American Automobile Association, almost 41% of all drivers admitted that they have fallen asleep at least once while driving. From an economic point of view, around 30 billion dollars are spent on traffic accidents caused by driver drowsiness. Drowsiness detection systems are based on tracking the road or the behavior of the driver, or on monitoring the driver’s face/eyes or physiological signals (electroencephalography, electrooculography, respiration rate, electrocardiography, etc.). The aim of this paper is to develop an algorithm for the detection of drowsiness based on the variability of the heart rate and the breathing rate. A group of seven healthy adults took part in the experiment, in which they were exposed to monotonous multimedia content. Electrocardiography and the respiration signal were recorded using the Smartex Wearable Wellness System (Pisa, Italy). A video of the subject’s face was recorded as the reference signal for drowsiness. All data were acquired while the subjects were awake, sleepy and in the early stage of sleep. Frequency, time and fractal features of heart rate variability and the respiration curve will be extracted. A machine learning algorithm for the statistical analysis of the extracted features will be developed. Since this is a multinomial classification (the three output classes are "awake", "sleepy" and "fallen asleep"), the first algorithm to be implemented will be softmax regression. If needed, more powerful classification models, such as Support Vector Machines, Random Forests or Neural Networks, will be implemented. During the evaluation of the classifier's results, the unequal risks of different types of error will be considered, because the consequences are significantly more severe when the algorithm fails to detect that the driver has fallen asleep than when an awake driver is declared drowsy.
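As an illustration of the feature extraction step, the sketch below computes simple time-domain heart-rate-variability features (SDNN, RMSSD) from R-R intervals; R-peak detection and the frequency/fractal features mentioned above are assumed to be handled elsewhere:

```python
# Hedged sketch of time-domain HRV features from R-R intervals (milliseconds),
# the kind of inputs the drowsiness classifier would use.
import numpy as np

def hrv_time_features(rr_ms):
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)                          # overall heart-rate variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))     # short-term (beat-to-beat) variability
    mean_hr = 60000.0 / rr.mean()                  # beats per minute
    return {"SDNN": sdnn, "RMSSD": rmssd, "meanHR": mean_hr}

print(hrv_time_features([812, 798, 845, 830, 801, 790, 860]))
```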
Analysis of enriched scientific collaboration networks
Miloš Savić, Mirjana Ivanović
Research collaboration is one of the key social features of contemporary science. It can be quantitatively studied by analyzing scientific collaboration networks, i.e. networks in which two researchers are directly connected if they co-authored at least one bibliographic unit together. In this poster we present a methodology to analyze scientific collaboration networks whose nodes are annotated with attributes providing additional information about researchers and quantifying various determinants of research performance. The accompanying case study in the domain of intra-institutional research collaboration demonstrates that the proposed methodology enables an in-depth analysis of research collaboration and its
relationships with other indicators of research performance.
Facial emotion recognition using conventional and deep-learning approaches
A. Kartali, M. Djurić-Jovičić, M.M. Janković
Emotion recognition has applications in various fields such as medicine (rehabilitation, therapy, counseling, etc.), e-learning, entertainment, emotion monitoring, marketing and law. Different algorithms for emotion recognition include feature extraction and classification based on different types of signals: brain activity, heart activity, speech, galvanic skin response, facial expressions and body movement. In this paper we present results of automated recognition of four basic emotions (happiness, sadness, anger and fear) based on images of facial expressions.
We have compared two types of approaches: 1) the AlexNet convolutional neural network (CNN) as a deep-learning approach and 2) two conventional approaches for classification of Histogram of Oriented Gradients (HOG) features: a Support Vector Machine (SVM) and a Multi-Layer Perceptron (MLP). The open-source toolbox OpenFace was used for facial landmark detection in all approaches. Masking was performed to remove non-facial information. Two publicly available facial expression databases of adults were used for training and testing of the algorithms: the Extended Cohn-Kanade Dataset (CK+) and the Karolinska Directed Emotional Faces (KDEF). To prevent over-fitting, fine-tuning of the pretrained AlexNet was performed, as well as data augmentation of the CK+ and KDEF databases (extracting five random regions from images and generating their horizontal reflections). The implemented algorithm based on the AlexNet CNN recognized the four basic emotions with an accuracy of 94.8%. SVM classification resulted in an accuracy of 98.9%, while MLP classification reached an accuracy of 99.5%. Additionally, we have performed real-time testing of all applied approaches on 8 subjects (3 male, 5 female, age 21.87 ± 1.46), with the following results: AlexNet 87%, SVM 71.5% and MLP 66%, which leads to the preliminary conclusion that the deep-learning approach achieves higher accuracy in real-time use.
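A hedged sketch of the conventional HOG + SVM pipeline is shown below; face cropping, landmark-based masking and the actual CK+/KDEF loading are assumed to be done upstream, and the array files are hypothetical placeholders:

```python
# Hedged sketch of the conventional pipeline: HOG features from pre-cropped, masked
# face images fed to an SVM classifier over the four basic emotions.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

faces = np.load("faces_64x64.npy")     # hypothetical grayscale face crops
labels = np.load("emotions.npy")       # 0..3: happiness, sadness, anger, fear

X = np.array([hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
              for img in faces])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0, stratify=labels)

svm = SVC(kernel="linear").fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, svm.predict(X_te)))
```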
Fooling a Neural Network with Adversarial Noise
Nikola Popović, Marko Mihajlović
Deep Neural Networks (NNs) currently show exceptional performance on speech and visual recognition tasks. These systems are still considered black boxes, without a deep understanding of why they perform the way they do. This lack of understanding makes NNs vulnerable to specially crafted adversarial examples – inputs with small perturbations that make the model misclassify. In this paper, we generate adversarial examples that fool an NN used for classifying handwritten digits. We show how to generate additive adversarial noise for each image and we propose an algorithm for crafting a single adversarial noise pattern that causes misclassification of different members of the same class. This poster emphasizes the safety concerns around AI algorithms, which are used extensively in Data Science.
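The poster does not specify the attack construction; as an illustration, the sketch below shows one common way to craft additive adversarial noise (an FGSM-style perturbation in PyTorch) for a trained digit classifier:

```python
# Illustrative FGSM-style attack (one common way to craft additive adversarial noise);
# the poster's exact method may differ. `model` is any trained digit classifier.
import torch
import torch.nn.functional as F

def fgsm_noise(model, images, labels, eps=0.1):
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    noise = eps * images.grad.sign()                 # small step in the gradient direction
    return noise, (images + noise).clamp(0, 1).detach()

# usage: noise, adversarial = fgsm_noise(trained_net, image_batch, label_batch)
```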
A validation of prediction models for the estimation of breast cancer disease recurrence through the explanation of individual predictions
Bojana Andjelkovic Cirkovic, Nenad Filipovic
As the most common malignancy among women all over the world, breast cancer represents a global problem. Although modern medicine and molecular biology have changed the way breast cancer is perceived in recent years, the disease still causes high morbidity and mortality. For this reason, any development that may lead to successful prognosis of the disease and its severity is of great importance. Machine learning techniques could contribute substantially to exploring the vast number of prognostic factors and making accurate decisions in a patient-specific manner. However, in most cases, the inability to provide informative decisions is the main shortcoming of these predictive models and limits their usability. Here we present a machine learning approach to estimate disease recurrence within the 5-year period after surgery, since it is one of the most significant indicators of the disease outcome and the success of treatment. Various machine learning classification algorithms were tested on a database of real patients described by 58 prognostic factors recorded after breast cancer surgery and during patient follow-up. The predictive variables were initially preprocessed by feature selection approaches to select only the most informative features as inputs to the machine learning methods. The models were tested using a cross-validation procedure. Additionally, an explanation module was developed to provide the user with the decision-making process of a classifier for a particular instance, in the form of feature contributions interpretable by humans. Besides offering insight into the model's rationale, these explanations are of significant importance for new and unseen examples, as they enable users to decide whether or not to trust the predictions.
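A hedged sketch of the evaluation protocol, feature selection nested inside a cross-validated pipeline, is shown below; the data files, the choice of selector and the classifier are illustrative assumptions rather than the exact methods tested:

```python
# Hedged sketch: feature selection + classifier inside a cross-validated pipeline
# (so selection is refit on each fold). Data layout and file names are assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.load("prognostic_factors.npy")   # hypothetical [n_patients, 58] matrix
y = np.load("recurrence_5y.npy")        # 1 = recurrence within 5 years, 0 = no recurrence

pipe = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=15)),
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```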
Gaussian CRF for classification
Andrija Petrovic, Mladen Nikolic, Milos Jovanovic, Boris Delibasic
Gaussian conditional random fields (GCRF) are a widely used structured model for continuous outputs that uses multiple unstructured predictors to form its features and at the same time exploits the structure among outputs. In this paper, Gaussian conditional random fields for structured binary classification (GCRFC) are developed. The model representation of GCRFC is extended with latent variables, which brings many appealing properties. Additionally, two different forms of the algorithm are presented: Bayesian and non-Bayesian. An extended local variational approximation method is used for solving the empirical Bayes problem in Bayesian GCRFC learning, whereas inference is solved by a Gaussian quadrature rule. Both models are compared on synthetic data. The advantages and disadvantages of the Bayesian GCRFC in comparison with the non-Bayesian variant are discussed in detail.
Formal Systems for Probabilistic Reasoning
Aleksandar Perović, Zoran Ognjanović, Miodrag Rašković
The problems of representing, and working with, uncertain knowledge are ancient problems explored by a number of scholars – Leibniz, Jacob Bernoulli, de Moivre, Bayes, Lambert, Laplace, Bolzano, De Morgan, Boole, etc. Researchers in the fields connected with applications to computer science and artificial intelligence have studied uncertain reasoning using different tools: Bayesian networks, non-monotonic logic, Dempster-Shafer theory, possibilistic logic, rule-based expert systems with certainty factors, argumentation systems, etc. Our aim is to introduce a probability-logic-based formalization of uncertain reasoning.
Various algorithms for data analysis produce probability distributions based on given samples. Probability logic provides a background for reasoning about probabilities of consequences of one's knowledge about data. An example of a provable statement is: “if the probability of A is s, and the probability that B is a consequence of A is t, then the probability of B is between t-(1-s) and t”. This reasoning assumes the framework of classical logic. We have also developed probabilistic extensions of some nonclassical logics – temporal, intuitionistic, epistemic, default, etc. – and formalized reasoning about generalized probabilities (i.e., probability functions with ranges that are partially ordered, complex valued, etc.). We also address the decidability and complexity of probability logics. We have developed heuristic methods for the probability logic satisfiability problem that solve the biggest instances reported in the literature.
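For the quoted example, the bound can be verified within classical probability by reading "B is a consequence of A" as the material conditional; the short derivation below uses our own notation and is only a sketch of the idea:

```latex
% Assumptions: P(A) = s and P(A \to B) = P(\neg A \vee B) = t.
% Lower bound: A \wedge B and \neg A are disjoint and their union is \neg A \vee B, so
P(B) \;\ge\; P(A \wedge B) \;=\; P(\neg A \vee B) - P(\neg A) \;=\; t - (1 - s).
% Upper bound: B \subseteq \neg A \vee B, hence
P(B) \;\le\; P(\neg A \vee B) \;=\; t .
```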

Selected references
Z. Ognjanović, M. Rašković, Z. Marković, Probability Logics, Springer, 2016.
Z. Ognjanović, Z. Marković, M. Rašković, D. Doder, A. Perović, A Probabilistic Temporal Logic That Can Model Reasoning about Evidence, Annals of Mathematics and Artificial Intelligence, Vol. 65, 217-243, 2012.
A. Perović, Z. Ognjanović, M. Rašković, D. Radojević, Finitely additive probability measures on classical propositional formulas definable by Gödel's t-norm and product t-norm, Fuzzy Sets and Systems, Vol. 169, 65-90, 2011.
Z. Marković, M. Rašković, Z. Ognjanović, A Logic with Approximate Conditional Probabilities that can Model Default Reasoning, International Journal of Approximate Reasoning, Vol. 49, 52-66, 2008.
Resampling of discrete random variables
Filip Markovic, Tijana Vujicic
In our work we investigate a novel re-sampling algorithm for the convolution operation between discrete random variables. This mathematical operation is often used in statistics and data analysis.
One of its properties is that convolving a series of random variables can lead to exponential growth of the computation time with each new convolution. This is a direct consequence of the fact that the variable resulting from the convolution of variables K and M may have k*m values, where k and m are the numbers of values in K and M, respectively.
In order to avoid this exponential growth of the computation time, re-sampling is often used. This method estimates a new distribution with a smaller number of values, hence with some loss of information compared to the original distribution. The main goal of re-sampling is to estimate a distribution that is as close as possible to the original distribution according to a selected criterion; typically, the criterion is pessimism or the expectation of the distribution.
The existing re-sampling algorithms either have high computational complexity, e.g. Reduced-Pessimism Re-sampling, or have low complexity but produce very pessimistic distributions, e.g. Uniform Spacing Re-sampling. We propose a probability-uniform re-sampling algorithm that achieves a much better reallocation of probability mass compared to the Uniform Spacing algorithm and has considerably lower computational complexity compared to Reduced-Pessimism Re-sampling, since the computational complexity of the proposed algorithm is O(n). The proposed algorithm enables fast computation with precise distribution re-sampling.
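The sketch below illustrates the problem setting, exact convolution of two discrete random variables followed by a simple expectation-preserving re-sampling into a fixed number of bins; it is an illustration only, not the proposed probability-uniform algorithm:

```python
# Hedged sketch: exact convolution of two discrete random variables (support blow-up),
# followed by a simple re-sampling that groups values into bins of roughly equal mass.
import numpy as np

def convolve(vals_a, probs_a, vals_b, probs_b):
    # the resulting support can have up to len(a) * len(b) values
    vals = np.add.outer(vals_a, vals_b).ravel()
    probs = np.multiply.outer(probs_a, probs_b).ravel()
    return vals, probs

def resample(vals, probs, n_out):
    order = np.argsort(vals)
    vals, probs = vals[order], probs[order]
    cum = np.cumsum(probs)
    edges = np.linspace(0, cum[-1], n_out + 1)[1:-1]
    bins = np.searchsorted(cum, edges, side="left")
    out_v, out_p = [], []
    for chunk_v, chunk_p in zip(np.split(vals, bins), np.split(probs, bins)):
        if len(chunk_p):
            out_p.append(chunk_p.sum())
            out_v.append(np.dot(chunk_v, chunk_p) / chunk_p.sum())  # mass-weighted representative
    return np.array(out_v), np.array(out_p)

v, p = np.array([1, 2, 3]), np.array([0.2, 0.5, 0.3])
cv, cp = convolve(v, p, v, p)                  # 9 values before re-sampling
rv, rp = resample(cv, cp, n_out=4)
print(len(cv), len(rv), np.dot(cv, cp), np.dot(rv, rp))   # the expectation is preserved
```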
A stochastic spectral-like gradient method with an application to token-based randomized optimization over networks
Katarina Vla Panić, Dušan Jakovetić, Nataša Krejić, Nataša Krklec Jerinkić
This paper reports on work carried out in the context of the doctoral studies of Katarina Vla Panic at the Dept. of Mathematics and Informatics, Faculty of Sciences, Univ. of Novi Sad, Serbia. Specifically, the work considers the problem of minimizing a possibly large but finite sum of N convex cost functions. This problem appears frequently in many applications, such as minimizing empirical losses in machine learning. In this setting, we develop and analyze a stochastic spectral-like gradient method, allowing the gradient estimates utilized at each iteration to be biased. Under appropriate conditions on the size of the gradient bias, we establish a linear convergence rate of the method in terms of the expected optimality gap of the global cost. We then apply the developed spectral-like method and analysis to token-based distributed randomized optimization over networks.
Hidden semi-Markov models: applications, methodological advances, and optimal detection
Dragana Bajovic, Kanghang He, Lina Stankovic, Dejan Vukobratovic, Vladimir Stankovic
Hidden semi-Markov models (HSMMs) are a class of signal models where the underlying sequence of states, hidden in noise, switches over time in a Markov fashion, with the time spent in each state modeled by a probability mass function on the set of all possible state durations. HSMMs therefore generalize classical hidden Markov models in that the time spent in each state is not necessarily geometrically distributed, but can assume an arbitrary distribution. Since their inception in the 1980s for speech analysis, HSMMs have found successful applications in numerous areas, including econometrics (modeling the employment or marital status of an individual), biometrics (DNA sequencing), identifying forest tree growth components, electric load disaggregation, etc. In this work we study the detection of HSMMs. Assuming a Neyman-Pearson setting, we derive the optimal detection test and analytically characterize its performance by the type-II error exponent. A straightforward implementation of the optimal likelihood ratio test is impossible in any big-data application scenario, due to the high number of possible state sequences. We show that the likelihood in fact evolves as a simple linear recursion, involving a diagonal measurement-modulated matrix and a certain HSMM transition matrix. Capitalizing on this matrix-product form, we show that the type-II error exponent is given by the top Lyapunov exponent of the derived product. Finally, we provide both an upper and a lower bound for the error exponent. The upper bound is the genie-aided error exponent, with information about the exact locations of the state transitions, while the lower bound, analytically more challenging, is derived using the theory of large deviations and statistical physics. Extensive numerical results illustrate the tightness of both bounds and reveal interesting dependences of the error exponent on the HSMM parameters.
Collective social phenomena: physics perspective
Aleksandra Alorić, Ana Vranić, Marija Mitrović Dankulov, Jelena Smiljanić
Development and use of information and communication technologies have enabled access to various types of data about human behavior. The availability of data is the main driving force behind the expansion of a new interdisciplinary field commonly known as computational social science [1]. The main goal of this new scientific discipline is to provide us with a quantitative description and understanding of complex social behavior. Researchers from different areas of science, including physics, computer science, mathematics, economics, and sociology, are making the best use of data and various computational techniques to achieve this goal. Statistical physics, in combination with data analysis, theoretical modeling, and complex networks theory, has proven to be very effective in uncovering and quantifying the mechanisms that underlie collective behavior in social systems [1]. In this short overview of recent results we will demonstrate how techniques and methods of complex networks and statistical physics can provide a better understanding of the emergence of collective emotions in cyberspace [2], cooperative long-term market loyalty in double auction markets [3], collective knowledge building [4] and the dynamics of event-driven social groups [5].
[1] P. Sen, B.K. Chakrabarti, Sociophysics: An Introduction, Oxford University Press (2014).
[2] M. Mitrović, G. Paltoglou and B. Tadić, Quantitative analysis of bloggers' collective behavior powered by emotions, Journal of Statistical Mechanics: Theory and Experiment 2011, P02005 (2011)
[3] A. Alorić, P. Sollich, P. McBurney, and T. Galla: Emergence of Cooperative Long-Term Market Loyalty in Double Auction Markets, PLoS ONE 11, e0154606 (2016).
[4] M. Mitrović Dankulov, R. Melnik and B. Tadić, The dynamics of meaningful social interactions and the emergence of collective knowledge, Scientific Reports 5, 12197 (2015).
[5] J. Smiljanić and M. Mitrović Dankulov, Associative nature of event participation dynamics: A network theory approach, PLoS ONE 12, e0171565 (2017)
Visual anomaly detection using deep convolutional neural networks
Slavica Todorović-Zarkula, Branimir Todorović, Lazar Stojković, Željko Džunić
Visual anomaly detection is an important task in many areas such as industrial quality inspection, surveillance, medical diagnostics and healthcare. In these areas, the aim is to detect and recognise anomalous behaviour or object defects by analysing images. Despite the huge number of images that should be analysed, the task is usually performed manually, and therefore solutions that enable automation of such processes attract high attention. In this contribution, we are concerned with automatic detection of train brake defects from images collected on railway routes. The proposed approach is based on deep convolutional neural networks (CNNs) for detection of both known and unknown brake defect types.
For the detection of specified known types of defects, for which representative examples exist in the training set, we have used a supervised learning approach. We have created a hierarchy of CNNs which perform different tasks: from initial classification of brakes and classification of known defect types, to detection and segmentation of brake areas to enable evaluation of brake detrition. By using a hierarchical decomposition of the problem, we have enabled different CNNs to specialize in a specific sub-problem. To enable detection of unknown types of defects, for which we did not have examples, we have applied unsupervised learning using our modification of a convolutional autoencoder network. The network was adversarially trained only on images of brakes without defects. Based on the learned internal representation of a brake, such a convolutional autoencoder attempts to "reconstruct" a defect-free brake even when a faulty brake is presented as the input, so that defects are revealed by the reconstruction error. Experiments using real-world data have confirmed the good performance of the proposed approach in visual defect detection, as well as the possibility of generalizing it to other visual anomaly detection problems.
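A hedged sketch of the unsupervised component is shown below: a small convolutional autoencoder trained on defect-free images, with a high reconstruction error flagging a possible defect; the adversarial training and the supervised CNN hierarchy used in the actual system are not reproduced here:

```python
# Hedged sketch: convolutional autoencoder trained only on defect-free images;
# at test time a high reconstruction error flags a possible defect.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 16x16 -> 32x32
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 32x32 -> 64x64
        )

    def forward(self, x):
        return self.dec(self.enc(x))

model = ConvAE()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
normal_batch = torch.rand(8, 1, 64, 64)        # stand-in for defect-free brake images

for _ in range(10):                            # training loop sketch
    loss = F.mse_loss(model(normal_batch), normal_batch)
    optim.zero_grad()
    loss.backward()
    optim.step()

test_img = torch.rand(1, 1, 64, 64)
error = F.mse_loss(model(test_img), test_img)
print("anomalous" if error.item() > 0.02 else "normal")   # threshold chosen on validation data
```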
Sequence prediction using Recurrent Neural Networks with External Memory
Branimir Todorović, Miomir Stanković
The sequence prediction problem can be defined as predicting the next element in a sequence given the previous elements and exogenous inputs. It is a fundamental problem in many intelligent systems applications: adaptive control and robotics, natural language processing, symbolic reasoning and inference, to mention just a few. Recurrent neural networks (RNNs) are models of dynamic systems which have been used for sequence prediction with certain success. In each time step, an RNN combines the current exogenous input and its previous state, given as the vector of neuron activities, to predict the output. The problem of vanishing and exploding gradients prevents simple RNNs from learning long-range dependencies. This problem is partially solved by applying Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. Here we consider improving the effectiveness of recurrent neural networks by augmenting various RNN architectures with external memory, which stores the previous activities of recurrent neurons. The output prediction of the model is obtained from its current state, given as a vector of recurrent neuron activities, which depends on the current exogenous input and the content of the memory. The augmented architectures consistently improved the RNNs' performance in sequence prediction tasks.

Track B - Data Science in Critical Infrastructures

Voltage Stability Prediction Using Active Machine Learning
Vuk Malbaša
An active machine learning technique for monitoring the voltage stability in transmission and distribution systems is presented. It has been shown that machine learning algorithms may be used to supplement the traditional simulation approach, but they suffer from the difficulties of online machine learning model updates and offline training data preparation. We propose an active learning solution to enhance existing machine learning applications by actively interacting with the online prediction and offline training processes. The technique identifies operating points where machine learning predictions based on power system measurements contradict actual system conditions. By creating the training set around the identified operating points, it is possible to improve the capability of machine learning tools to predict future power system states. The technique also accelerates the offline training process by reducing the number of simulations on a detailed power system model around operating points where correct predictions are made. Experiments show a significant advantage in terms of the training time, prediction time, and number of measurements that need to be queried to achieve high prediction accuracy.
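A minimal sketch of the active-learning loop is given below: the current classifier queries the operating points it is least certain about, their true stability labels are obtained from simulation, and the model is retrained; the `run_simulation` stand-in and the random candidate pool are assumptions, not the actual power-system model:

```python
# Hedged sketch of uncertainty-based active learning around ambiguous operating points.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def run_simulation(x):
    return int(x.sum() > 0)            # placeholder for a detailed stability simulation

rng = np.random.default_rng(0)
pool = rng.normal(size=(5000, 20))     # candidate operating points (measurement vectors)
X = pool[:50]
y = np.array([run_simulation(x) for x in X])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for _ in range(10):                    # active-learning iterations
    proba = clf.predict_proba(pool)[:, 1]
    idx = np.argsort(np.abs(proba - 0.5))[:20]        # most uncertain operating points
    X = np.vstack([X, pool[idx]])
    y = np.concatenate([y, [run_simulation(x) for x in pool[idx]]])
    clf.fit(X, y)
```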
An ICT framework for addressing sustainable development
Vladimir Zdraveski
Sustainable development goals are essential to ensure the future of humanity, and how to assess and govern sustainability is therefore a central challenge facing our society. One of these central challenges, also known as the tragedy of the commons, is how best to govern and manage natural resources (or the entire planet) used by many individuals in common. We suggest a "glocal" governance model, characterized by both local and global considerations, in the context of the economic, ecological, and social aspects of sustainable development. The model addresses global challenges on a local level, the level of a municipality or city, and offers new forms of citizen engagement and more flexible and integrated modes of governing. The ICT platform addresses short-term and long-term global social-ecological challenges on a local level and enables ordinary citizens and policy makers to shape our future. It allows vertical and horizontal linkages among diverse stakeholders and their contributions to nested rule structures utilized at the operational, collective, and constitutional levels. Thus, the ICT platform ensures a paradigm shift in approaching global challenges for the advancement of our society.
The role of industry, occupation, and location specific knowledge in the survival of new firms
Cristian Jara-Figueroa
How do regions acquire the knowledge they need to diversify their economic activities? How does the migration of workers among firms and industries contribute to the diffusion of that knowledge? Here we measure the industry-, occupation-, and location-specific knowledge carried by workers from one establishment to the next using a dataset summarizing the individual work histories for an entire country. We study pioneer firms, i.e. firms operating in an industry that was not previously present in a region, because the success of pioneers is the basic unit of regional economic diversification. We find that the growth and survival of pioneers increase significantly when their first hires are workers with experience in a related industry and with work experience in the same location, but not with past experience in a related occupation. We compare these results with new firms that are not pioneers and find that industry-specific knowledge is significantly more important for pioneer than for non-pioneer firms. To address endogeneity we use Bartik instruments, which leverage national fluctuations in the demand for an activity as shocks to local labor supply. The instrumental variable estimates support the finding that industry-related knowledge is a predictor of the survival and growth of pioneer firms. These findings expand our understanding of the micro-mechanisms underlying regional economic diversification events.
Analysis of movie big data and calculating the prediction of the popularity and profit
Drazen Draskovic, Luka Popovic, Bosko Nikolic
Machine learning uses automated algorithms that learn from a set of data and reach conclusions much faster than scientists would using conventional analytical modeling. Predictive analysis attempts to predict the behavior of a group of users. The aim of this research was the analysis and processing of movie data, with prediction of movie popularity and profit. Supervised learning techniques were used. The first step was to collect data from as many available sources on the Internet as possible (IMDB, RottenTomatoes, etc.), using our own implementation of a web crawler and web scraper. The second step was to bring this data set into a format suitable for processing and to "clean" it of duplicate records and missing or incorrect values. The third step was the analysis of the data set, and the modeling and implementation of a system that generates an output based on the input parameters. During the analysis, two variants of the linear regression algorithm and the k-nearest neighbors algorithm were used. The data set was divided into three groups: training data for algorithm training, test data for validation of the trained algorithms, and final test data. By applying the implemented ML algorithms, the analysis gave an average score of 77.9% to 83.74% for predicting the popularity of movies and about 60% for predicting profit based on the movie's budget, popularity and the number of respondents.
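A hedged sketch of the modelling step is shown below, fitting linear regression and k-nearest neighbors on a cleaned movie table with a train/validation/final-test split; the file name and feature columns are illustrative assumptions:

```python
# Hedged sketch of the modelling step on the cleaned movie data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

movies = pd.read_csv("movies_clean.csv")                 # hypothetical cleaned dataset
X = movies[["budget", "num_votes", "runtime", "year"]]   # illustrative input features
y = movies["popularity"]

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

for model in (LinearRegression(), KNeighborsRegressor(n_neighbors=7)):
    model.fit(X_train, y_train)
    print(type(model).__name__, "validation R^2:", model.score(X_val, y_val))
# the held-out (X_test, y_test) split is reserved for the final evaluation
```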
(Big) Data in Schools - how to research on digital education in a data-protective way?
Lindner, Martin
Digital education is a widespread challenge for all schools at the moment. For example, the "Digitalisierung der Schule" movement in Germany is releasing millions of euros to bring more schools to use digital media and internet-based resources, in a way that will influence all classrooms through this (new) way of teaching.
We are convinced that a new way of teaching needs profound research. Thus, we have been researching the use of digital media in classrooms since 1994. The research includes online games (learning games), simulation software, orientation software, internet-based research tools and the use of a data bank of methods created by us to support outdoor learning activities. A special field of ours is research on refugee students' learning, also in Greece, which needs another level of protection against abuse, kidnapping or any illegal use of the knowledge we have gained.
The problem occurs during the research and during the international presentation of the data. How can we protect the data of the students, like faces, voices, information about the place, the time and the circumstances of our research?
On various levels we designed methods of data protection. These include suppressing the recording of the names, places and regions of the schools, preventing the taking of pictures, anonymizing videos, using abstract codes in pre-post designs, and so on. The methods of our research and the methods of our data protection will be shown on the poster. The coverage of these methods will be discussed.
An energy efficient distributed source coding scheme for sensor networks
Velimir Ilic, Elsa Dupraz, Fangping Ye
Handling large amounts of data is one of the most challenging tasks in sensor network design. This work presents a distributed source coding scheme that strongly reduces the amount of data transmitted by sensors and is realized using advanced error-correction codes. Potential applications include seismic sensing and real-time video surveillance.
Distributed Power System State Estimation Algorithms Based on the Belief Propagation
Mirsad Cosovic, Dejan Vukobratovic
We present a novel distributed Gauss-Newton method for the non-linear state estimation (SE) model based on a probabilistic inference method called belief propagation (BP). The main novelty of our work comes from applying BP sequentially over a sequence of linear approximations of the SE model, akin to what is done by the Gauss-Newton method. The resulting Gauss-Newton belief propagation (GN-BP) algorithm is the first BP-based solution for the non-linear SE model achieving exactly the same accuracy as the centralized SE via Gauss-Newton method. Due to the sparsity of the underlying factor graph, the GN-BP algorithm has optimal computational complexity (linear per iteration), making it particularly suitable for solving large-scale systems.
In addition, we propose a fast real-time state estimator based on the BP algorithm for the power system SE. The proposed estimator is easy to distribute and parallelize, thus alleviating computational limitations and allowing for processing measurements in real time. The presented algorithm may run as a continuous process, with each new measurement being seamlessly processed by the distributed state estimator. In contrast to the matrix-based SE methods, the BP approach is robust to ill-conditioned scenarios caused by significant differences between measurement variances, thus resulting in a solution that eliminates observability analysis.
Finally, we propose a linear complex SE model suitable for processing large-scale data in electric power systems observable by phasor measurement units. The proposed algorithm is placed in a non-overlapping multi-area SE scenario without a central coordinator. The communication between areas is asynchronous, and neighboring areas exchange only ''beliefs'' about specific state variables. The presented architecture can, in the extreme case, be implemented as fully distributed and results in substantially lower computational complexity compared to traditional SE solutions. We discuss the performance of the BP-based SE algorithms using power systems with 118, 1354 and 9241 buses.
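For reference, the centralized Gauss-Newton iteration that the GN-BP algorithm matches in accuracy can be sketched as follows; the measurement function and Jacobian below are toy placeholders, not the non-linear power-flow equations themselves:

```python
# Hedged sketch of the centralized weighted-least-squares state estimation iteration:
# x_{k+1} = x_k + (H^T W H)^{-1} H^T W (z - h(x_k)).
import numpy as np

def gauss_newton_se(x0, z, h, jac, weights, iters=10):
    x = x0.copy()
    W = np.diag(weights)                       # inverse measurement variances
    for _ in range(iters):
        r = z - h(x)                           # residuals
        H = jac(x)
        dx = np.linalg.solve(H.T @ W @ H, H.T @ W @ r)
        x += dx
        if np.linalg.norm(dx) < 1e-8:
            break
    return x

# toy usage with a made-up 2-state, 3-measurement model
h = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1]])
jac = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 1.0]])
x_true = np.array([1.02, 0.97])
z = h(x_true) + 0.01 * np.random.default_rng(0).normal(size=3)
print(gauss_newton_se(np.ones(2), z, h, jac, weights=np.full(3, 1e4)))
```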
Joint content placement and lightpath routing and spectrum allocation in content provisioning with cloud migration
Branka Mikavica, Goran Marković, Aleksandra Kostić-Ljubisavljević
Provisioning of high-bandwidth-demanding content, such as video on demand, High Definition Television (HDTV), real-time video, online gaming, file sharing and cloud computing, causes ever-increasing growth of Internet traffic. In general, the participants in the content provisioning process include content providers, service providers and end users. The participants may perform vertical integration in order to enhance system performance. The demand for content provisioning varies significantly during the day. In order to mitigate problems of under-utilization or over-utilization of self-owned resources, cloud providers are involved in the content provisioning process. Enormous traffic volumes require the implementation of network technologies capable of supporting high-bandwidth requirements.
Elastic Optical Network (EON) technology is considered a promising solution for building an effective and cost-efficient transport network. The process of content provisioning with cloud migration over EON comprises the establishment of lightpaths and involves the problem of Routing and Spectrum Allocation (RSA). In this paper, we address the problem of content placement into cloud data centers along with lightpath routing and spectrum allocation in EON. A novel Mixed Integer Linear Programming (MILP) model is proposed to solve the considered optimization problem. The relevance of each criterion in the objective function is determined using Multiple Attribute Decision Making (MADM) methods. The proposed model achieves an appropriate content placement and determines the lightpaths depending on a given bandwidth demand for content provisioning, thus concurrently optimizing cloud migration, spectrum utilization and lightpath length.
Combinatorial optimization and Metaheuristics for Big Data
Nenad Mladenovic, Dragan Urosevic, Dusan Dzamic
Research in the area of combinatorial optimization and operations research
(i) Discrete location problems. In this area of operations research there is still a lot of room for new models that follow and cover more realistic circumstances. Models could take into account, for example, customer preferences, geographical facility location restrictions, restrictions on capacity, etc. New models need to be supported with improved numerical methods. Our idea is to improve existing heuristics that are based on metaheuristic rules, such as VNS, genetic search and simulated annealing.
The hub location problems will be one of the topics of our research, due to their numerous applications in designing transportation and telecommunication networks (computer and satellite networks, logistical systems, airline industry, postal delivery systems, cargo transportation, etc.).
(ii) Vehicle routing problems. In vehicle routing problems we deal with several salesmen (vehicles) and some additional restrictions on valid routes: time windows for customer service, time or distance constraints on each vehicle route, vehicle capacities, driver schedules, and others. We will pay more attention to asymmetric VRPs, since they are more realistic in urban areas.
(iii) Analysis of social and other complex networks. The goal is to analyze large networks in order to detect structure of network, especially to divide network into clusters.
(iv) Problems related to railway infrastructures. It is essential that the railway infrastructures are kept in good condition.
Among the many problems related to railway infrastructures, we will address the following: preventive maintenance planning; the joint scheduling of maintenance and spare parts by determining, for each component of the system, the optimal ordering time and the optimal preventive replacement time; the classification of equipment in maintenance process; and the track maintenance scheduling problem by designing the methods that will help to improve solution speed and optimality.
Distributed second order methods with variable number of working nodes
Natasa Krklec Jerinkic, Dusan Jakovetic, Natasa Krejic, Dragana Bajovic
We consider distributed optimization problem formulations suitable for large scale and Big Data optimization, where nodes in a connected network collaboratively minimize the sum of their locally known convex costs subject to a common (vector-valued) optimization variable. In this work, we present a mechanism to significantly improve the computational and communication efficiency of some recently proposed first and second order distributed methods for solving such problems.
The presented mechanism relaxes the requirement that all nodes are active (i.e., update their solution estimates and communicate with neighbors) at all iterations k. Instead, each node is active at iteration k with probability pk, where pk is increasing to unity, while the activations are independent both across nodes and across iterations. Assuming strongly convex and twice continuously differentiable local costs and that pk grows to one linearly, both the distributed first and second order methods with the idling schedule exhibit very similar theoretical convergence and convergence rate properties as if all nodes were active at all iterations. Numerical examples demonstrate that incorporating the idling schedule in distributed methods significantly improves their computational and communication efficiencies.
Self-healing distribution network automation procedures based on Markov Decision Process
Aleksandar Janjic, Lazar Velimirovic, Jelena Velimirovic, Zeljko Dzunic
Automated fault location and isolation processes in power distribution networks are based on values telemetered from intelligent sensors. After the initial guess, the system is energized section by section until the protective relay trips the feeding circuit breaker and the faulty section is identified. However, the accuracy and plausibility of intelligent electronic devices (IEDs) are usually not taken into account. Exclusively data-driven approaches to fault isolation may lead to incorrect and uninformed decisions, as they do not incorporate useful information from engineering and physical models. Therefore, the probability of possible states and actions based on the equipment condition should be modelled. The ongoing study of an MDP-based methodology will reduce the possible number of trials through the optimization of the switching policy. The condition of switching devices is part of the model as well, with a diversification of normal and abnormal operating conditions. A hybrid approach that uses real-time data from the network and meteorological data, in conjunction with basic physical and engineering constraints, has the promise to overcome these limitations and can lead to significantly improved decision capabilities.
In that way, fault analysis and identification can be carried out quickly, enabling fast restoration of the system. This algorithm can be incorporated in Advanced Distribution Management Systems, with the required set of input data, including the estimated crew travel times for manual switching, the conditional probabilities of intelligent sensors, manipulation times and the estimated non-supplied energy. The main benefit of the proposed methodology is the modelling of specific problems related to the particularities of a distribution network that can be represented by state transition probabilities, including:
– the heterogeneity of the line and cable types (conductor section, electrical characteristics…)
– the radial structure of distribution feeders
– the tapped loads
– the scarce data available about the loads and state of the network
This methodology can be used for radial networks with distributed generation and for loop networks, which will be the focus of our future research.
A Flood Monitoring Tool for Urban Areas Using Satellite, Weather and Social Data
Stelios Andreadis, Anastasia Moumtzidou, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris
Large streams of open data become available on a daily basis and are expected to support a number of public bodies and water authorities in their safety policies and strategies. At the European level, more than 2.5 million Sentinel-1A products have been published online by the European Space Agency, corresponding to 3 million GB of data and 350,000 products.
Copernicus data have already paved the way to monitoring changes on Earth using Sentinel data, aiming at efficient emergency management services and the protection of critical infrastructure. Open weather forecast data are also available from meteorological institutes, adopting the full, free and open access policy of the Copernicus programme. In our tool, these two main sources of weather and satellite data are complemented by citizen observations, as citizens report short messages, images and spatiotemporal information on social media platforms such as Twitter. Our technologies involve water body detection from satellite data, event detection from the available sources of data, and their positioning on a map
when spatiotemporal information is available or automatically estimated. Concepts are extracted from satellite or social media images and topics are created by grouping similar Twitter posts. Furthermore, our tool automatically identifies whether an image refers to a flood or not, using computer vision state-of-the-art techniques, based on deep convolutional
neural networks. Our tool also offers animation of the user communities to identify authorities in the social network of user interactions. The extracted knowledge is represented in a semantic way that allows for generating notifications and making decisions in an effective and efficient way. These technologies are very useful for water authorities and civil protection agencies in their need to monitor a flood event and to have a holistic view of an area at the preparedness, response and recovery stages.
Apples, grapes, transport, air, beer, wine, meat,...IoT and data
Nenad Gligoric, Senka Gajinov, Dejan Drajic, Srdjan Krco
New technologies such as the Internet of Things (IoT) offer an indefinite number of opportunities in different segments of industry, agriculture, city government and infrastructure, environment monitoring, transportation, home automation, manufacturing, healthcare, etc. In recent years, we have been witnessing increased urban growth and migration waves towards cities around the globe. The cities are experiencing pressure on their infrastructures, which have to cope with continuously increasing demands for energy and water supply, pollution and noise control, and social, transportation, healthcare and safety services. IoT, sensing technologies, big data analytics, machine learning and advanced data analytics have an important role in creating solutions for smart cities, reducing costs, improving the quality of services delivered to citizens and the overall quality of life, and thus making the cities of the future safer and more efficient. On the other side, a combination of the same technologies can drive the move from agriculture based on experience to agriculture based on "numbers", thus helping the agriculture industry to optimize food production and produce high-quality food for an increasing global population in a sustainable manner. From the industry point of view, manufacturing businesses and processes are in full transformation because of increasing automation, digital transformation, evolving manufacturing technologies, and the connection of the digital and physical environments through IoT. IoT also goes hand in hand with the term "smart", for which there is no unique definition but rather different interpretations. One widely accepted Smart City definition is: "A smart city is a municipality that uses information and communication technologies to increase operational efficiency, share information with the public and improve both the quality of government services and citizen welfare". The scope of this paper is to present novel IoT services for smart cities, smart agriculture and smart industry, i.e. how to use data in order to improve the efficiency of production and digitization.
Learning Error Correction Algorithms
Xin Xiao, Srdan Brkic, Bane Vasic, Shu Lin
In the early 1990s, a spike of research occurred on neural networks implementing the functionality of error correction decoding algorithms. A common feature of these approaches was that the network learns “from scratch” how to classify the channel output words and build the decision region for each codeword. To do so, the training set had to include all codewords of a code, which made the approach intractable except for very short codes.
We present Deep Neural Networks (DNNs) to improve iterative decoding algorithms for low-density parity check (LDPC) codes. These DNNs are constructed based on the Tanner graph, with various structures including the Multi-Layer Perceptron Neural Network (MLPNN) and the Recursive Neural Network (RNN). Another feature is that the activation functions over hidden layers preserve the symmetry conditions. Properly applying learning methods such as stochastic gradient descent (SGD) and Adam to set the weights and biases makes it possible to improve the decoding performance.
Due to a symmetry constraint on the neural weights, the training can be performed on a single codeword, leading to improved convergence of iterative decoders. Since the channel is output symmetric, as long as the weight matrices are symmetric, the update functions remain symmetric and the weight updates can be performed on the all-zero codeword by changing only the noise realizations. This is the key differentiator from existing methods that require entire codeword spaces as training sets, which is computationally intractable. Due to this simplification, we can now afford very large training sets of error patterns. By training on error patterns, the network effectively learns the decision boundary of the all-zero codeword in N-dimensional space and approximates this boundary via the weight matrices at different layers of the network.
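As a rough illustration of how such an all-zero-codeword training set can be assembled, the following minimal sketch generates channel log-likelihood ratios (LLRs) from repeated noisy transmissions of the all-zero codeword over a BPSK/AWGN channel; the code length, rate and SNR are illustrative assumptions, not the authors' setup.

    # Minimal sketch: training inputs built from the all-zero codeword only,
    # varying just the noise realizations (BPSK over AWGN assumed).
    import numpy as np

    def allzero_llr_batch(n_bits, batch_size, snr_db, rate=0.5, rng=None):
        """Channel LLRs for `batch_size` noisy transmissions of the all-zero codeword."""
        rng = np.random.default_rng(rng)
        sigma2 = 1.0 / (2.0 * rate * 10.0 ** (snr_db / 10.0))  # noise variance
        tx = np.ones((batch_size, n_bits))                     # bit 0 -> +1 under BPSK
        rx = tx + rng.normal(scale=np.sqrt(sigma2), size=tx.shape)
        return 2.0 * rx / sigma2                               # channel LLRs

    # Example: 64 noise realizations for an (illustrative) length-648 code at 2 dB.
    llrs = allzero_llr_batch(648, 64, snr_db=2.0)
    labels = np.zeros_like(llrs)   # the transmitted codeword is always all-zero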
Big Data application in diagnostics of Electric Power Systems
Mileta Žarković, PhD, Teaching and Research Assistant, School of Electrical Engineering, Belgrade University; Zlatan Stojković, PhD, Full Professor and Vice Dean for Science, School of Electrical Engineering, Belgrade University
The power transformer is the most important and most expensive element of each electric power system. This research project presents intelligent algorithms which are applied to power transformer monitoring data.
The algorithms show how to use the big data obtained during the exploitation of power transformers. The database is formed from measurements of: polarization index, dielectric loss factor, idle current, short-circuit impedance and relative phase difference of resistance at different positions of the voltage regulator. The first algorithm applies unsupervised machine learning in the domain of artificial intelligence. Such an algorithm classifies data and assigns transformers to groups with similar properties, according to the maintenance priority and probability of failure. The second algorithm applies an artificial neural network (ANN) to predict the exploitation age of the power transformer based on the monitoring measurements. The exploitation age is compared with the actual age and premature aging of the power transformer is indicated. The application of the ANN will indicate accelerated aging and the need for unloading and more frequent maintenance of endangered transformers. For critical values of the input parameters, the ANN can predict the end-of-life year for the observed power transformer. Both algorithms are trained on data from the electric power industry. The engineering application results demonstrate the effectiveness and superiority of machine learning and improve the diagnostics of power transformers. The algorithms help engineers make proper and timely decisions. The application of such algorithms does not require special experience and knowledge about power transformers. The algorithms also automate the analysis of monitoring results and shorten the time needed to decide on the maintenance priority. A similar technique can be applied to asset management for the entire electric power system.
Python Software Transactional Memory: A Tool for Data Science
Branislav Kordic, University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Marko Popovic, University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Ilija Basicevic, University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Silvia Ghilezan, University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Miroslav Popovic, University of Novi Sad, Faculty of Technical Sciences, Novi Sad
Nowadays, Python, MATLAB, and R are the most popular programming languages for data science. While R's functionality is developed with statisticians in mind – statistical analysis, data visualization, and predictive modeling – MATLAB supports powerful matrix handling aimed at numerical calculations. Python, with its flexibility and powerful features incorporated in libraries suitable for both statistical analysis and numerical calculations, is the language of choice for academic scientists and researchers. On the other hand, data processing in a data science system is naturally concurrent and often compute intensive as well. This makes the Python Software Transactional Memory (PSTM), and its accompanying components, interesting and relevant for data science. PSTM is based on Python's multiprocessing package and is supported on Python 2.7.x and Python 3.5.x. Transaction-based execution enables easy and quick integration within existing solutions, with the main goal of supporting concurrent execution in a lock-free manner. The correctness of PSTM has been formally verified against deadlock freeness, safety, liveness, and reachability properties using industrially proven tools such as the UPPAAL tool, which is based on the timed automata formalism, and the PAT tool, which uses CSP models. PSTM's performance is benchmarked on concurrent data structures such as a queue and a list, and on the Simple Bank program which models a simple bank transaction system. Besides benchmark applications, preliminary results of PSTM integration in a real-world application, the computational-chemistry simulation program DEEPSAM, are positive and encouraging. Research towards Distributed PSTM (DPSTM) is an ongoing task. The goal of DPSTM is to enable utilization of emerging cloud technologies such as Amazon Web Services. So far, various transaction scheduling algorithms and a simulation of a DPSTM prototype have been developed. Future work includes the development of DPSTM and possible application of data science in distributed transaction scheduling.
Distributed Transactional Memory: A Foundation for Data Science Systems
Costas Bush, Maurice Herlihy, Miroslav Popovic, Gokarna Sharma
Data Science is the study of the generalizable extraction of knowledge from data, which requires a system infrastructure for distributed data processing. The dataflow model seems to be a natural and faithful representation of such an infrastructure, since input data items to be processed at a network node reside on some other nodes, and new data resulting from the processing may be needed by yet other network nodes. Moreover, the data processing at the various nodes executes concurrently. Distributed Transactional Memory (DTM) is a perfect match for this context, since it supports dataflow-based execution of transactions in a network.
Memory transactions relate to regular data objects. Therefore, efficient schedulers for DTM can also be used for efficiently scheduling transactions on regular data in the distributed setting. The objective is to design efficient algorithms that minimize the makespan of the transaction execution and also minimize the communication cost in the network. Part of the research is to explore impossibility results imposed by data dependencies on the makespan and communication cost. Known impossibility results for transactional memory also apply to data transactions, where it is not possible to minimize the communication cost and the makespan simultaneously in the general case. On the positive side, there are several interesting special-case network topologies where the communication cost is minimized at the same time as the makespan, such as cliques, butterflies, line graphs, grid graphs, and cluster graphs. The positive results can help to obtain more efficient transaction schedules for data science problems. In conclusion, this proposed work relates distributed data science problems to scheduling problems in distributed transactional memory, where known impossibility and positive results can apply to regular data transactions as well. Future work includes identifying unique aspects of data science problems and considering dynamic scheduling problems.
Complex-event-processing for emergency management in critical infrastructures
Nikola Tomasevic, PhD (Institute Mihajlo Pupin), Valentina Janev, PhD (Institute Mihajlo Pupin), Sanja Vranes, PhD (Institute Mihajlo Pupin)
Critical infrastructures (CI) are difficult to handle due to their complexity, size and the number of stakeholders involved. During emergency situations (e.g. fire or terrorist attack), a CI operator in the control room is faced with a flood of information coming from different sensors and legacy monitoring systems. Since in these situations, time is critical and
the operators are under a great deal of pressure, a holistic approach to processing such large amounts of incoming data is needed, as part of a Recommendation and Decision Support System (RDSS) that helps emergency managers take correct and timely decisions. One possible way to provide adequate RDSS support to the operator has been developed by the Institute Mihajlo Pupin, based on an intelligent, event-driven layer that sits on top of the legacy CI monitoring system. Powered by complex event processing capabilities, this layer is capable of processing the data acquired from different sources, conducting situation and risk assessment, and reacting accordingly, either automatically or via recommendations proposed to emergency personnel. To validate the proposed approach, an event-driven RDSS was deployed on an airport use case (Nikola Tesla Airport in Belgrade), one of the most complex CIs.

Track C - Biomedical Informatics

Somatic Variant Calling Benchmarking
Luka Topalovic
The increased number of sequenced cancer genomes since the completion of the Human Genome Project, and the importance of correctly identifying somatic mutations, which can influence treatment or prognosis, are driving forward the development of novel somatic variant calling tools (somatic callers). The lack of a best-practice algorithm for identifying somatic variants, however, requires constant testing, comparison and benchmarking of these tools. The absence of a truth set further hinders evaluation efforts. By comparing widely used open-source somatic callers, such as Strelka, VarDict, VarScan2, Seurat and LoFreq, through analysis of in-house generated synthetic data, we found complex dependencies of somatic caller parameters relative to coverage depth, allele frequency, variant type, and detection goals. Next, we normalized and filtered the output data so that it can be appropriately compared to the truth set. The acquired benchmarking results were automatically and efficiently structured and stored. All of the tools used for the analysis have been implemented in the Common Workflow Language, which makes them portable and reproducible.
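The core of such a benchmark is counting true/false positives and false negatives of each caller against the synthetic truth set. A minimal sketch follows; the variant keys (chrom, pos, ref, alt) and the toy call sets are illustrative, not the actual pipeline.

    # Minimal sketch: scoring one caller's normalized output against a truth set.
    def benchmark(called, truth):
        """`called` and `truth` are sets of (chrom, pos, ref, alt) tuples."""
        tp = len(called & truth)
        fp = len(called - truth)
        fn = len(truth - called)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return {"TP": tp, "FP": fp, "FN": fn,
                "precision": precision, "recall": recall, "F1": f1}

    truth = {("chr1", 10177, "A", "C"), ("chr2", 4521, "G", "T")}
    calls = {("chr1", 10177, "A", "C"), ("chr3", 999, "C", "A")}
    print(benchmark(calls, truth))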
Accurate cancer cells recognition using neoantigen analysis of Next Generation Sequencing data
Milan Domazet, Vladimir Kovačević
Neoantigens are proteins presented on the surface of cancer cells that are recognized by the immune system. Multiple novel therapeutic approaches involve identifying neoantigens and using them to trigger immunity-induced tumor regression. Seven Bridges has developed an analysis for neoantigen discovery using Next Generation Sequencing data, which processes tumor-normal pairs of whole exome sequencing samples and tumor gene expression data in order to output candidate neoantigen proteins. These predictions can help train the immune system to destroy the cancerous cells, providing a novel approach to cancer treatment.
Integration of Domain Knowledge into Machine Learning Algorithms – Applications in Healthcare
Sandro Radovanović
Machine learning algorithms rely solely on data to induce rules and models for prediction. However, a huge amount of knowledge is hidden in various formats and can be exploited to achieve better performance. Therefore, this talk is aimed at presenting possibilities for integrating domain knowledge into machine learning algorithms, with applications in healthcare. Ideas and initial results on the generation of virtual examples, feature extraction, and feature selection will be presented.
Infant Neuromotor Development and Childhood Problem Behavior – sensitivity to non-ignorable missingness
Emin Tahirović
In Generation R, a population-based cohort in the Netherlands (2002–2006), the neuromotor development of 4006 infants aged 2 to 5 months was evaluated by using an adapted version of Touwen’s Neurodevelopmental Examination (tone, responses, and senses and other observations). Information on one or more assessments of child behavioral problems at age 1.5, 3, 6 and 10 years was available in 3,474 children (86.7% of 4,006).
In the original study, non-response was assumed to be missing at random and was adjusted for by multiple imputation techniques. In this talk, I present the results of a sensitivity analysis with respect to non-ignorable non-response. Given the considerable non-response, the sensitivity analysis might have important implications for the interpretation of the results and for clinical practice.
An organizational genetics approach for predictive analytics and motif discovery across large dynamic spatiotemporal networks
Youngjin Yoo, Sunil Wattal, Zoran Obradovic, Rob J. Kulathinal
We are developing several methods based on the use of evolutionary social ontologies that predict the emergence of system-level behaviors within large volumes of digital trace data. We use analytical techniques developed in evolutionary genetics and systems biology to: (1) characterize a stream of digital trace data from complex socio-technical systems with finite genetic elements, (2) predict the behavior of socio-technical systems through the development of high-throughput methods, including those that integrate graph compression with Gaussian conditional random fields (GCRFs), generalization-aware structured regressions, as well as multivariate time series, and (3) explore the impact of mutational input, gene flow, and recombination on “behavioral genes” in the evolution of socio-technical systems. We will generalize our model to be used for other types of massive digital trace data. Our work will allow different fields to study the emergence of complex systems behavior via the interaction of granular events.
Functional adaptive landscape across Great Ape genomes
Victorya Richardson, Rob J. Kulathinal
Male reproductive genes encode among the fastest evolving and most highly adaptive classes of proteins across a diversity of taxa, including the Great Apes. Although it has been hypothesized that sexual selection drives this rapid divergence through processes such as sperm competition and gamete selection, we still know relatively little about the functional networks and constraints involved. Furthermore, it remains unclear whether adaptive genes are co-evolving together or evolving in isolation. Here, I apply a systems evolutionary genomics approach to identify specific protein-protein interaction subnetworks significantly enriched in positively selected genes. I estimate genome-wide selection coefficients for all proteins in gorillas, bonobos, and chimpanzees using humans as an outgroup, and map genes across a series of constructed male reproductive networks to identify whether genes are co-evolving under positive directional selection or evolving in isolation. Preliminary results reveal that positively selected genes are found across dispersed testis-specific subnetworks. Furthermore, eigenvector centrality and selection coefficient are negatively correlated, suggesting that adaptive genes are less constrained and are not co-evolving in concert. Overall, this approach provides a functional landscape of adaptive networks of male reproductive genes across our most closely related species.
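The reported centrality-selection correlation can be illustrated with a minimal sketch; the toy interaction subnetwork and the per-gene selection scores below are placeholders, not data from the study.

    # Minimal sketch: correlate eigenvector centrality with a selection estimate.
    import networkx as nx
    from scipy.stats import spearmanr

    # Hypothetical testis-specific protein-protein interaction subnetwork.
    g = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
    selection = {"A": 0.01, "B": 0.02, "C": 0.005, "D": 0.08, "E": 0.12}  # placeholder scores

    centrality = nx.eigenvector_centrality_numpy(g)
    genes = sorted(g.nodes)
    rho, p = spearmanr([centrality[n] for n in genes],
                       [selection[n] for n in genes])
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")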
Automated IVUS contour detection using normalized cuts
Branko Arsić, Boban Stojanović, Nenad Filipović
A broad range of high-impact medical applications involve medical imaging as a visualization tool for body parts, tissues, or organs, for use in clinical diagnosis, treatment and disease monitoring. Intravascular ultrasound (IVUS) is a medical imaging technology which uses ultrasound waves to visualize blood vessels from the inside out, thus representing a valuable technique for the diagnosis of coronary atherosclerosis. IVUS provides a unique method for studying the progressive accumulation of plaque within the coronary artery, which leads to heart attack and stenosis of the artery. Besides plaque geometry and morphology, this technique also provides information concerning the arterial lumen and wall. The detection of lumen and media-adventitia borders in IVUS images represents a necessary step towards the geometrically correct 3D reconstruction of the arteries. In this paper, a fully automated technique for the detection of the media-adventitia border in IVUS images is presented. An image segmentation approach is used to partition the IVUS image into regions. The local similarity measure between pixels is computed in the intervening contour framework using peaks in contour orientation energy. Once the similarity matrix is obtained, the spectral graph theoretic framework of normalized cuts is used to find image partitions. Compared to manual segmentation, the proposed method performs reliable automated segmentation of IVUS images. Our segmentation approach is experimentally evaluated on large datasets of IVUS images derived from human coronary arteries.
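A minimal sketch of the normalized-cuts idea is given below, using spectral clustering on a pixel affinity graph; the synthetic ring image stands in for a real IVUS frame, and the intensity-based affinity is a simplification of the intervening contour similarity described above.

    # Minimal sketch: normalized-cuts-style segmentation via spectral clustering.
    import numpy as np
    from sklearn.feature_extraction import image
    from sklearn.cluster import spectral_clustering

    # Toy "vessel" image: a bright ring on a dark background plus noise.
    x, y = np.indices((64, 64))
    r = np.hypot(x - 32, y - 32)
    img = ((r > 12) & (r < 20)).astype(float) + 0.2 * np.random.rand(64, 64)

    # Graph whose edge weights decay with the intensity difference between pixels.
    graph = image.img_to_graph(img)
    graph.data = np.exp(-graph.data / graph.data.std())

    labels = spectral_clustering(graph, n_clusters=3, eigen_solver="arpack",
                                 random_state=0)
    segmentation = labels.reshape(img.shape)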
Evidence of recombination in Tula virus strains from Serbia
Valentina Cirkovic, University of Belgrade Faculty of Medicine, Belgrade, Marina Siljic, University of Belgrade Faculty of Medicine, Belgrade, Gorana Stamenkovic, University of Belgrade, Institute for Biological Research “Sinisa Stankovic”, Belgrade, Ana Gligic, Institute of Virology, Vaccines and Sera Torlak, Belgrade, Luka Jovanovic, Institute of Oncology and Radiology of Serbia, Maja Stanojevic, University of Belgrade Faculty of Medicine, Belgrade
Tula virus (TULV) belongs to the Hantaviridae family, with a negative-sense RNA genome consisting of three segments: small (S), medium (M) and large (L). Unlike other hantaviruses, TULV is known to infect several rodent species of the genus Microtus and others such as Lagurus lagurus. Considering the segmented nature of the TULV genome, the evolutionary history of TULV has been shaped by different molecular mechanisms, including genetic drift and reassortment. In addition, phylogenetic analysis of TULV genetic variants from Slovakia revealed the existence of recombination within the S segment. Our aim was to investigate the phylogenetic clustering of isolates from Serbia and examine the possible existence of recombination. Sequence alignments, for both partial L and S segments, were produced using CLUSTAL W implemented in the MEGA 5.1 software package. Aligned sequences were examined with the jModeltest 0.1.1 software to determine the best-fit nucleotide substitution model. Phylogenetic analysis was performed using the PHYML 3.0 software package and the maximum likelihood (ML) method. In order to analyze possible recombination events, a set of 22 S segment TULV sequences from different geographical regions was aligned, including two TULV strains from Serbia. The sequence alignment was analyzed with Bootscan, as implemented in Simplot. Phylogenetic analysis of both L and S segment sequences is suggestive of geographically related clustering, as previously shown for the majority of hantaviruses. Exploratory recombination analysis, supported by phylogenetic analysis, revealed the presence of recombination in the S segment sequences from Serbia, resulting in a mosaic-like structure of the TULV S segment similar to that of the Kosice strain. Although recombination is considered a rare event in the molecular evolution of negative-strand RNA viruses, the molecular data obtained in this study support evidence of recombination in TULV in geographically distant regions of Europe.
Functional annotation of amino acid substitutions in non-conserved regions of epigenetic factors mutated in blood malignancies
Branislava Gemović, Radoslav Davidović, Vladimir Perović
Mutations in epigenetic factors have a key role in the pathogenesis and progression of blood malignancies. There are thousands of blood malignancy-related mutations in approx. 100 epigenetic factors stored in the COSMIC database. Nevertheless, the majority of new patients bring a new set of variants with unknown functional effects, which need to be annotated. Many bioinformatics tools perform this task, and PolyPhen-2 and SIFT are the most commonly used. SIFT is based on sequence homology, presuming that evolutionarily conserved amino acids are functionally important. These evolutionary presumptions are also important for PolyPhen-2 predictions, although it uses many additional features. However, our previous research showed that more than 50% of gene variants in epigenetic factors are positioned in non-conserved regions (nCRs), which makes them hard to annotate with homology-based tools. The aim of this study was to develop a sequence-based model for the functional annotation of amino acid substitutions (AAS) in nCRs of epigenetic factors mutated in blood malignancies. Our dataset encompassed 2072 AAS in 18 proteins. We developed new features to describe AAS and used machine learning (Naïve Bayes) to generate models for their annotation. Features were based on: 1) protein sequence coding using all amino acid descriptors archived in AAindex; 2) the Fourier transform of the encoded protein sequences; 3) scores describing each AAS, calculated based on the difference between the wild type and variant sequence. A model was created for each protein, while performance was measured on the overall data. The new model showed 70% accuracy. It outperformed SIFT and PolyPhen-2 on measures of accuracy, weighted F and MCC, though it had a lower AUC value compared to PolyPhen-2. This research suggests that features based on the Fourier transform of the protein sequence can improve the predictive power of bioinformatics tools for the functional annotation of AAS.
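A minimal sketch of this kind of feature construction is shown below; the Kyte-Doolittle hydrophobicity scale stands in for one of the many AAindex descriptors, and the tiny wild-type sequence and labels are toy placeholders rather than the study's dataset or exact feature set.

    # Minimal sketch: Fourier-transform features of encoded sequences + Naive Bayes.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
          "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
          "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
          "Y": -1.3, "V": 4.2}   # Kyte-Doolittle hydrophobicity (one AAindex-like descriptor)

    def fft_features(seq, n_coeff=8):
        """Amplitude spectrum of the numerically encoded sequence (DC skipped)."""
        signal = np.array([KD[a] for a in seq], dtype=float)
        return np.abs(np.fft.rfft(signal))[1:n_coeff + 1]

    def aas_features(wild, variant, n_coeff=8):
        """Feature vector for one AAS: variant spectrum minus wild-type spectrum."""
        return fft_features(variant, n_coeff) - fft_features(wild, n_coeff)

    # Toy training data: two substitutions labelled deleterious (1) / neutral (0).
    wt = "MKTAYIAKQRQISFVKSHFSRQL"
    X = np.vstack([aas_features(wt, wt[:5] + "W" + wt[6:]),
                   aas_features(wt, wt[:10] + "A" + wt[11:])])
    y = np.array([1, 0])
    clf = GaussianNB().fit(X, y)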
Subtle signals of proteolytic processes in circulation transcriptome of patients after myocardial infarction indicate the ventricular remodeling outcome
Ivan Jovanovic, Tamara Đurić, Maja Živković, Milica Dekleva, Nataša Markovic Nikolic
Prolonged duration of left ventricular remodeling (LVR) after myocardial infarction (MI) leads to progressive alteration of the structure/shape/size and function of the heart, representing maladaptive LVR (MLVR) that precedes the development of heart failure, which is life threatening. The aim of this study was to bioinformatically investigate the transcriptome of MI patients with and without MLVR, six months after the MI. Peripheral blood mononuclear cells of patients who suffered MI (9 with MLVR / 12 without MLVR) were sampled 6 months after the insult. MLVR was defined as progressive LV dilatation with an LV diastolic volume increase (>20%) together with preserved or declined global LV ejection fraction at the 6-month follow-up. Transcriptome data were obtained using Illumina iScan microarray technology. Gene Set Enrichment Analysis (GSEA) was used to detect concordant differences in Gene Ontology biological process gene sets between patients with and without MLVR. We adopted the default GSEA settings, and the significance of an enrichment score was evaluated by a 1000-permutation test with respect to phenotypes. GSEA revealed the ubiquitin-dependent protein catabolic process via the multivesicular body (MVB) sorting pathway as the top enriched process in the MLVR phenotype (ES = 0.58, NES = 1.99, FDR q-value = 0.32). The E3 ubiquitin ligase NEDD4 gene had the highest rank metric score in the gene set. There are two closely related proteolytic processes crucial for normal cardiomyocyte physiology that go through the MVB/lysosomal pathway: MVB sorting and autophagy. Besides ubiquitination of MVB cargo, NEDD4 also promotes the biogenesis of autophagosomes. Increased numbers of autophagosomes are a prominent feature in heart failure. Excessive and long-term upregulation of autophagy beyond a certain threshold could lead to the destruction of essential proteins and organelles, leading to cell death. The identification of transcriptomic signals in circulation makes this approach applicable to research on new therapeutics for protection against post-MI heart failure.
Bioinformatics for Microbiome
Jasminka Hasic Telalovic, Azra Music, Dzana Basic
The microbiome is a collection of mostly bacteria, but also viruses and eukaryotes, that inhabit the human body. Microbiome composition varies across body sites. Numerous studies have reported links between the gut microbiome and different health conditions (autism, IBD, diabetes, antibiotics usage, etc.). It is believed that a further understanding of the gut microbiome can help us improve the wellbeing of individuals with health conditions, but also shed light on how to improve people's longevity. We started a Bioinformatics for Microbiome research group at the International University of Sarajevo (IUS). Initial ideas and support came from representatives of a Boston-area biotech company and members of the Bosnia and Herzegovina diaspora in the USA. The group at IUS is led by Dr. Jasminka Hasić Telalović. In this group we explore existing bioinformatics techniques and propose improvements so that the taxonomic information of bacteria present in the microbiome can be identified with high confidence. We use identification techniques based both on a single gene (i.e. 16S ribosomal RNA) and on the whole bacterial genome (shotgun metagenomics). Upon identification of the bacterial makeup of the microbiome (using bioinformatics tools), we apply data science techniques to the results to gain further understanding of the studied problems. We are currently working on the following projects: relevance of gender in microbiome makeup (as gender is often not considered in research, important issues arising from gender differences are often overlooked in science); microbiome of rice (we are using metagenomics for this agricultural application as it enables more specific taxonomic and functional resolution); using nutrition to improve the human microbiome; and the microbiome of the Autism Spectrum Disorder population in Bosnia and Herzegovina.
Clustering Objects With Large Amount Of Missing Data
Tatjana Davidovic, Natasa Glisovic, Miodrag Raskovic
Machine learning and data mining algorithms are non-trivial processes of exploring new facts and identifying helpful relationships or patterns in data. Therefore, they are frequently used for knowledge discovery in databases. Analysts using real-world databases or datasets constantly encounter data imperfection, especially in the form of incompleteness. Consequently, numerous strategies have been designed to process incomplete data, particularly the imputation of missing values. Single imputation methods initially succeeded in predicting the missing values for specific types of distributions. Nowadays, multiple imputation algorithms prevail, as they are able to increase validity by minimizing the bias iteratively while requiring less prior knowledge of the distributions. New trends in processing incomplete data include the use of metaheuristics and various hybrid methods. After carefully reviewing the state-of-the-art literature on processing incomplete data, we propose new approaches that do not include missing data imputation. We use a distance function that can be calculated over missing data within two metaheuristic algorithms for clustering. The distance function uses the representation of objects as sets of propositional logic formulae and exploits probability theory and the Hamming distance. Among metaheuristics, we selected Variable Neighborhood Search (VNS) and Bee Colony Optimization (BCO). In addition, we experiment with various objective functions, including the p-median and the minimum sum of squares. Experimental results on University of California Irvine (UCI) datasets as well as on a database of patients from the Clinical Center of Serbia illustrate the superiority of our methods over the existing ones with respect to clustering accuracy, for both numerical and categorical attributes with finite sets of possible values, when the amount of missing values increases up to 90%.
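The general idea of clustering without imputation can be sketched as follows; the Hamming-style distance with a penalty for missing pairs is a simplified stand-in for the authors' propositional-logic formulation, and hierarchical clustering stands in for the VNS/BCO metaheuristics.

    # Minimal sketch: clustering categorical records with missing values, no imputation.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def missing_aware_distance(a, b, missing_penalty=0.5):
        """Fraction of mismatching attributes; pairs where either value is
        missing (None) contribute `missing_penalty` instead of 0/1."""
        total = 0.0
        for x, y in zip(a, b):
            if x is None or y is None:
                total += missing_penalty
            elif x != y:
                total += 1.0
        return total / len(a)

    records = [("A", "x", None), ("A", "x", "p"), ("B", None, "q"), ("B", "y", "q")]
    n = len(records)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = missing_aware_distance(records[i], records[j])

    labels = fcluster(linkage(squareform(d), method="average"), t=2, criterion="maxclust")
    print(labels)   # e.g. [1 1 2 2]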
Temporal Origin and Phylodynamic Complexity of HIV Sub-Epidemics in Serbia
Luka Jovanovic, Marina Siljic, Valentina Cirkovic, Dubravka Salemovic, Ivana Pesic-Pavlovic, Jovan Ranin, Djordje Jevtovic, Maja Stanojevic
By the end of 2016, over 3500 cases of HIV infection had been reported in Serbia. The epidemic was first noted among drug users and heterosexuals; however, a shift in the dominant transmission route has been observed, with the majority of new cases seen in men who have sex with men (MSM). Phylodynamic analysis represents a tool to gain insight into the transmission dynamics of different HIV sub-epidemics. AIM: The aim of the study was to investigate differences in HIV transmission dynamics related to the main transmission risk (MSM and heterosexual) in Serbia. MATERIALS AND METHODS: Upon HIV-1 pol sequencing of 385 samples collected in the period 1997-2015, detailed phylogenetic analysis was performed. Identification of transmission clusters, estimation of population growth (Ne), effective reproductive number (Re) and time of the most recent common ancestor (tMRCA) was performed employing Bayesian and ML methods, using different software packages including BEAST, R and Tracer. RESULTS: Four major MSM and two heterosexual clades were identified, comprising a total of 38% of sequences (147/385). The time of the most recent common ancestor for the transmission network was estimated at 1992; for two MSM clusters of 15 and 11 sequences at 2008 and 2005, respectively; and for the two heterosexual clades, subtype B and subtype C, at 1993 and 1989, respectively. Phylodynamic analysis of 2/4 MSM clades showed initial steep exponential growth, stabilizing around 2010; the remaining 2 MSM clades were characterized by exponential growth of almost 2 logs during the whole analyzed period. In contrast, the growth of the two heterosexual clades was stationary, under 1 log. Estimation of Re by birth-death skyline plot among MSM and heterosexuals revealed Re>1 for the MSM clades and Re<1 for the heterosexual monophyletic clades. CONCLUSION: The results of this study show intensified HIV spread related to MSM transmission in Serbia, implying that MSM are an important driving force of the local HIV epidemic.
Machine Learning Approach for Predicting Wall Shear Distribution for Abdominal Aortic Aneurysm and Carotid Bifurcation Models
Milos Jordanski, Milos Radovic, Zarko Milosevic, Nenad Filipovic, Zoran Obradovic
Computer simulations based on the finite element method represent powerful tools for modeling blood flow through arteries. However, due to its computational complexity, this approach may be inappropriate when results are needed quickly. In order to reduce computational time, in this paper, we proposed an alternative machine learning based approach for calculation of wall shear stress (WSS) distribution, which may play an important role in mechanisms related to initiation and development of atherosclerosis. In order to capture relationships between geometric parameters, blood density, dynamic viscosity and velocity, and WSS distribution of geometrically parameterized abdominal aortic aneurysm (AAA) and carotid bifurcation models, we proposed multivariate linear regression, multilayer perceptron neural network and Gaussian conditional random fields (GCRF). Results obtained in this paper show that machine learning approaches can successfully predict WSS distribution at different cardiac cycle time points. Even though all proposed methods showed high potential for WSS prediction, GCRF achieved the highest coefficient of determination (0.930-0.948 for AAA model and 0.946-0.954 for carotid bifurcation model) demonstrating benefits of accounting for spatial correlation. The proposed approach can be used as an alternative method for real time calculation of WSS distribution.
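A minimal sketch of the surrogate-modeling idea follows, comparing a linear model and a multilayer perceptron for predicting WSS from geometric and flow parameters; the random synthetic data stands in for the parameterized AAA/carotid geometries, and the GCRF component is omitted (only unstructured baselines are shown).

    # Minimal sketch: unstructured regressors as fast surrogates for WSS prediction.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 6))              # e.g. radii, angles, viscosity, velocity
    w = rng.normal(size=6)
    y = X @ w + 0.5 * np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)   # placeholder WSS at one node

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    for model in (LinearRegression(),
                  MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, r2_score(y_te, model.predict(X_te)))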
Improving Clustering Performance in Microbiome Studies
Tatjana Lončar-Turukalo, Nina Maljković, Sofija Panotović, Sanja Brdar
Background and aims: Analysis of human microbiome variations between different body habitats and their interaction with humans at the cellular and genetic level is relevant for the understanding of microbiome-related diseases. The prevailing technique for taxonomic identification in microbial communities is 16S rRNA sequencing. Taxonomy relies on grouping species with a certain similarity level into operational taxonomic units (OTUs). Between-sample comparisons are based on different beta diversity measures. This study evaluates the performance of 24 beta diversity measures, ensemble clustering, and semi-supervised approaches with dimensionality reduction in clustering microbiome samples. Methods: Data from the "Moving pictures of the human microbiome" study include 1967 microbiome samples from oral, skin and gut sites of one male and one female, sampled over 396 time points. 16S sequence preprocessing comprised demultiplexing, primer removal, and quality filtering. Sequences were clustered into OTUs by the taxonomy assigner UCLUST with a similarity threshold of 97% against the Greengenes reference database. The OTU biome table summarizes the taxonomy of samples as observation counts per sample. Each of the 24 beta-diversity measures was evaluated using the respective distance matrix as input to spectral clustering with automatic determination of the local scaling factor. An ensemble clustering approach over the obtained individual partitions was assessed for performance improvement. For further enhancement, a semi-supervised kernel-learning algorithm was explored using selected beta diversity measures combined with dimensionality reduction via t-Distributed Stochastic Neighbor Embedding. Clustering performance was evaluated using the Adjusted Rand Index (ARI), and stability using 50 repeated runs of the clustering procedure with subsampling. Results: The best performing beta diversity measures, combined with spectral clustering of samples according to body habitat and gender, were abundance Jaccard, Hellinger and Kulczynski distance, with ARI 0.59±0.03, 0.58±0.02, and 0.57±0.02, respectively. The ensemble approach combining all beta diversity measures slightly outperformed the individual approaches, with ARI 0.62±0.02. Both the Kulczynski and Hellinger affinity matrices combined with dimensionality reduction to 3 features and semi-supervised clustering achieved ARI 0.84±0.02. Conclusions: Beta diversity measures that normalize a taxonomic unit's contribution by the sample's total observation count perform better. The best results, achieved with dimensionality reduction, indicate the need to exploit the sparsity of OTU matrices. Clustering showed perfect separability with respect to habitat, while gender-wise skin samples could not be perfectly resolved.
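The core evaluation step can be sketched as follows; the random block-structured distance matrix is a placeholder for a real beta-diversity matrix, and the simple exponential distance-to-affinity conversion stands in for the locally scaled affinity described above.

    # Minimal sketch: spectral clustering on a precomputed (beta-diversity-like)
    # distance matrix, evaluated with the Adjusted Rand Index.
    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(0)
    n = 60
    true_habitat = np.repeat([0, 1, 2], n // 3)           # e.g. oral / skin / gut
    # Placeholder distances: smaller within the same habitat, larger across habitats.
    d = rng.uniform(0.6, 1.0, size=(n, n))
    same = true_habitat[:, None] == true_habitat[None, :]
    d[same] = rng.uniform(0.0, 0.4, size=same.sum())
    d = (d + d.T) / 2
    np.fill_diagonal(d, 0.0)

    affinity = np.exp(-d / d.std())                       # distance -> similarity
    labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    print("ARI:", adjusted_rand_score(true_habitat, labels))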
On clustering large biological networks into dense components
Dragan Matić, Milana Grbić, Aleksandar Kartelj, Savka Janković, Vladimir Filipović
Clustering large biological networks into smaller components may be of great importance for discovering new properties of a specific biological structure. Through such clustering, the dimension of the structure is decreased, but useful information about particular biological functionalities can still remain in the obtained clusters. Network clustering can be considered as an optimization problem in which the clusters should be as dense as possible, which generally prevents the loss of information. In our long-term research we focus on developing robust and efficient computational methods for clustering large metabolic and PPI networks. In our first approach, we partition an edge-weighted network into so-called k-plex subnetworks. In a network, a k-plex represents a subset of n vertices where the degree of each vertex in the subnetwork induced by this subset is at least n − k. We present a heuristic method for solving the corresponding mathematical problem which implements a 1-swap based fast local search strategy and an objective function that takes into account the degree of every node in each partition. In the second approach, we develop another local-search-based heuristic method to cluster metabolic and PPI networks into so-called highly connected components, by removing as few edges as possible. A component is highly connected if all its nodes have a degree greater than n/2, where n is the number of nodes in that component. Experimental results clearly indicate that our approaches, applied to various kinds of networks, are highly competitive with existing methods, while also discovering plenty of useful biological information.
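The two structural definitions used above are easy to check directly; a minimal sketch on a toy graph (not a real metabolic or PPI network) follows.

    # Minimal sketch: checking the k-plex and highly-connected-component conditions.
    import networkx as nx

    def is_k_plex(graph, nodes, k):
        """Every vertex of the induced subgraph has degree >= |nodes| - k."""
        sub = graph.subgraph(nodes)
        return all(deg >= len(nodes) - k for _, deg in sub.degree())

    def is_highly_connected(graph, nodes):
        """Every vertex of the induced subgraph has degree > |nodes| / 2."""
        sub = graph.subgraph(nodes)
        return all(deg > len(nodes) / 2 for _, deg in sub.degree())

    g = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (2, 4), (4, 5)])
    print(is_k_plex(g, {1, 2, 3, 4}, k=2))        # True: each vertex misses at most one neighbour
    print(is_highly_connected(g, {1, 2, 3, 4}))   # False: vertex 1 has degree 2, not > 2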
Efficient Supervised Dimensionality Reduction Through Simultaneous Minimization of Reconstruction Error and Classification Loss
Predrag Tadić, Nima Asadi, Zoran Obradović, Željko Đurović
Machine learning problems in which the number of features is much greater than the number of observed examples have been gaining in importance in areas such as genomics and biomedical image analysis. For example, in functional magnetic resonance images, the features are the activity levels of tiny brain regions called voxels. Each experiment generates responses from tens of thousands of features, with at most hundreds of independent training examples. Overfitting is a major concern and choosing the most informative features is paramount. We adopt the feature synthesis approach (in contrast to feature selection methods, such as the elastic net): we find a mapping of the original data to a lower-dimensional space and perform learning on the data thus obtained. Standard unsupervised methods (principal/independent component analysis) do not take the labels into account, thus producing features which are not necessarily suitable for classification or regression. A semi-supervised approach is to use the outputs of the hidden layers of convolutional neural networks as features. fMRI examples are scarce, so networks pretrained on natural images are typically used, but these have a different distribution than magnetic resonance images. The support vector decomposition machine (SVDM) combines the reconstruction and classification errors into a single objective. The advantage over other supervised methods, like Fisher's linear discriminant, is that SVDM directly incorporates the hinge loss into the problem from the very beginning, rather than performing feature selection independently of the subsequent classification/regression. The original SVDM arbitrarily places constraints on the parameters to eliminate ambiguity in the optimization problem. Through a different set of constraints, we achieve a more effective initialization of parameters through singular value decomposition of the data matrix. This facilitates the numerical optimization procedure and makes the learnt dimensionality reduction mapping more interpretable and easier to apply to new (unlabeled) data.
Functional Annotation of Proteins: Application to MYBL2, a Gene Involved in the Onset and Progression of Cancer
Katarina Stankovic, Branislava Gemovic
All known functions of proteins are housed in the Gene Ontology (GO) database. They are represented as terms, hierarchically organized into a directed acyclic graph, within three sub-ontologies: Biological Process (BPO), Molecular Function (MFO) and Cellular Component (CCO). Yet, for a great number of proteins, functions are still unknown, including those of more than 6000 human proteins. Predicting new functions of proteins using computational tools enables filtering of functions for experimental testing, which consequently leads to increased efficiency and decreased research costs. The aim of this study was to write a computer program which extracts the results previously predicted by a function annotation algorithm, to apply it to all human proteins (approx. 20000 proteins), and to validate the predicted BPO functions for MYBL2, a protein involved in cancer onset and progression. The program for extracting new functions of all human proteins, predicted by the function annotation algorithm previously developed at INS Vinča, was written in the C programming language. It was optimized for memory and time efficiency. The program extracted true positive, false positive and false negative predictions, and also calculated statistical measures of performance, such as Precision, Recall and F-measure. This program enabled examining and validating the predicted function annotations. Functions predicted for the protein MYBL2 were selected and analysed in detail through manual curation of the available biomedical literature. The program extracted 16 predicted MYBL2 BPO functions. One prediction was shown to be redundant; 3 predictions were related to transcription, a previously known MYBL2 function; for 11 functions (92% of new functions) experimental confirmation was found in the biomedical literature; only for 1 prediction (8%) was evidence not found.
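The performance measures mentioned above reduce to simple set arithmetic over predicted versus known GO terms; a minimal sketch follows (the GO identifiers are hypothetical examples, and the original program is in C, so this Python version only mirrors the calculation).

    # Minimal sketch: Precision / Recall / F-measure for one protein's annotations.
    def annotation_scores(predicted, known):
        tp = len(predicted & known)
        fp = len(predicted - known)
        fn = len(known - predicted)
        precision = tp / (tp + fp) if predicted else 0.0
        recall = tp / (tp + fn) if known else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        return precision, recall, f_measure

    predicted = {"GO:0006355", "GO:0008283", "GO:0045787"}
    known = {"GO:0006355", "GO:0045787", "GO:0007049"}
    print(annotation_scores(predicted, known))   # roughly (0.67, 0.67, 0.67)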
Bioinformatics Analysis of Mutations in the ASXL1 Protein, Important Biomarkers of Myeloid Malignancies
Emilija Jovanovic, Branislava Gemovic
The Additional Sex Combs Like 1 (ASXL1) gene encodes an important epigenetic regulator and is one of the most frequently mutated genes in myeloid malignancies. Mutations in this gene are associated with disease aggressiveness and poor clinical outcome. Prediction of the effects of amino acid substitutions (AAS) is done through the application of bioinformatics tools, the most used being PolyPhen-2 and SIFT. The aim of this work was to create a more accurate alternative model, based on the Information Spectrum Method (ISM). The COSMIC and dbSNP databases were used as sources of cancer-related mutations (13 variants) and neutral SNPs (63 variants). An ISM-based algorithm was created, encompassing the following steps: 1) generating ISM spectra for ASXL1 sequences containing AAS, 2) for each frequency in the ASXL1 spectrum, calculating the difference between the amplitudes of the variant and wild type sequence, 3) assessing the statistical significance of distinguishing between mutations and neutral SNPs using the Mann-Whitney test. The results of the ISM-based method were compared with PolyPhen-2 and SIFT using the following statistical methods: contingency tables and ROC curve analysis. ISM spectra of all sequences in the dataset were constructed using the ISM-based program ProteinSpectar. It was shown that the frequency 0.476 significantly distinguishes mutations from SNPs. The ISM-based method outperformed the PolyPhen-2 and SIFT tools. The AUC value for the ISM-based method was higher by 0.32 and 0.28 compared to PolyPhen-2 and SIFT, respectively. In this study, ISM predicted the effects of AAS in ASXL1 more accurately than the other most commonly used tools. The results suggest that there is room for improvement of standard tools for assessing the functional effects of AAS in proteins, especially those involved in the pathogenesis of human diseases. The results point to the necessity for further research on the ISM-based model and its application to variants in other disease-related genes.
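A minimal sketch of the spectrum-difference step and the Mann-Whitney comparison is given below; the EIIP encoding values are the commonly cited ones, but the wild-type sequence, the substitutions and their grouping into "mutations" and "SNPs" are toy placeholders, not the ASXL1 dataset or the ProteinSpectar implementation.

    # Minimal sketch: ISM-style amplitude differences at a chosen frequency,
    # compared between two variant groups with the Mann-Whitney test.
    import numpy as np
    from scipy.stats import mannwhitneyu

    EIIP = {"A": 0.0373, "R": 0.0959, "N": 0.0036, "D": 0.1263, "C": 0.0829,
            "Q": 0.0761, "E": 0.0058, "G": 0.0050, "H": 0.0242, "I": 0.0000,
            "L": 0.0000, "K": 0.0371, "M": 0.0823, "F": 0.0946, "P": 0.0198,
            "S": 0.0829, "T": 0.0941, "W": 0.0548, "Y": 0.0516, "V": 0.0057}

    def ism_spectrum(seq):
        signal = np.array([EIIP[a] for a in seq])
        return np.abs(np.fft.fft(signal - signal.mean()))

    def amplitude_delta(wild, variant, rel_freq):
        """Amplitude change at the frequency closest to `rel_freq` (0..0.5)."""
        idx = int(round(rel_freq * len(wild)))
        return ism_spectrum(variant)[idx] - ism_spectrum(wild)[idx]

    wt = "MKELSSAGQRTWQDPRLFCAAN"   # toy wild-type fragment
    mutations = [amplitude_delta(wt, wt[:3] + "W" + wt[4:], 0.476),
                 amplitude_delta(wt, wt[:8] + "K" + wt[9:], 0.476)]
    snps = [amplitude_delta(wt, wt[:5] + "T" + wt[6:], 0.476),
            amplitude_delta(wt, wt[:12] + "A" + wt[13:], 0.476)]
    stat, p = mannwhitneyu(mutations, snps, alternative="two-sided")
    print(p)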
Mapping of protein-protein interactions of estrogen receptors alpha and beta and progesterone in breast cancer
Tamara Drljača, Edward Petri, Nevena Veljković
Breast cancer is one of the leading causes of death in women, and it has been shown that estrogen (ER) and progesterone receptors (PR) play a role in its development. Research related to the treatment of this disease is increasingly moving toward the development of targeted molecular therapies. If specific proteins involved in the regulation of the cell cycle interact, in their disturbed state, with ER and/or PR in breast cancer tissue, then such interactions are potentially significant for the development of targeted molecular therapy. The aim of this work was to propose candidate interactors for ER alpha, ER beta and PR involved in the regulation of the human cell cycle. Potential partners, 5337 distinct human proteins involved in transcriptional regulation, were identified from the gene list associated with the GO term Regulation of cellular transcription, DNA dependent. The software tools used were TRI_tool, which uses physicochemical characteristics of the protein sequence, and iLoops, which uses structural characteristics of the protein sequence. Coexpression of the genes encoding these proteins was analyzed in Invasive Lobular Carcinoma and Invasive Ductal Carcinoma tissue, as well as in healthy breast tissue, based on data from the Expression Atlas resource. This analysis pipeline suggests tumour-specific interactions and candidate interactors that could further elucidate ER alpha signalling in breast cancer.
Remote Monitoring of People’s Health and Activities based on Big Data Analytics
Aleksandra Stojnev Ilic, Dragan Stojanovic
Recent developments in sensor technologies, mobile and wearable computing, the Internet of Things (IoT), and Big Data processing and analytics have given rise to research and development in ubiquitous healthcare systems. Such systems provide continuous monitoring of a patient's physical, mental and health conditions by sensing and transmitting data representing heart rate, electrocardiogram (ECG), body temperature, respiratory rate, chest sounds, blood pressure, etc. Remote health monitoring systems are based on online/offline processing and analysis of Big Data gathered from smartphones, smart watches, smart bracelets (wristbands), as well as various medical sensors and wearable devices. Such data need to be collected, fused, processed and analysed to provide diagnosis and treatment of patients with chronic diseases, as well as detection and prediction of critical medical conditions, such as cardiovascular problems, diabetes or depression. This paper provides an overview of an architecture for remote health monitoring and proposes a method and system that implement the given architecture. Such a system supports efficient collection and analysis of massive quantities of heterogeneous and continuous health and activity data from a group, or a crowd, of users. The storage, aggregation, processing and analysis of Big health data are performed on mobile devices, and within edge and cloud computing infrastructures. The results of data analytics and mining are provided to physicians, healthcare professionals, medical organisations, pharmaceutical companies, etc. through appropriate visual analytics and dashboard interfaces. A case study that addresses timely detection of anomalies in mobile and medical sensor measurements, and therefore of critical health events and conditions, is implemented to test the proposed method and architecture and to explore the benefits of current Big Data technologies. The system evaluation on real-world, publicly available personal health and activity datasets shows the viability of the proposed approach in successful prevention of common medical conditions, leading to better and personalized healthcare.
A pilot cognitive computing system to understand immunization programs
Marija Stanojević, Fang Zhou, Sarah Ball, William Campbell, Alison Thaung, Jason Brinkley, Stacie Greby, Alexandra Bhatti, Allison Fisher, Yoonjae Kang, Cynthia Knighton, Pamela Srivastava, Zoran Obradovic
US national, state, local, and territorial immunization programs use quantitative and qualitative data to ensure vaccinations are provided to prevent diseases. The results of qualitative data analysis are not always available to improve vaccination coverage because the analysis is labor intensive. The Immunization Program Cognitive Computing System (IPCCS) was developed to analyze qualitative data for the Centers for Disease Control and Prevention (CDC). Text from a variety of formal and informal sources was collected to develop the IPCCS lexicon. Formal data included policy documents, vaccine-related websites, scientific journals, textbooks, and state vaccination-related laws. Informal data included Sysomos searches of Twitter, online forums, news, and social media feeds from November 2016 to May 2018. The main challenges in IPCCS development included: 1) collection and matching of spatio-temporal information from formal and informal data, 2) data cleaning and pre-processing to remove references to external documents, non-relevant data, jargon, typos, and misspellings, and 3) fast and well-performing word and paragraph searching. To address these challenges, data were iteratively and thoroughly cleaned and filtered, the best existing algorithms for text understanding were used, and a new algorithm for paragraph searching was developed. Customized features for the lexicon were developed to ensure that the results of the IPCCS are useful to vaccine domain researchers (e.g., ranking of US states to show the extent to which a search phrase is represented).
The Effect of Data Dimensionality on Determining the Appropriate Approach for Predicting Cancer Survival
Stefan Obradovic
A fundamental problem in machine learning, called the curse of dimensionality, is that accuracy decreases in high-dimensional applications when training data size is limited. An additional challenge in medical applications is that high dimensional data is difficult to interpret. To address these challenges this study aimed to determine the effect of feature selection on classification accuracy in datasets of different dimensionality. Data dimensionality was reduced by selecting half of the attributes as features using Forward or Backward Feature Selection or selecting the top two attributes by Mutual Information with survival, and compared to the baseline of using all attributes in a given dataset. For each of the resulting data tables, four classification models were considered: Random Forests, Support Vector Machines, K-Nearest Neighbor, and Logistic Regression. These were applied to binary cancer survival classification problems with 10 and 167 attributes, respectively. The prediction accuracy was measured using three-fold cross-validation and the area under the ROC curve. A statistically significant decrease in prediction accuracy was observed when reducing data dimensionality aggressively to two features, while accuracy was maintained when selecting half of the attributes for both low and high-dimensional data. It was hypothesized that a more representationally powerful model, like Random Forests, would be more accurate on high-dimensional datasets than linear alternatives. Conducted experiments supported this hypothesis and provided evidence that the Random Forest model outperformed alternatives for lower dimensional data as well (mean AUC=0.89). Using Forward or Backward Feature Selection along with Random Forest classification resulted in a robust solution that retained accuracy and was easier to interpret than the baseline. Further research is needed to characterize clinical benefits of improved interpretability at the observed model accuracy.
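One configuration from this experimental design can be sketched as follows; the synthetic data generated with scikit-learn stands in for the cancer survival datasets, and the forward/backward selection variants are omitted in favour of the mutual-information option.

    # Minimal sketch: top-2 features by mutual information + Random Forest,
    # scored by 3-fold cross-validated AUC.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=300, n_features=167, n_informative=10,
                               random_state=0)
    pipe = make_pipeline(SelectKBest(mutual_info_classif, k=2),
                         RandomForestClassifier(n_estimators=200, random_state=0))
    scores = cross_val_score(pipe, X, y, cv=3, scoring="roc_auc")
    print("mean AUC:", scores.mean())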
Characterizing Human Thymic T-Cell Repertoires with High-Throughput TCR-Sequencing
Aleksandar Obradovic, Mohsen Khosravi-Maharlooei, Aditya Misra, Howard R. Seay, Markus Holzl, Keshav Motwani, Susan DeWolf, Grace Nauman, Nichole Danzl, Siu-hong Ho, Robert Winchester, Yufeng Shen, Todd M. Brusko, Megan Sykes
T-cell receptors (TCRs) are generated by gene rearrangement with stochastic nucleotide insertion, forming the complementarity-determining region 3 (CDR3), which determines T-cell specificity. The T-cell repertoire is shaped by thymic selection on TCRs, but there have been no studies of human thymic repertoire development at the CDR3 level. Therefore, we sequenced TCRβ CDR3 in the thymi of seven immunodeficient mice receiving human hematopoietic stem cells and human thymus grafts: one triplet with identical bone marrow/thymus, and a set of two pairs with identical marrow and either matched or allogeneic thymus. T-cells were sorted into five subsets representing different stages of maturity, and 10,000-100,000 cells were sequenced for each subset. The diversity of repertoires was assessed by normalized entropy and log-log regression of unique CDR3 count against frequency by cell count. Divergence between repertoires was assessed by both shared-CDR3 fraction and Jensen-Shannon divergence, compared to a baseline established by bootstrapped under-sampling of identical repertoires. Repertoires were highly diverse and divergent across the board, and there was no difference in divergence between mice with genetically identical bone marrow/thymus compared to mismatched mice. However, we observed increased repertoire sharing between more mature T-cell subsets and at the amino acid compared to the nucleotide level, indicating thymic selection. Repeated sub-sampling of all repertoires to 2,000 cells ruled out variations in sample size as the cause of the observed differences. We further compared CDR3s shared across repertoires to unshared CDR3s; an SVM trained on gapped k-mer counts in amino acid sequences recognized shared clones with >70% balanced accuracy under 10-fold cross-validation. CDR3s shared across repertoires were also enriched in known cross-reactive and self-reactive CDR3s, as well as in usage of a common 5' motif and differential amino acids at specific functional positions compared to unshared CDR3s. The results presented here provide new insights into human T-cell repertoire development.
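Two of the repertoire statistics named above, normalized Shannon entropy and Jensen-Shannon divergence, can be computed directly from CDR3 count tables; a minimal sketch follows, with toy CDR3 sequences and counts as placeholders for real repertoires.

    # Minimal sketch: normalized entropy of one repertoire and JSD between two.
    import numpy as np
    from collections import Counter
    from scipy.spatial.distance import jensenshannon

    def normalized_entropy(counts):
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return -(p * np.log(p)).sum() / np.log(len(p))

    rep1 = Counter({"CASSLGQGYEQYF": 50, "CASSPDRGNTEAFF": 30, "CASSQETQYF": 20})
    rep2 = Counter({"CASSLGQGYEQYF": 10, "CASSIRSSYEQYF": 60, "CASSQETQYF": 30})

    clones = sorted(set(rep1) | set(rep2))
    p = np.array([rep1.get(c, 0) for c in clones], dtype=float)
    q = np.array([rep2.get(c, 0) for c in clones], dtype=float)

    print("entropy rep1:", normalized_entropy(rep1))
    print("JSD:", jensenshannon(p / p.sum(), q / q.sum(), base=2) ** 2)  # squared distance = divergence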
Fusion of Heterogeneous Data in Convolutional Networks for Real-Time Pollen Particle Identification
Predrag Matavulj, Sanja Brdar, Marko Panić, Branko Šikoparija
Real-time measurements of atmospheric pollen concentrations are of paramount importance for improving the quality of life of the pollen-sensitive population as well as for agricultural production. While current technology coupled with domain expertise provides only past pollen concentration data (1-7 days old), a new laser-sensing device that collects measurements of single aerosol particles promises to deliver real-time information, but imposes critical challenges for data science. The recorded scattered light and laser-induced fluorescence patterns, representing morphological and chemical fingerprints of airborne particles, demand advanced machine learning algorithms for pollen identification and classification. Our initial study is based on data collected with the first device of this kind in Serbia, calibrated with ground-truth pollen data collected in spring 2018, and includes 4 types of pollen (Betula, Picea, Juglans and Brussonetia) that differ in morphology and size, 2 of them representing strong allergens and one being important in agriculture. Experiments with deep convolutional networks provide promising results in classifying pollen. In the convolutional layers the network learns features from the scattering images (24 pixels x number of acquisitions) and the fluorescence spectrum (32 x 8 acquisitions separated by 500 ns), and it fuses the discovered features in fully connected layers. The deep learning approach achieves an overall accuracy of 89%. Future challenges include the development of unsupervised methods for data cleaning, large-scale real-time data processing, adding more types of pollen and resolving multi-class issues in learning classifiers, and adding regularization with knowledge from domain experts.
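The fusion idea, separate convolutional branches whose features are concatenated in fully connected layers, can be sketched as follows; the layer sizes, the assumption that scattering images are resampled to 24x24 and that fluorescence is treated as 8 acquisitions of 32 bands, and the use of PyTorch are illustrative choices, not the authors' architecture.

    # Minimal sketch: two-branch CNN fusing scattering and fluorescence inputs.
    import torch
    import torch.nn as nn

    class PollenFusionNet(nn.Module):
        def __init__(self, n_classes=4):
            super().__init__()
            # Branch 1: scattering image, assumed resampled to 1 x 24 x 24.
            self.scatter = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten())                     # 32 * 6 * 6 = 1152 features
            # Branch 2: fluorescence spectrum, 8 acquisitions x 32 bands.
            self.fluo = nn.Sequential(
                nn.Conv1d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
                nn.Flatten())                     # 16 * 16 = 256 features
            # Fusion of both branches in fully connected layers.
            self.head = nn.Sequential(
                nn.Linear(1152 + 256, 128), nn.ReLU(), nn.Linear(128, n_classes))

        def forward(self, scatter_img, fluo_spec):
            z = torch.cat([self.scatter(scatter_img), self.fluo(fluo_spec)], dim=1)
            return self.head(z)

    model = PollenFusionNet()
    logits = model(torch.randn(4, 1, 24, 24), torch.randn(4, 8, 32))
    print(logits.shape)   # torch.Size([4, 4])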
The application of autoencoders for dimension reduction on single-cell data
Aleksandar Armacki, Blaž Zupan
Single-cell RNA sequencing allows for the analysis of gene expression at the level of individual cells and can play an essential role in establishing new cell types and characterizing disease. Single-cell RNA data typically include thousands of cells and the complete collection of genes, which for mammalian cells numbers in the tens of thousands. Profiling the cells with a few latent factors obtained by dimensionality reduction may be the first preprocessing step in the analysis of such data. We evaluated the dimension reduction capabilities of two neural network based models – autoencoders and variational autoencoders – and benchmarked them against principal component analysis. We estimated the quality of the resulting latent representations by assessing the quality of data reconstruction, cell type classification, and cell clustering. We compared the three approaches to dimensionality reduction through cross-validation on the same data set, or by developing an embedded space on the training set and testing it on the test set. Our results indicate that, of the three approaches tested, the autoencoder is the best method for dimension reduction of single-cell data.
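A minimal sketch of this kind of comparison, assuming a cells-by-genes expression matrix X: a small autoencoder compresses expression profiles to a low-dimensional latent space and is compared against PCA by reconstruction error. The layer sizes, latent dimension and the synthetic data are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

n_cells, n_genes, latent_dim = 500, 2000, 32
X = np.random.lognormal(size=(n_cells, n_genes)).astype("float32")  # stand-in data

class AutoEncoder(nn.Module):
    def __init__(self, n_genes, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_genes))

    def forward(self, x):
        z = self.encoder(x)          # latent factors used for clustering/classification
        return self.decoder(z), z

model = AutoEncoder(n_genes, latent_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xt = torch.from_numpy(X)
for epoch in range(50):              # short full-batch training loop for illustration
    recon, _ = model(xt)
    loss = nn.functional.mse_loss(recon, xt)
    opt.zero_grad()
    loss.backward()
    opt.step()

# PCA baseline with the same number of latent factors.
pca = PCA(n_components=latent_dim).fit(X)
X_pca = pca.inverse_transform(pca.transform(X))
with torch.no_grad():
    ae_err = float(nn.functional.mse_loss(model(xt)[0], xt))
pca_err = float(np.mean((X - X_pca) ** 2))
print(f"reconstruction MSE  autoencoder: {ae_err:.4f}  PCA: {pca_err:.4f}")
```

In the study itself, the latent representations would additionally be scored on cell type classification and clustering, not only reconstruction.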

Track D - Digital Archeology

Photogrammetry as a necessity: the Temple of Isis in Stobi
Dimitar Nikoloski
Approaches to documentation using photogrammetry in cultural heritage have resulted in many advantages. The method of using photography to create scaled 3D models opens new possibilities for archaeology, and the results obtained with it so far are astounding. Archaeological sites are increasingly turning to this type of documentation because of its simple approach and effectiveness. The resulting 3D models can be georeferenced, making it possible to measure, observe, and understand the relations between buildings and objects, all while they remain located within the coordinate system of the site. The poster presentation will be based on the photogrammetric documentation of the newly discovered Temple of Isis in Stobi, Macedonia, and the benefits of this approach. The photogrammetric data collected from the Temple have been georeferenced to an accuracy of 3 mm and have been used for analysis as well as for the required technical documentation of the excavations. The opportunities that photogrammetry fosters make previous manual technical documentation techniques obsolete. Photogrammetry is more time-efficient and arguably more accurate than traditional manual documentation, and it significantly cuts the time needed for technical drawings, thus creating more time for actual excavation. The benefits in the form of technical documentation for conservation projects will also be covered.
Geometric morphometry of pottery shapes from Iron Age Northeast Taiwan
Liying Wang, Ben Marwick
Pottery is a commonly found artefact type from recent prehistory and often reflects ancient socioeconomic patterns. Craft specialization is one way to detect production organization, based on the assumption that specialized mass production of pottery will lead to homogeneity of the product. Using the R programming language, we apply reproducible geometric morphometric methods to study pottery shapes from Kiwulan, a large multi-component archaeological site in NE Taiwan, to investigate whether there are any changes resulting from foreign contact (European and Chinese) that might indicate social changes in the indigenous society. We find significant differences in shape and shape standardisation that indicate changes in pottery production resulting from foreign contact, suggesting increasing craft specialisation and changes in social organisation.
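The authors work in R; purely as a language-agnostic illustration of the core geometric-morphometric step, the following Python sketch superimposes the landmark configurations of two vessel profiles with Procrustes analysis and reports their shape disparity. The landmark coordinates are made up.

```python
import numpy as np
from scipy.spatial import procrustes

# Each profile is a set of (x, y) landmarks/semilandmarks along the vessel outline.
pot_a = np.array([[0.0, 0.0], [1.0, 0.2], [1.8, 1.0], [2.0, 2.2], [1.5, 3.0], [0.5, 3.2]])
pot_b = np.array([[0.0, 0.0], [1.1, 0.1], [2.0, 0.9], [2.3, 2.0], [1.6, 2.9], [0.4, 3.1]])

# Procrustes superimposition removes differences in position, scale and
# rotation, so the remaining disparity reflects shape alone.
aligned_a, aligned_b, disparity = procrustes(pot_a, pot_b)
print(f"Procrustes disparity between the two profiles: {disparity:.4f}")
```

In a full analysis one would align all sherd outlines together (generalized Procrustes analysis) and then ordinate the aligned coordinates, e.g. with PCA, to compare shape variation between periods.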
Epigraphy databases ‒ should Viminacium have one?
Ivana Kosanović
A vast effort has been undertaken in the past decades to digitize cultural heritage. Although many institutions are working on various digitization projects, a lot of work still remains to be done and there is always room for innovation and advancement. Epigraphy has put digitization to good use by creating online databases. These databases change over time and, by being constantly improved, they have made research in the field of epigraphy much easier. The use of EpiDoc (TEI markup for epigraphy and papyrology) made it possible for inscriptions to be easily uploaded into online databases. Since many inscriptions have been discovered in Viminacium so far, and more are still being discovered each year, their encoding in EpiDoc would make it possible to easily form a database. Through cooperation with other institutions, it would be possible to form a coherent database for the inscriptions from the territory of Serbia and to become part of a wider network of databases for inscriptions from the whole Roman Empire.
Modeling the Innovation and Extinction of Archaeological Ideas
Ben Marwick, Erik Gjesfjeld
The history of archaeology is often told as a sequence of biographies of prominent individuals and their publications. Because of its focus on big names and big papers, this approach is often not informed by what the majority of ordinary archaeologists are actually writing. Here we introduce a quantitative method of investigating a large number of journal articles to identify periods of innovation and extinction of ideas in archaeology. We use a Bayesian framework developed for estimating speciation, extinction, and preservation rates from incomplete fossil occurrence data. We model archaeological ideas with this framework by equating citations of archaeological literature with fossil occurrence data. We obtained reference lists for a large number of journal articles published during the last 30 years, and analysed the chronological distribution of cited items to identify periods of innovation and extinction in the archaeological literature. We discuss what our model tells us about the current state of archaeological ideas, and where and how our model's output diverges from traditional histories of archaeology.
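A toy sketch, not the authors' pipeline, of the data-preparation step implied above: each citation of a work is treated as an "occurrence", analogous to a fossil occurrence, so the chronological distribution of citations bounds the observed lifespan of an idea. The reference-list entries below are invented.

```python
from collections import defaultdict

# (citing_year, cited_work) pairs extracted from journal-article reference lists.
citations = [
    (1991, "Binford 1962"), (1995, "Binford 1962"), (2004, "Binford 1962"),
    (1993, "Hodder 1986"), (1999, "Hodder 1986"),
    (2012, "Example 2008 paper"), (2016, "Example 2008 paper"),
]

# Group citation years per cited work, like occurrence dates per fossil taxon.
occurrences = defaultdict(list)
for year, work in citations:
    occurrences[work].append(year)

for work, years in occurrences.items():
    print(f"{work}: occurrences={sorted(years)}  "
          f"observed span={min(years)}-{max(years)}")

# These per-work occurrence histories are the kind of input a Bayesian
# birth-death / preservation model (as used for fossil data) would take to
# estimate origination ("innovation") and extinction rates through time.
```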
Using Data Science to Determine the Probability of Finding an Archeological Site
Marko Milošević
Archeological sites are usually found accidentally, whether due to roadworks, foundation excavations, agriculture, etc. This can prove very costly for contractors and fatal for the sites, which, in many cases, will be stripped of their findings and destroyed. By analyzing the similarities between archeological sites and their geographic positions, archeologists are able to narrow their search and estimate the probability of finding a site in a particular area. However, many pre-existing geographic attributes characterize archeological sites: is the site at a confluence of rivers, is it on a hill, is it on a plain, is it on a major road, etc. The aim of this study is to develop a classification model that determines the weight of importance that particular geographic attributes have for the probability that an archeological site is located in a given area. The obtained results provide evidence that the developed software can determine, for example, whether a position on a river is more important than having a forest near the site, or whether a hilltop location is more indicative of an archeological site than proximity to a small village. By analyzing multiple locations in an area with the proposed machine-learning-based approach, a probability map can be produced that ranks locations at which to start an archeological excavation. Using this and more advanced machine learning methods, digital archeology could be taken to a whole new level.
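A minimal scikit-learn sketch of the kind of model described above, under assumed binary geographic attributes: a classifier trained on known site / non-site locations, whose feature importances weigh the attributes and whose predicted probabilities can be mapped over candidate locations. The feature names and the synthetic training data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

features = ["river_confluence", "hilltop", "plain",
            "major_road", "forest_nearby", "village_nearby"]
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, len(features)))   # attribute flags per location
# Synthetic labels: sites made more likely at confluences and on hilltops.
p_site = 0.15 + 0.35 * X[:, 0] + 0.25 * X[:, 1]
y = rng.random(300) < p_site

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Weight of importance of each geographic attribute.
for name, importance in sorted(zip(features, clf.feature_importances_),
                               key=lambda t: -t[1]):
    print(f"{name:18s} {importance:.3f}")

# Probability of a site at two new candidate locations (rows of attribute flags);
# evaluating many grid cells this way yields a probability map.
new_locations = np.array([[1, 0, 0, 1, 1, 0],
                          [0, 1, 0, 0, 0, 1]])
print(clf.predict_proba(new_locations)[:, 1])        # P(site) per location
```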
Tracing The Evidence Of Prehistoric Copper Mining In Serbia
Dragana Antonovic, Selena Vitezovic, Vidan Dimic
Mining of metallic raw materials and knowledge of ore-processing technology represent one of the key turning points in human history, which dramatically changed not only the economy of prehistoric societies but also worldviews in general. Relatively frequent finds of malachite lumps and beads, discovered at several Mesolithic and Early Neolithic sites across Serbia, show that prehistoric artisans were already familiar with the technical traits of these raw materials. Prehistoric metallurgy was invented in the Late Neolithic Vinča culture in Serbia, c. 5000 BC. The first exploitation of carbonate copper ores (malachite, azurite) is confirmed at the prehistoric mine of Rudna Glava near Majdanpek. The production of copper and later bronze objects gradually increased through the Eneolithic and the Bronze Age. This increase in the production of objects must have been accompanied by an increase in ore exploitation. However, it is still unknown which mines were active during the post-Vinča metal ages. In order to reconstruct the system of copper ore acquisition, a detailed survey of all currently known copper sources on the territory of Serbia is needed. Besides ground field survey, the studies should include the use of LiDAR technology, a method that has already produced positive results in finding traces of early mining activities in Bulgaria. The next step is creating a comprehensive database that will enable correlating diverse analyses of prehistoric copper and bronze objects on the one hand with samples obtained from the mines on the other. In fact, only digital archaeology can provide the possibility of successfully reconstructing the systems of ore acquisition, with a special focus on the evidence for trade in raw materials and not just in final products, which is already confirmed for many areas of prehistoric Europe.
Analyses Of Raw Material Choices In Prehistoric Craft Production
Selena Vitezovic, Dragana Antonovic, Vidan Dimic
Technological analyses of different archaeological objects must include diverse aspects: raw material selection, reconstruction of the technological procedure, use-wear traces, instances of repair and recycling, as well as the circumstances in which an object was discarded and became part of the archaeological record. One of the key questions in technological analyses, and in analyses of specialisation and standardisation of craft production, is the question of acquiring and managing raw materials. It is important for assessing technological know-how, relations with the environment and other communities, trade and exchange, and much more. Comparisons of different raw materials obtained outside a given prehistoric settlement may point to the routes and patterns of trade and exchange. For example, in the Late Neolithic / Early Eneolithic Vinča culture the exchange of different lithic materials between settlements, used for daily tools, is noticeable, as well as the long-distance exchange of mollusc shells, used for luxurious, prestigious ornaments. On the other hand, careful analyses of locally obtained raw materials may point to specific relations with the environment (if some raw material was particularly frequent or completely avoided). Detailed analyses of raw material choices in prehistory should include the examination of artefacts with high-power microscopes, ZooMS analyses, petrological analyses, but also field surveys that will track down the quarries, etc. For large-scale comparisons, a detailed database is needed, which will include information about the raw materials used in specific archaeological assemblages and about the available resources in a given region.
Digital ArchaeoZOOlogy in Viminacium: State of the art and future perspective
Sonja Vukovic-Bogdanovic
Archaeozoology, the discipline that studies animal remains in the archaeological record, is developing intensively in Serbia and consequently in Viminacium. Viminacium, which was the legionary fortress and capital of the Roman provinces of Upper Moesia and Moesia Prima, has been excavated intensively for years, and animal remains are usually, along with pottery, among the most frequent finds. Analyses of the immense faunal assemblage from Viminacium raised a broad range of questions related to different aspects of human-animal relationships, such as the meat diet, husbandry practices and ritual uses of animals in Viminacium, but also more general issues such as trade in the Roman world. These analyses employ various digital techniques, from the initial collection of data in a digital database during primary analyses and the use of digital reference collections, to more sophisticated techniques such as microscopic analyses of human-made modifications and analyses of CT scans of animal bones. This paper presents the digital methods used so far in archaeozoological research in Viminacium and their implications for understanding important archaeological questions, but also suggests, discusses and clarifies the employment of further techniques in the future, such as the management of the abundant data and the networking of Viminacium archaeozoological data with data from the region and the wider Roman world.
Best Practice for Curating and Preserving Digital Archaeological Data
Rachel Fernandez, Leigh Anne Ellison, Francis P. McManamon, Adam Brin
Archaeologists generate large numbers of digital materials during the course of field, laboratory, and recording investigations. Products of these investigations, including maps, photographs, data analyses, and reports, are often produced digitally. Good curation of digital data means it can be discovered and accessed, and preserving these materials means they remain accessible for future use. In many ways, managing, curating and preserving digital materials involves steps similar to those taken in the preservation of physical artifacts, samples, and paper records. However, digital materials require a different process, which can appear daunting at first. In this poster we outline some simple steps for managing and curating digital materials that can be integrated into existing or future projects and that can be applied to digital materials from completed projects. We also use real-world examples from tDAR (the Digital Archaeological Record) to illustrate how people are preserving their digital materials for access and future use.
Advanced technology in archaeological documentation: case studies of Manasija Monastery and Kruševac Fortress
Jovana Šunjevarić, Milica Tomić, Aleksandar Stamenković
The development of technology and new methodological approaches in archaeological research has made a significant contribution to field excavations, to the analysis of the obtained data, and to their further interpretation and presentation. The application of digital photogrammetry, unmanned aerial vehicles (drones) and geoinformation systems provides precise technical documentation and the possibility of further advanced analysis. The poster focuses on the contribution of applying these methods at two archaeological sites. The collected data come from the medieval fortress in Kruševac and from the Monastery of Manasija, which are among the most important Serbian medieval monuments. Attention is paid to the importance of georeferencing the obtained data and turning these archaeological and cultural heritage sites into accurate, high-resolution 3D models, orthomosaics and digital elevation models (DEM). These results can be fully imported into a GIS platform, which gives us the opportunity to create precise technical documentation and to carry out advanced analyses. Regarding the fortress in Kruševac, and especially the Monastery of Manasija, the new methods provided new information on certain parts of the fortress/monastery walls and revealed the state of their preservation. These results enabled architects and conservators to plan further work on the restoration, reconstruction and protection of these cultural heritage remains. The obtained 3D content can be optimized for VR platforms for a fully immersive experience, as well as for AR (augmented reality) and animated content for interactive presentations, exhibitions and museums. New methods and techniques of recording (digital photogrammetry, drones) have become an important segment of archaeological research and of documenting cultural heritage in general, and for this reason we consider it important to present and explain the benefits of these new digital approaches.
Heritage in the Digital Age: Guidelines for Preserving and Sharing Heritage with Digital Techniques
Francis P. McManamon, Jodi Reeves Eyre
Individuals, organizations, and public agencies that are responsible for the stewardship of cultural heritage face both challenges and opportunities. Challenges include: heritage loss due to poor access and preservation; lack of perceived value; hesitancy to share information, resulting in an absence of public interest; and loss of heritage information through destruction or neglect. Opportunities offered by digital techniques include: broad and easy access to information (with appropriate controls); more options for public interpretation and outreach; and long-term preservation of data and information. Current legal and policy issues related to digital technology and cultural heritage are considered. Our work with digital archaeological data is used to consider how to preserve more general cultural heritage data and information.
Big Data Management In Archaeology – Practices And Experiences Of The Vinča Project
Nenad N. Tasić, Vitomir Jevremović, Saša Lukić, Kristina Penezić, Miroslav Marić, Dragana Filipović
One of the main goals of the renewed investigations at the large prehistoric site of Vinča near Belgrade, initiated back in 1998, was to optimize the methods of collection, storage and manipulation of the copious archaeological data. The effort to improve the then available techniques was driven not only by the necessity to make the process of field documentation faster and more efficient, but also by the idea of allowing immediate, i.e. on-site, assessment of the collected data, in order to inform decisions on the excavation and sampling strategy. The first step towards enhancing the existing data storage methods was the development of MS Access databases for each of the many types of archaeological materials (e.g. pottery, chipped stone, ground stone, animal bone). These were designed in collaboration with the specialists, so as to ensure their suitability and the adequate level of detail. The next step was to ensure the ‘communication’ between the individual databases and create the link between them that would allow a comprehensive view of the data and, moreover, lay the groundwork for the subsequent analysis and data integration. In 2003, a new platform was developed – the ArchaeoPackPro! software package that uses SQL to integrate the previously used MS Access databases and enable them to interact, whilst at the same time allowing entry of fresh data. One of the many new opportunities that the package offered was the visualisation of the excavated layers and the find-locations of archaeological objects and materials, as well as the integration of their photographic record. The possibility of producing a 3D view of the excavated deposits and finds almost synchronously with the fieldwork was groundbreaking and offered numerous other options, such as the prompt presentation of the excavated areas and, more importantly, simultaneous analysis of the different datasets.
The Forming Of The Anthropological Collection At The Museum Of Srem In Sremska Mitrovica
Nataša Miladinović-Radmilović, Dragana Vulović
In 2016, through the open call for co-funding of projects in the area of research, preservation and use of museum heritage, organized by the Ministry of Culture and Information of the Republic of Serbia, we were given considerable funding to start the project Preparation of final documentation and provision of permanent and safe storage of osteological material from earlier anthropological research in Sirmium. We chose, as the starting point, the Museum of Srem and the material from Sremska Mitrovica (ancient Sirmium), not only because it is one of the most significant antique and medieval sites in our country, but also because anthropological examinations and analyses, accompanied by photos and excellent documentation, had been performed, and also published, for over 1,000 individuals. In 2016 we managed to form an anthropological Antiquity collection, which is important not only for research on the population of Srem in Antiquity, but also for the entire territory of Serbia in Antiquity and the territory of the former Roman Empire in Europe. The forming of this collection has also enabled, along with safe storage, greater availability of these human remains for future research (isotopic and molecular-genetic analyses), the digitization of museum heritage, and presentations. Although there is a trend towards the digitization of archaeological heritage, we must not ignore the ethical considerations regarding the extraction of digital data for the display and research of human remains in both academic and museum environments. It is necessary to take care of the use of digital displays, the legal ownership of created digital data, the proper regulation and sharing of these data, etc. The plan is that, once the remaining medieval osteological material from the Museum of Srem has been processed, attention will be focused on the examined anthropological material from other important archaeological sites in our country as well.
People Of Lepenski Vir: Protocol For Digitalization Of Bioarchaeological Heritage
Jugoslav Pendic, Jelena Jovanovic, Sofija Stefanovic
In the past few years, the archaeological scientific community has witnessed a tide of 3D scanning technologies being implemented, both in the field and in laboratory conditions, for the production of massive datasets focused on preserving and presenting archaeological heritage. This success in adoption and the overall enthusiasm of archaeologists for the process can be easily explained: the equipment requirements for quality 3D information capture plummeted with the appearance of a novel approach to photogrammetry. Image-Based Modelling (IBM) at its basic level requires only a camera and some overcast sky or studio light to have your site, your trench or a newly uncovered artifact preserved as an accurately scaled digital copy for as long as the storage units hold the data. The technology is now almost in the mainstream of the on-site documentation process, not far from becoming mandatory, and 3D modelling is expanding far beyond the typical focus on exceptional or visually striking objects and works of architecture. We are seeing strong cases for the use of 3D scanning in routine artefact research and publication – and this paper presents early products of one such effort. Supported by the Serbian Ministry of Culture and Information, a team of archaeologists based at the BioSense Institute, Novi Sad, is working towards acquiring and providing open access to digitized 3D models of an important anthropological collection from the Đerdap gorge, dated to the Mesolithic and Neolithic periods. Using computed tomography and IBM, the remains of individuals who lived during one of the most extraordinary periods of human history will be made accessible to a wide audience, retaining metric data and the possibility of being analyzed online, while at the same time allowing the real remains to stay out of exposure and away from the potential harm done during handling.
Learning In The Third Dimension – Dealing With Classical Architecture
Vladan Zdravkovic
The use of multimedia software provides a powerful tool for recreating convincing renders of ancient physical structures and various monuments of the past. However, behind this tool stand the knowledge and expertise of entire research teams gathered around each such endeavour. Thus, the tool itself cannot reach the ultimate goal – a scientifically valid and technically and artistically acceptable representation of historical monuments. Several long-term projects realised over the past few years have resulted in gradually developed and thoroughly elaborated 3D architectural studies of three important urban agglomerations of Late Antiquity. Two of them (Abu Mena near Alexandria and the Church of the Holy Sepulcher in Jerusalem) were studied within the large “Pilgrimage Project” (2012-2015), launched by the Roman-Germanic Museum in Mainz (RGZM) and the Leibniz Association. Caričin Grad, a crucial archaeological site for understanding Late Antique architectural styles and urban planning, has been recreated in 3D in a large architectural study within the scope of five successive projects between 2002 and 2018, initiated by the Institute of Archaeology, Belgrade, and supported by RGZM and the Leibniz Association. Comparing these experiences and final results implies the necessity of carefully choosing working methods for each object. The methodology applied is heavily dependent on the general state of research of particular monuments, elaborated as either individual edifices or agglomerations of buildings, along with the state of preservation and accessibility of the sites and the quantity of valid multidisciplinary data. These data repositories are subjected to scrutiny while working in 3D, which provides new and sometimes completely unexpected insights into buildings that, in some cases, can change our perception of the recovered architecture and the monument itself.
Vinča Digital Heritage For The Public
Kristina Penezić, Dragana Filipović, Jugoslav Pendić, Nenad Tasić
This year marks the 20th anniversary of the renewed excavations and research at the Neolithic site of Vinča. This long period has witnessed the implementation of new, digital technologies in the archaeological process. The possibilities of digital documentation were explored through projects of digitization of field documentation from the previous excavations at Vinča. The success of these initiatives inspired the wide use of digital technologies in the documentation of the site and the collected material, and in the presentation of the cultural heritage. The already developed communication with the public via ‘traditional’ routes (e.g. exhibitions) was intensified through various digital means. For instance, virtual reconstructions became an integral part of the presentation of the archaeological site of Vinča, covering not only individual Neolithic buildings but the entire settlement and its immediate surroundings. In addition to virtual reconstructions, photogrammetry has been used extensively in recent years, both for recording features in the field and for more accurate documentation and presentation of the artefacts. In this manner, 3D models of findings not accessible to wider audiences can be presented in a virtual environment to both the scientific community and the general public. The extensive use of digital technologies in archaeology, pioneered by the Vinča Project and the Centre for Digital Archaeology in Belgrade, has in the meantime been introduced and successfully implemented at a number of other sites in Serbia. This digital development marks a new era in the history of archaeological research in this country.
The Application of Sensing and Detection Methods and the Interpretation of Digital Data: The Case of Caričin Grad - Justiniana Prima
Vujadin Ivanišević, Ivan Bugarski, Aleksandar Stamenković, Sonja Jovanović, Nemanja Marković
Thanks to the application of modern non-destructive sensing and detection methods, a series of new data on urban planning in Caričin Grad - Justiniana Prima - was obtained. For the most part, the current research project studies the Upper Town’s northern plateau, wooded until recently and hence the only previously unexplored unit of the city. The classical research method – the excavations started in 2009 – is for the first time combined with the systematic application of airborne and terrestrial sensing and detection techniques. The analysis of historic aerial photographs and topographic plans proved to be very useful as well. Along with them, LiDAR-derived DTMs, photogrammetric DEMs, and different geophysical and orthophotographic plans are stored in the GIS database for Caričin Grad and the Leskovac Basin. In this way almost 80 percent of the plateau area was defined, and the obtained plan is hypothetical only to a small extent. Each source provided relevant information for the reconstruction of both the rampart and the settlement, which points to the value of a holistic approach to documentation from various dates. The parallel application of classical research methods and modern techniques of sensing and detection enabled the reconstruction of the northern rampart and the urban matrix of the Upper Town’s northern plateau. The rampart route, the disposition and the form of the towers, and the possible locations of the posterns were defined, as well as a settlement with its radially distributed rows of buildings cascading down the slope. Until recently among the least known parts of the town, this unit can now be regarded as one of the best defined. This is important not only for our understanding of Caričin Grad, but also for the study of Early Byzantine urban planning in general.
Digital Reconstruction of the Late Antique Helmet from Jarak
Miroslav Vujović, Stevan Đuričić
In 2005 a hoard was found near Jarak, in the Srem district, consisting of a small ceramic vessel with fragments of a gilded silver plaque and a great number of silver wedges with spherical and calotte heads. The detailed study of this exceptional find included stylistic, typological, chronological and physico-chemical analyses, all of which provided interesting results. It was determined that the gilded silver plaque with stamped ornaments was actually part of a high-priced late antique helmet of the Berkasovo type, similar to the ones discovered in the 1960s near the village of the same name, not far from the town of Šid. The most probable time frame for the production of this helmet and its burial near Jarak is the second decade of the 4th century, at the time of the conflicts between Constantine the Great and Licinius. As part of the scientific project for the reconstruction and publication of the hoard from Jarak, a digital 3D model of the helmet was produced, allowing additional study of its appearance and construction, but also providing the possibility of modern presentation using visual media in museum exhibitions and popular-science films.
Photogrammetric Processing Of Grave Units At The Archaeological Site Of Viminacium
Željko Jovanović, Milan Savić
The use of digital technologies in research at the site of Viminacium has significantly simplified and accelerated the work on archaeological documentation. One innovative and successful example of digitalization is certainly photogrammetry, which replaced the traditional, often slow and insufficiently precise, practice of technical drawing in the field. By combining photogrammetric image linking with geographically positioned points, it is now possible to create precise 3D models of explored archaeological units (cultural layers, field objects, graves, etc.). The aim of this poster is to display the results of photogrammetric processing applied to various grave types found at the site of Viminacium. The production of 3D models and drawings of grave units, as well as the precise distribution of small finds found within them, represents an important basis for further archaeological analyses. The application of 3D models in archaeological research enables more successful analyses, visualizations and presentations. Also, by using the AutoCAD software, researchers can now create more precise technical documentation in the desired scale, including the base, cross-sections and details, as exemplified by the described grave units.
Drenovac Digital Data – Investigating a Neolithic Settlement
Slaviša Perić, Đurđa Obradović, Ivana Stojanović, Olga Bajčev, Ružica Arsenijević
Drenovac is a large, multi-layered site that was occupied during the Early Neolithic Starčevo culture (6100-5900 BC) and the Late Neolithic Vinča culture (5300-4700/4500 BC). It is a key site for the research project conducted by the Institute of Archaeology in Belgrade, focused on the Neolithic settlements in the Middle Morava Valley and their place in the wider context of the Neolithisation of Southeast Europe. Ongoing systematic excavations at Drenovac have made it possible to investigate the life history of the settlement, its spatial organisation, demography, social relations, household organisation, economy, technology, diet and environment. This kind of research involves numerous experts from different fields and a large quantity of data, and thus demands a new, modern approach that enables efficient and integrated processing of the data. During 15 years of continuous research, the project has worked on improving the excavation methodology and field documentation through the implementation and development of a digital recording system. The benefits of the use of modern technology are seen in: (1) acquiring new data about the settlement size and organisation, obtained through geomagnetic survey; (2) better integration of field excavation data and the results of post-excavation analysis through the introduction of digital databases; (3) efficient and diverse spatial analysis enabled by detailed and precise mapping of finds and features with the total station; and (4) producing detailed plans and 3D models of archaeological features and findings using photogrammetry, at the same time offering the wider public a distinctive glimpse into the past through a unique and tangible experience.
Mathematics as a Tool, a Language and an Input for Digital Technologies: Research in Geometry and Reconstructions in Archaeology
Emilija Nikolić
Geometric features of historic buildings and artworks represent inputs for reconstructions within the framework of archaeology. When speaking of Roman architecture, theorists often conclude that geometry and arithmetic are intertwined, and that the structure and the idea of a building cannot be separated from one another. Roman amphitheatres are the historic buildings that have been most frequently analysed with the help of these branches of mathematics. After the geometric analysis of the amphitheatre at Viminacium, the former Roman city and legionary fortress by the Danube in present-day Serbia, and with the information offered by the ancient scale rulers excavated in Viminacium, its partial reconstruction was built. Many mathematicians have tried to connect theories of symmetry and ornament. Their conclusions can be applied in the research of Viminacium funerary painting. The isosceles trapezoidal cross-section, a feature of most of the painted graves, is suitable for the symmetrical arrangement of motifs, so the mentioned theories can help in the reconstruction of the lost decoration. The projection systems used in ancient painting differ from one another, but also from the “scientific perspective” developed in the Renaissance. The fragmented wall painting from Sirmium, the only one with an architectural scene found so far in Serbia, can be reconstructed using this knowledge. The same applies to the Roman tomb in Brestovik, where the motif of beams is depicted, the only such motif found in Serbia. While analysing a historic building or a painting, one can always look for a rule or a principle. The answers are numerous, because the executed state differs from the imagined one. Adjustments were common, as were coincidences. However, conclusions reached using mathematics always produce, generate or inspire further scientific research, even in the most contemporary disciplines, such as digital archaeology.
Digitization Of Cultural Heritage In The Area Of Medieval Ras
Uglješa Vojvodić, Vesna Bikic, Vladan Vidosavljevic
In the 1980s, the Institute of Archaeology renewed The Archeological Sites of Southwestern Serbia project. In the period from 1981 to 1986, intensive reconnaissance of the area around Novi Pazar, Tutin and Sjenica began. These activities were followed by the creation of documentation on cultural monuments, in accordance with the methodology of the time. After additional work carried out between 2008 and 2010, the results were compiled in the 2014 publication Archaeological chart of Novi Pazar, Tutin and Sjenica. The increasing devastation of the sites, as well as the emergence of new methods of digitization, points to the necessity and urgency of complete digital documentation.
The Virtual Museum Ras Project is a continuation of the long-lasting, fruitful cooperation between the experts of the Institute of Archaeology and the Museum in the area of medieval Ras. The Project plan envisages the creation of digital documentation on immovable cultural monuments from the area and on the material stored in the museum showcases and depots. They will be recorded with photogrammetry, which provides the opportunity to obtain documentation that is as valid and high-quality as possible. The documentation will also be georeferenced and will store precise data on the dimensions of, and the relationships between, the documented units. In addition to being used for scientific purposes, it will make it possible to present movable and immovable monumental heritage outside its original location, making it more accessible to the general public and to people with disabilities. The ultimate goal of the Project is to create a digital library for the long-term storage of digital collections of movable and immovable cultural monuments under the authority of The Ras Museum. Access to the digital objects will be enabled through a newly formed section of the Virtual Museum 'Ras' within the already existing website.
Trace Me If You Can - Experimental Research Of Polished Stone Axes, Adzes And Chisels And Comparative Traceological Analyses
Vidan Dimic
Throughout the 20th century, the function of Neolithic ground stone edge-cutting tools was determined only by their form, which resulted in the typological determination simply being copied onto the tool’s possible function. Moreover, until the 1980s, the analysis of these tools was mainly typological and limited to cataloguing, and all tools with a cutting edge were interpreted as different types of axes (battle axes, wedge-axes, tongue and mould axes, miniature axes, etc.), while adzes and chisels were not recognized in the archaeological material. Such a practice, based on subjective observations of form without detailed analysis, led to errors in the general interpretation. This situation changed considerably a decade later, with the work of D. Antonović and a radical change in the methodology of exploring this category of stone tools. A different research focus and the incorporation of functional-typological and petrological analyses led to the creation of a significantly larger data pool, and new research questions emerged. One of these problems, not completely solved at the global level, is the use-wear on these tools, that is, the traceological markers that clearly separate axes, adzes and chisels, as well as the markers formed on each of these tools individually when performing various tasks under different conditions. There are still not enough publications on the occurrence and development of these specific types of damage. The author’s doctoral dissertation will therefore focus on exploring the traces of the production and use of the mentioned tools through archaeological experiment. The aim of the research is to form a comprehensive reference database of traceological markers through experimental archaeology as a supporting research method, through which, in the future, the comparison and functional determination of original polished stone tools with a cutting edge can be made.
Beyond Archiving: Synthesizing Data with tDAR
Adam Brin, Leigh Anne Ellison, Francis P. McManamon
Archaeological projects generate abundant data that are often underutilized in research and analyses beyond the life of the project. Although some projects curate their data, they often do not make those data widely available, accessible, or easy to aggregate at different granularities for additional research. Discipline-specific digital repositories and data publishing platforms (e.g. tDAR, ADS, Open Context) are beginning to address problems related to the access and utility of legacy databases and data sets. Now, tDAR has a tool to aid in synthesizing data collected without a priori standardization, meaning researchers can easily bring together large data sets from within and across sites and regions for new and exciting analyses. This poster presentation describes the tool and how to use it for synthetic research, with case studies from the American Southwest.