Topic Areas

Here you will find an overview of the topic areas currently available for Bachelor's and Master's theses.

❎ Until further notice, the Chair of Information Systems (Lehrstuhl für Wirtschaftsinformatik) does not supervise external theses with industry partners.

Please note that, due to high demand, we only accept topic proposals that are closely related in content to the topic areas listed below.

Bachelor's Theses

Comparison of k-means and DBSCAN implementations on a dataset

In the Business Analytics Bachelor course you learned about the k-means clustering algorithm. Explore how another clustering algorithm, DBSCAN, works and apply both to several datasets. Compare the procedures and the performance of the two clustering algorithms, emphasizing the strengths and weaknesses of each and the cases in which you would prefer one over the other. Use the following datasets for your application: the iris dataset (e.g. the CSV file from https://datahub.io/machine-learning/iris#r), the gvhd or banknote dataset (both in the mclust R package) and one dataset of your own choice.
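To give a first impression of how the density-based approach differs from k-means, here is a minimal sketch of the DBSCAN idea from Ester et al. (1996) in plain Python. The toy points and the parameter values `eps=0.6` and `min_pts=3` are assumptions chosen for illustration; they are not part of the topic description, and a thesis would typically rely on an established implementation.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Toy data (assumed): two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
    (5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8),
    (9.0, 0.0),
]
labels = dbscan(points, eps=0.6, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point is labeled as noise, whereas k-means would assign it to one of its k clusters. This difference is one of the aspects the comparison can examine.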

Supervisor: Veronika Wachslander

Literature:

MacQueen, J. B. (1967). Some Methods for Classification and Analysis of Multivariate Observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. University of California Press, 281–297.

Ester M., Kriegel H., Sander J., Xu X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD-96 Proceedings, 226-231.

Evaluation of the FLUSS segmentation algorithm

The Matrix Profile algorithm ‘FLUSS’ is one of the most powerful time series segmentation algorithms and can be used to find anomalies and structural breaks in a wide variety of time series data. Unlike statistical approaches, it follows a pragmatic shape-based approach which can be understood intuitively. The goal of this thesis is to implement and evaluate the FLUSS algorithm, emphasizing its strengths and weaknesses. For the implementation of the FLUSS algorithm, use the R package ‘tsmp’. Use the following time series repository for your application: The UCR Time Series Classification Archive.
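The shape-based intuition can be sketched in a few lines of Python. The sketch follows the arc-counting idea described by Gharghabi et al. (2017): every subsequence is linked to its nearest neighbor by an "arc", and positions crossed by unusually few arcs are candidate segment boundaries. The hand-made `nn_index` list stands in for a real matrix profile index, and the `edge` parameter is a simplified stand-in for the exclusion zone at the ends of the series; both are assumptions for illustration and do not reproduce the ‘tsmp’ implementation.

```python
def corrected_arc_curve(nn_index, edge):
    """Arc counts per position, normalized by an idealized parabola.

    nn_index[i] is the position of the nearest neighbor of subsequence i.
    Values near 0 suggest a boundary; values near 1 suggest no boundary.
    """
    n = len(nn_index)
    marks = [0] * n
    for i, j in enumerate(nn_index):
        lo, hi = min(i, j), max(i, j)
        marks[lo] += 1   # arc starts here
        marks[hi] -= 1   # arc ends here
    arc_curve = []
    running = 0
    for m in marks:
        running += m
        arc_curve.append(running)  # arcs crossing between k and k + 1
    cac = []
    for k, crossings in enumerate(arc_curve):
        ideal = 2 * k * (n - k) / n  # expected crossings for random arcs
        if k < edge or k >= n - edge or ideal == 0:
            cac.append(1.0)  # the ends are unreliable, so they are masked
        else:
            cac.append(min(crossings / ideal, 1.0))
    return cac


# Toy index (assumed): positions 0-4 only point at each other, as do 5-9.
nn_index = [2, 3, 0, 1, 2, 7, 8, 5, 6, 7]
cac = corrected_arc_curve(nn_index, edge=2)
boundary = cac.index(min(cac))
print(boundary)  # 4, i.e. the regime change lies between positions 4 and 5
```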

Supervisor: André Konersmann

Literature:

Yeh, C. C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., ... & Keogh, E. (2016, December). Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM) (pp. 1317-1322). IEEE.

Gharghabi, S., Ding, Y., Yeh, C. C. M., Kamgar, K., Ulanova, L., & Keogh, E. (2017). Matrix profile VIII: domain agnostic online semantic segmentation at superhuman performance levels. In 2017 IEEE international conference on data mining (ICDM) (pp. 117-126). IEEE.

Boosting Methods: Theory, Operation and Application

Besides using a single forecasting model, various ensemble methods exist. One of these is boosting, in which many weak models that build on each other are combined into an overall model. Several variants of boosting have since been developed. The goal of this thesis is to implement different boosting algorithms and compare them by applying them to a new dataset.
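How weak models "build on each other" can be illustrated with a minimal AdaBoost sketch in plain Python, using one-dimensional threshold classifiers (decision stumps) as weak models. The toy data and all function names are assumptions for illustration; other boosting variants such as gradient boosting work differently in detail.

```python
import math


def stump_predict(x, threshold, sign):
    """Weak model: predict `sign` for x <= threshold, else the opposite."""
    return sign if x <= threshold else -sign


def best_stump(xs, ys, weights):
    """Stump with the lowest weighted error: (error, threshold, sign)."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost(xs, ys, rounds):
    """Fit `rounds` stumps; each round reweights the training points."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this stump
        ensemble.append((alpha, threshold, sign))
        # Misclassified points gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data (assumed): no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
print(predictions == ys)  # True
```

A single stump misclassifies three of the ten points here, while the weighted vote of three stumps classifies all of them correctly.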

Supervisor: Veronika Wachslander

Literature:

Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785

Schapire, R. E. (2013). Explaining AdaBoost. In Empirical Inference (pp. 37–52). Springer.

Synthetic Oversampling with Neural Networks for Uplift Modeling

Uplift modeling is a technique that models the incremental impact of a treatment on an individual’s behavior. It is primarily used in customer relationship management (CRM) and for up- and cross-selling measures. Many uplift modeling tasks, however, suffer from imbalanced datasets, in which one or more classes are severely underrepresented; this can impair prediction accuracy. To overcome this problem, approaches based on deep neural networks such as Generative Adversarial Networks (GAN) can be used to oversample existing data, balancing the dataset and in turn improving prediction accuracy. The aim of the thesis is to implement a GAN to oversample the dataset and to compare the performance of balanced and unbalanced uplift models.
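As a starting point, the notion of incremental impact can be made concrete with a small Python sketch that estimates average uplift as the difference between the response rates of a treated group and a control group. The field names `treated` and `converted` and the toy records are assumptions for illustration; actual uplift models estimate this difference per individual or segment, and the GAN-based oversampling itself is not shown.

```python
def response_rate(rows, treated):
    """Share of converted customers within the treated or control group."""
    group = [r for r in rows if r["treated"] == treated]
    return sum(r["converted"] for r in group) / len(group)


def average_uplift(rows):
    """Response rate under treatment minus response rate without it."""
    return response_rate(rows, True) - response_rate(rows, False)


# Toy campaign data (assumed): 4 treated and 4 control customers.
rows = [
    {"treated": True, "converted": 1},
    {"treated": True, "converted": 1},
    {"treated": True, "converted": 1},
    {"treated": True, "converted": 0},
    {"treated": False, "converted": 1},
    {"treated": False, "converted": 0},
    {"treated": False, "converted": 0},
    {"treated": False, "converted": 0},
]
uplift = average_uplift(rows)
print(uplift)  # 0.5
```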

Supervisor: André Konersmann

Literature:

Michel, R., Schnakenburg, I., and Martens, T.v. Targeting Uplift: An Introduction to Net Scores. 1st ed., Springer International Publishing, 2019, doi.org/10.1007/978-3-030-22625-1.

Radcliffe, N. Hillstrom’s Mine That Data Email Analytics Challenge: An Approach Using Uplift Modelling. Stochastic Solutions Limited White Paper, 2008. stochasticsolutions.com/pdf/HillstromChallenge.pdf

Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16. 2002. arxiv.org/abs/1106.1813

Chereddy, N.V., Bolla, B.K. Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings. In: Morusupalli, R., Dandibhotla, T.S., Atluri, V.V., Windridge, D., Lingras, P., Komati, V.R. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2023. Lecture Notes in Computer Science, vol 14078. Springer, Cham. 2023. https://doi.org/10.1007/978-3-031-36402-

Fajardo, V.A., Findlay, D., Houmanfar, R., Jaiswal, C., Liang, J., and Xie, H. VOS: a method for variational oversampling of imbalanced data. arXiv preprint 2018. arxiv.org/abs/1809.02596

Master's Theses

Implementation of the SMOTE algorithm

A specific problem occurs quite often in classification: there are several classes, but one or more of them are significantly underrepresented in the dataset, meaning the dataset is imbalanced. In this case, it is much more difficult to obtain reliable results, and the imbalance can strongly degrade prediction quality. One approach to this problem is the SMOTE (Synthetic Minority Oversampling Technique) algorithm. The goal of this thesis is to implement and evaluate SMOTE and several of its extensions.
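The core step of SMOTE, interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration in plain Python: the base sample is drawn at random instead of iterating over all minority samples as in the original paper, and the toy points, the function name `smote_sketch` and the parameter values are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k, rng):
    """Create n_new synthetic minority points by linear interpolation."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position on the segment between base and neighbor
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbor))
        )
    return synthetic


# Toy minority class (assumed) with two features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
synthetic = smote_sketch(minority, n_new=6, k=2, rng=random.Random(42))
for point in synthetic:
    print(point)
```

Each synthetic point lies on the line segment between two existing minority points, so no new region of the feature space is invented. Extensions such as DBSMOTE change how base points and neighbors are selected.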

Supervisor: Veronika Wachslander

Literature:

Introductory literature on SMOTE: https://arxiv.org/pdf/1106.1813.pdf

Link to DBSMOTE: https://link.springer.com/article/10.1007/s10489-011-0287-y

Hybrid Recommender Systems

Recommender systems are fundamental for e-commerce and information access since they help to overcome information overload and encourage users to select items that best suit their needs and preferences. To generate recommendations, recommender systems use various techniques ranging from simple frequency-based procedures to collaborative filters that make use of statistical twins in the database. Since each technique uses different information to generate recommendations, combining the methods seems reasonable. The goal of this thesis is to implement different recommender techniques, apply them to a dataset and learn appropriate weightings based on the results of the respective model in order to combine them.
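One simple way to derive such weightings, in the spirit of the forecast combination literature (Bates and Granger, 1969), is to weight each model inversely to its error on held-out ratings. The following Python sketch assumes this inverse-error scheme; the two prediction lists `freq_preds` and `cf_preds` and the ratings are invented toy values, and other weighting schemes are equally possible.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted ratings."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def inverse_error_weights(actual, model_predictions):
    """One weight per model, proportional to 1 / MSE, summing to 1.

    Assumes every model has a nonzero error on the validation ratings.
    """
    inverse = [1.0 / mean_squared_error(actual, preds)
               for preds in model_predictions]
    total = sum(inverse)
    return [value / total for value in inverse]


def combine(scores, weights):
    """Weighted average of the individual model scores for one item."""
    return sum(w * s for w, s in zip(weights, scores))


# Toy validation ratings and predictions (assumed).
actual = [4, 3, 5, 2]
cf_preds = [4, 3, 4, 2]     # hypothetical collaborative filter, MSE 0.25
freq_preds = [3, 4, 4, 3]   # hypothetical frequency-based model, MSE 1.0
weights = inverse_error_weights(actual, [cf_preds, freq_preds])
hybrid_score = combine([4.0, 3.0], weights)
print(weights, hybrid_score)
```

The more accurate model receives the larger weight, so the hybrid score lies closer to its prediction.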

Supervisor: Veronika Wachslander

Literature:

Bates, J. M., and Granger, C. W. (1969). The Combination of Forecasts. Journal of the Operational Research Society, 20(4), 451–468.

Aggarwal, C. C. (2016). Recommender Systems: The Textbook. Springer Publishing Company, Incorporated, 1st Edition.

Evaluation of Interpretable Machine Learning methods

In many cases, complex machine learning models such as boosted trees, deep neural networks or ensemble methods achieve better results than simpler models. However, these procedures have a decisive disadvantage: they are so complex that people can no longer understand how they arrive at their predictions. They are black boxes. To address this problem, a growing body of AI research is concerned with how to make complex models interpretable and understandable ex post. The goal of this project is the implementation and comparison of (at least) three state-of-the-art interpretation methods with regard to a self-created, complex machine learning model: Shapley Values, LIME and Counterfactual Explanations. You will use an openly available dataset of your choice.
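To illustrate one of the three methods, the following Python sketch computes exact Shapley values for a single prediction by averaging each feature's marginal contribution over all feature orderings. It assumes that a "missing" feature is represented by a baseline value, which is only one of several possible conventions, and the toy model `toy_model` is invented for illustration. The exact computation is only feasible for a handful of features; practical tools rely on approximations.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for feature in order:
            # Switch this feature from its baseline to its actual value
            current[feature] = instance[feature]
            value = model(current)
            totals[feature] += value - previous  # marginal contribution
            previous = value
    return [t / len(orderings) for t in totals]


def toy_model(x):
    """Assumed black box: one additive term and one interaction term."""
    return 2 * x[0] + x[1] * x[2]


instance = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)  # [2.0, 3.0, 3.0]
```

The contributions sum to the difference between the prediction for the instance (8) and for the baseline (0), and the interaction effect of 6 is split evenly between the two features involved.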

Supervisor: André Konersmann

Literature:

Molnar, C. (2020). Interpretable machine learning. Lulu.com.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144).

Wachter, S., Mittelstadt, B., & Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31, 841.

Štrumbelj, E., & Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and information systems, 41, 647-665.

IoT-based watering and plant control system

The goal is to further develop a watering and plant control system based on a Python- and web-based dashboard. The main tasks are the monitoring of different watering strategies, the integration of optical sensors and the integration of a tank-weight-sensitive watering system. Additionally, different energy-aware watering tactics should be developed and investigated.

Supervisor: Veronika Wachslander