The CodeKoan Search Engine
A Source Code Pattern Search Engine
CodeKoan is a specialized search engine that accepts large source code documents as queries. Within such a document, CodeKoan recognizes approximate reuses of short code fragments drawn from a large index. The algorithm is mostly programming-language independent, with only a few small language-specific parts.
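To make the query model concrete, here is a minimal Python sketch of searching one large query document for approximate occurrences of short indexed fragments. It is not CodeKoan's actual algorithm: the crude regex lexer, the sliding-window comparison with `difflib.SequenceMatcher`, the 0.8 threshold, and the names `tokens`, `find_reuses`, and `index` are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

def tokens(code):
    # Crude lexer (assumption): words and numbers as single tokens,
    # every other non-space character as its own token.
    return re.findall(r"\w+|[^\w\s]", code)

def find_reuses(query, index, threshold=0.8):
    """Return (fragment_id, start_token, similarity) for every indexed
    fragment that approximately occurs somewhere in the query document."""
    q = tokens(query)
    hits = []
    for frag_id, fragment in index.items():
        f = tokens(fragment)
        best_score, best_start = 0.0, -1
        # Slide a window of the fragment's length over the query tokens.
        for start in range(max(1, len(q) - len(f) + 1)):
            window = q[start:start + len(f)]
            score = SequenceMatcher(None, f, window, autojunk=False).ratio()
            if score > best_score:
                best_score, best_start = score, start
        if best_score >= threshold:
            hits.append((frag_id, best_start, round(best_score, 2)))
    return hits

# Hypothetical index of short fragments and a larger query document.
index = {
    "swap": "tmp = a; a = b; b = tmp;",
    "clamp": "if (x < lo) x = lo; if (x > hi) x = hi;",
}
query = """
int main() {
    int a = 1, b = 2, tmp;
    tmp = a; a = b; b = tmp;
    return a;
}
"""
hits = find_reuses(query, index)
print(hits)
```

This brute-force scan compares every fragment against every window, which would not scale to a large index; a real engine needs an index structure that narrows the candidates first.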
Smart Source Code Search
Many publicly available code search tools rely on some form of string-based search. CodeKoan instead uses a more complex token-based algorithm that matches source code by its “micro-structure”. The advantage of this approach is that it can recognize short code patterns across a variety of application domains.
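The difference between string-based and token-based matching can be illustrated with a small Python sketch. It assumes one simple reading of “micro-structure”: identifiers and numeric literals are replaced by placeholder tokens, so only keywords, operators, and punctuation must agree. The keyword set, the `normalize` and `contains` helpers, and the sample snippets are hypothetical and do not reproduce CodeKoan's tokenizer.

```python
import re

# Assumed keyword set for a C-like language; everything else that looks
# like a name is treated as an identifier.
KEYWORDS = {"for", "if", "else", "while", "return", "int"}

def normalize(code):
    """Lex code and replace identifiers and numbers with placeholders."""
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", code):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)  # operators and punctuation stay as they are
    return out

def contains(haystack, needle):
    """True if the token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))

pattern = "for (int i = 0; i < n; i++) total += xs[i];"
document = """
int sum(int *values, int count) {
    int acc = 0;
    for (int k = 0;k < count;k++)
        acc += values[k];
    return acc;
}
"""

string_match = pattern in document
token_match = contains(normalize(document), normalize(pattern))
print(string_match, token_match)  # False True
```

The plain substring test fails because the variable names and spacing differ, while the normalized token sequences coincide. Mapping every identifier to the same placeholder also discards which variable is used where, so a practical matcher would likely need a further check on top of this.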
mnist-idx is a Haskell library for reading and writing the IDX format, which stores vectors or matrices for use in machine learning algorithms. The most widely known data set in this format is the MNIST database of handwritten digits.
Schramm, C., Wang, Y., & Bry, F. (2018, May). CodeKoan: A Source Code Pattern Search Engine Extracting Crowd Knowledge. In 2018 IEEE/ACM 5th International Workshop on Crowd Sourcing in Software Engineering (CSI-SE) (pp. 1-8). IEEE.
Schramm, C. (2017). Recognition of Code Patterns From Stackoverflow Answers in Computer Programs. Master's thesis, LMU Munich.