About word clouds and search algorithms

We mean it when we say that our customers actively help shape the BSI software roadmap. The classification of e-mails was a frequently discussed topic at the Retail User Group Meeting 2018. Today, e-mails are often routed to the responsible teams on the basis of rules. The agent still has to make many decisions, however: Which process applies? Which text fits? There is an easier way. BSI has taken on the challenge in close cooperation with Walbusch.

The algorithm sorts words and e-mails in the same context into word clouds and searches for the boundaries between them.

The vision

Using machine learning for error-free allocation of e-mails, automatically composing an appropriate answer with text blocks, and then sending it to the customer once it has been reviewed – thereby freeing up time for what is truly important: authentic customer interactions.

The method

Walbusch made around 700,000 anonymized customer e-mails available to us. These had been allocated to 20 teams, partly by rules and partly manually. We used this allocation as input for generating paragraph vectors (“Doc2Vec”). To make the e-mail texts machine-processable, a vector was generated for each word, and words that appear in the same context were sorted into word clouds. At the same time, we placed the e-mails themselves in the same word clouds, so that e-mails with similar content ended up close to one another. For example, e-mails about address changes lie close together and ideally overlap as little as possible with the positions of e-mails about order changes. The algorithm then looks for the boundaries between these categories in order to allocate e-mails correctly. One rule applies here: the more dimensions are used for the word clouds, the better the categories can be separated.
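The idea of allocating an e-mail by its position relative to the categories can be illustrated with a minimal sketch. It is not the Walbusch implementation: simple word-count vectors stand in for trained Doc2Vec paragraph vectors, the category names and sample texts are invented, and a nearest-centroid rule with cosine similarity replaces the learned boundaries.

```python
import math
from collections import Counter

# Hypothetical toy data; the real project used about 700,000 anonymized e-mails.
TRAINING = [
    ("address_change", "please change my address to the new street"),
    ("address_change", "i moved so my address is new"),
    ("order_change", "please change my order to size large"),
    ("order_change", "i want to cancel one item of my order"),
]

def embed(text):
    """Stand-in for a paragraph vector: a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def centroids(samples):
    """Sum the vectors of each category; cosine similarity ignores the scale."""
    result = {}
    for label, text in samples:
        result.setdefault(label, Counter()).update(embed(text))
    return result

def allocate(text, cents):
    """Return the category whose centroid is most similar to the e-mail."""
    vec = embed(text)
    return max(cents, key=lambda label: cosine(vec, cents[label]))

CENTROIDS = centroids(TRAINING)
print(allocate("my address is new", CENTROIDS))
print(allocate("cancel my order", CENTROIDS))
```

With trained paragraph vectors, only the embed step would change; the allocation by proximity stays the same in principle.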

“We trained the algorithm with 700,000 e-mails and 100 runs over the course of 2 days.”

Christoph Bräunlich, BSI

We worked with 150 dimensions while training the search algorithm (difficult for us humans to follow, as we are only familiar with 3-D…) and sorted the words and e-mails among the word clouds in more than 100 passes. To allocate an e-mail to a team, the algorithm determines its position in the word clouds. If the e-mail lies within the boundaries of a category, it counts as a hit. In this way, we managed to allocate e-mails to the right team in around half of the cases. The reason for the 50/50 result: the issues addressed in e-mails often involve multiple teams, so a clear allocation is not always possible. In the next step, we will therefore train process allocations rather than team allocations, because these are more meaningful at Walbusch.
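How a hit rate like the “around half” mentioned above is calculated can be shown in a few lines. This is a minimal sketch with invented team labels, assuming that each e-mail has exactly one reference team; it does not reproduce the actual evaluation.

```python
def hit_rate(predicted, actual):
    """Share of e-mails whose predicted team matches the reference team."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical labels for four e-mails.
predicted_teams = ["returns", "billing", "returns", "delivery"]
reference_teams = ["returns", "billing", "delivery", "returns"]
print(hit_rate(predicted_teams, reference_teams))  # 0.5
```

An e-mail that concerns several teams can only ever match one reference label in such a calculation, which is one way the ambiguity described above lowers the measured rate.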


It is a work in progress. The first proof of concept formed a good basis for further training runs. Once a word vector network is trained, it can be used in a variety of ways. The knowledge gained will also flow into the development of neural networks for BSI Studio. To be continued. That’s a promise.