James distributed event system
ㅁ James 이중화 시 발견되는 버그
- 메일 수신 시 EXIST, Expunge Response 가 되지 않음.
ㅁ 문제
- 이중화 환경에서 메일 수신 / 삭제 등의 이벤트가 발생하는 경우
이벤트가 발생한 서버외의 분산 서버들은 이벤트를 클라이언트에게 전달하지 못한다.
ㅁ RFC 2177-IMAP4 IDLE command
1. James #1 에 메일 수신 시 James #2 는 알 수 없음. SMTP -> LocalDelivery -> IMAP APPEND 2. James Event system. 3. To-be |
ㅁ Hazelcast manager log.
1) James #1 Started , James #2 Start up.
2) James #2 started.
3) James #2 stop. |
ㅁ Hazelcast configuration.
1. hazelcast
https://github.com/normanmaurer/mailbox-distributed
This project contains helper classes which make it easy to support clustering for
Apache James Mailbox (http://james.apache.org/mailbox) implementations.
This is done by using hazelcast for all the distribution operations.
By using this classes it should be really straight forward to build an Apache James
Server cluster. (http://james.apache.org/server)
2. James Distributed event system
https://research.linagora.com/blog/?p=226
Most of todays applications deals with events. It can be notifications on any social networks, chat application, or any other things that happen in one place but people in other places must also know it. I faced such a problem during my work on Apache James.
What is IMAP IDLE
IMAP IDLE is an extension to the IMAP protocol. It allows you to register yourself to one of your folder ( Mailbox ) and get notified of every things happening into it. Most of the time, you register your INBOX folder. And if you want to register to multiples folder, you need to have one commection per registration. Ok, this is technical details…
Any way… The problem is the following :
-
Client A register his Mailbox on server 1.
-
E-mail arrives in his Mailbox on server 2.
How server 1 ( on which client 1 is connected ) can get the information of this incoming mail ?
Well I will of course answer this question. But first we will see how event are managed in James.
How does the event system was working in James ?
Well, first you have an interface ( MailboxListener ) for every thing that will do some work with events. This is just a method with the event passed as a parameter.
Then you have the event producer. Each time you get something interesting happening on your mailbox, the MailboxEventDispatcher generates an event…
… and pass it to the MailboxDelegatingListener whose role is to send the event to the registered Mailbox Listeners.
Ok… Not that hard. But …
You can register in two ways with the MailboxDelegatingListener :
-
You can do it globally. It means the listener will be triggered for each and every event. This is the behaviour you want when you use these events o index mails, send data to other applications, and so on.
-
You can do it for a specified mailbox. This is what happens with IMAP IDLE as you may already guessed.
Ok, I think we won a little diagram :
Well, this works with one James server but what happens with multiple James server ?
Of course, it won’t work out of the box as we need to pass events from one server to an other.
We must of course take care to trigger global listener only once ( we assume global listener deployed are the same on each James servers… ).
Event system using messaging …
Now interresting things begins.
First of all, we can avoid the global listener problem by deciding to trigger them only on the server that generated the event.
Well now we need to send events from a server to the others. We will of course need to serialize our events. I am using Message Pack to do this. I uses Jackson annotations on proxied objects. Jackson is just awesome, and proxy objects aims at isolating it from the rest of the code.
Now we need to send these events. What better than a message queue for this task ? You might not know it but I love Kafka lightweight design so of course I will choose Kafka.
Ok, great ! But we also need to listen to the message queue, no ? And deserialize events ? That is what the Kafka consummer aims at…
Well, but on the server generating the event, you might handle the message two times ?
- Well spot… That’s why clients should be notified only for messages pulled out of the queue.
I think it is time for an other diagram :
Note that in this implementation proposal, I am using only one queue. This means that each and every James server will receive, and deserialze each and every events. Generating quitte a big amount of load. And limiting scalability. That’s why I did not stop my work their and try to find a smarter solution.
… and registration !
Well. So we need to get things finner…
Why not sending the event only to the James server that need it ?
But how to I find the James server that need it ? My james server have no idea for this… Wait… But my James that want the event knows it. Maybe it can register itself somewhere. And each and every james server, when having an event will chek using the registration service to know wich James server wants it, and so to which queue we need to send this event.
Wonderful, and we have a tool to do this, ZooKeeper. It can handle client disconnection ( meaning all registration made by a killed client can be automatically removed ).
ZooKeeper behaves as a tree. Data are grouped on ZNodes, to wich you can add childs holding the information.
Here is how registration process works :
- We have an event on mailbox #private:bidon@toto.fr:INBOX
- We ensure that the ZNode corresponding to this mailbox path exists. This must be a persistant addition as other servvers might register to this ZNode.
- We then add a child holding the name of our dedicated message queue. The name is randomly generated using UUID ( chance of collision is minimal, and even if it arrives, we will just have two James server having events duplicated. This is definitely not the end of the world, as events get filtered per mailbox before being notified to listeners ). Or configuration specified.
So when server 2 have to handle an event on #private:bidon@toto.fr:INBOX
- It reads the ZNode corresonding to #private:bidon@toto.fr:INBOX and get the list of its children.
- It sends the serialized event to each and every queue notified above.
Isn’t that nice ?
Diagram ?
Well what are the pros and cons of such a system ?
- It is easily scalable : Kafka can handle up to 1500 topics ( thanks Twitter ) meaning you can have up to 1500 James servvers and it will still works. On higher number of James servers, sharding queues can be a solution ( we would accept to deserialize events for others servers to limit the number of queue ).
- Performance will be good on ZooKeeper. The ratio of read/write will be high ( events are more frequents that registrations ), and we can hope handle up to 50.000 event per second on a single ZooKeeper quorum. If we are limited by ZooKeeper performances we just have to shard it using the Mailbox Path as a key.
-
Event serializing take ~1ms and is handled in a threaded environment so I guess it is OK.
-
We add some overhead on :
- registration to the dispatcher ( we need to register in ZooKeeper )
- Event management ( read ZooKeeper to know who to send it to )
Final notes
The tools and technics we saw can be used with many other situation ( that is why I make this post ). Every time you have events that must be sent to some particular places, this works. And it was also a practical example of how to use ZooKeeper.
Between the two solution I proposed, note that solution 1 ( using a single queue ) can be a restriction of solution 2 ( using a stupid MailboxPathRegister that gives only the same message queue as a target ). Solution 1 is better for small James clusters, and will limit your infrastructure ( you can avoid having a ZooKeeper quorum… Of course if you are not using Kafka… ). The overhead on event handling is also smaller as you will not need to check other middleware. But you may loose some time serializing and deserializing events when it is not needed. By the way, as soon as the number event is big, solution 2 will be needed.
Finnaly, I want to answer one question :
- Why didn’t you put an IMAP load balancer on top of it so that each and every operation concerning a mailbox will ever arrive on the same server ?
I don’t like it as :
- It adds a layer ( the same way we did it ) but this layeris a new one ( Kafka and ZooKeeper may be used in other part of James ).
- You have to hadle SMTP by the way …
- You will have to manage IMAP rights. If I have the right I can listen someone else Mailbox.
- How do you manage SMTP ? And do you take into account possible mailbox renaming ( eg : btellier@bidon.com => benoit.tellier@bidon.com )
- Shared mailboxes might be tricy.