Kafka: aggregate single log event lines into a combined log event
I'm using Kafka to process log events. I have basic knowledge of Kafka Connect and Kafka Streams (simple connectors and simple stream transformations).
Now I have a log file with the following structure:
timestamp event_id event
A log event consists of multiple log lines that are connected by an event_id (for example, a mail log).
Example:
1234 1 start
1235 1 info1
1236 1 info2
1237 1 end
And in general there are multiple interleaved events.
Example:
1234 1 start
1234 2 start
1235 1 info1
1236 1 info2
1236 2 info3
1237 1 end
1237 2 end
The time window (between start and end) is at most 5 minutes.
As a result I want a topic like:
event_id combined_log
Example:
1 start,info1,info2,end
2 start,info3,end
What are the right tools to achieve this? I tried to solve it with Kafka Streams but can't figure out how.
In your use case you are essentially reconstructing sessions or transactions based on message payloads. At the moment there is no built-in, ready-to-use support for such functionality. However, you can use the Processor API part of Kafka's Streams API to implement this functionality yourself: you can write custom processors that use a state store to track when, for a given key, a session/transaction was started, added to, and ended.
Some users on the mailing lists have been doing that, IIRC, though I'm not aware of an existing code example I could point you to.
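To make the idea concrete, here is a minimal sketch of the per-key state logic such a custom processor would keep (class and method names are mine, not a Kafka API). It uses a plain in-memory map where a real processor would use a KeyValueStore, and it assumes records arrive in order: a session opens on "start", collects payloads, and emits the combined log on "end".

```java
import java.util.*;

public class SessionAggregator {
    // event_id -> payloads collected so far
    // (in a real Processor this would be a fault-tolerant KeyValueStore)
    private final Map<String, List<String>> store = new HashMap<>();

    /** Returns the combined log when the session ends, otherwise null. */
    public String process(String eventId, String payload) {
        List<String> session = store.computeIfAbsent(eventId, k -> new ArrayList<>());
        session.add(payload);
        if (payload.equals("end")) {
            store.remove(eventId);            // session complete: clear state
            return String.join(",", session); // emit "start,info1,...,end"
        }
        return null; // session still open
    }

    public static void main(String[] args) {
        SessionAggregator agg = new SessionAggregator();
        agg.process("1", "start");
        agg.process("2", "start");
        agg.process("1", "info1");
        agg.process("1", "info2");
        agg.process("2", "info3");
        System.out.println("1 -> " + agg.process("1", "end")); // 1 -> start,info1,info2,end
        System.out.println("2 -> " + agg.process("2", "end")); // 2 -> start,info3,end
    }
}
```

In a real topology you would wire this logic into a processor connected to a state store, and forward the combined value to the output topic instead of returning it.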
What you need to watch out for is correctly handling out-of-order data. In your example above you listed the input data in proper order:
1234 1 start
1234 2 start
1235 1 info1
1236 1 info2
1236 2 info3
1237 1 end
1237 2 end
In practice, though, messages/records may arrive out of order, like this (I only show messages for key 1 to simplify the example):
1234 1 start
1237 1 end
1236 1 info2
1235 1 info1
Even if that happens, I understand that in your use case you still want to interpret the data as start -> info1 -> info2 -> end, rather than start -> end (ignoring/dropping info1 and info2 = data loss) or start -> end -> info2 -> info1 (incorrect order, probably also violating your semantic constraints).
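One way to cope with that (again a sketch with invented names, not a Kafka API): since "end" may arrive before the info lines, don't emit on "end" at all. Instead, buffer every (timestamp, payload) pair in the state store and flush an event only after its 5-minute window has elapsed (a punctuator would trigger that in a real processor), sorting by timestamp so late arrivals land in the right order.

```java
import java.util.*;

public class OutOfOrderAggregator {
    record Line(long ts, String payload) {}

    // event_id -> buffered lines (a state store in a real Processor)
    private final Map<String, List<Line>> store = new HashMap<>();

    /** Buffer every record; never emit here, since "end" may arrive early. */
    public void process(String eventId, long ts, String payload) {
        store.computeIfAbsent(eventId, k -> new ArrayList<>()).add(new Line(ts, payload));
    }

    /**
     * Called once the 5-minute window for this event_id has elapsed
     * (via a punctuator in a real Processor). Sorts by timestamp so
     * out-of-order arrivals are emitted in their original order.
     */
    public String flush(String eventId) {
        List<Line> lines = store.remove(eventId);
        if (lines == null) return null;
        lines.sort(Comparator.comparingLong(Line::ts));
        StringJoiner sj = new StringJoiner(",");
        for (Line l : lines) sj.add(l.payload());
        return sj.toString();
    }

    public static void main(String[] args) {
        OutOfOrderAggregator agg = new OutOfOrderAggregator();
        // the out-of-order arrival sequence from the example above
        agg.process("1", 1234, "start");
        agg.process("1", 1237, "end");
        agg.process("1", 1236, "info2");
        agg.process("1", 1235, "info1");
        System.out.println(agg.flush("1")); // start,info1,info2,end
    }
}
```

The trade-off is latency: results are only emitted after the window closes, which is the price of tolerating late records.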