| System | Mode | Trigger | Options | Transmission | Concentration | Issues |
| BPM | Orbit | Publish 10 Hz | bunches / no. of turns, all enabled | CMW or UDP or multicast or both | Ralph. 2 channels, spare | |
| BPM | Post Mortem | PM event | | PM API | PM | |
| BPM | Multi-turn (capture mode) | Warning plus BST / internal clock | bunches | CMW - subscribe | | blocking, schedule requests, chunking, reassembly, trajectory (on-line) versus megaturns (offline)? Requests go to a master - the concentrator, which arbitrates so there are no clashing requests. Let Lynx do it? Check. Test it. Problem could be the Ethernet driver. Chunk filter ID - reassembly by concentrator. Could optimise - filter set by client. |
| BLM | 1 second averages | Publish 1 Hz | | CMW - pub | LSA | |
| BLM | snapshot | timing event | ? | selective pull? | | blocking, chunking, reassembly, schedule requests |
| BLM | Post mortem | PM event | Going to have to parse this down | PM API | PM | |
| BLM | real time | 50-100 Hz | | reflective memory (?) | Local | private BLMs? Split data. |
Concentrated capture data from 4 crates - October 2006, c/o Sandeep (image attachment)
See also: triggered acquisitions and orbit - high level requirements
This is how I propose to deal with the concentration for the BPMLHC case:
1) BeamThreading (B1+B2): Event = HIX.AMC-CT (572), Max. delay = 1 sec
2) GetCaptureData (B1): Event = HX.BPMCAP1-CT (584), Max. delay = (89 usec * nbOfCaptureTurns + 1 sec)
3) GetCaptureData (B2): Event = HX.BPMCAP2-CT (585), Max. delay = (89 usec * nbOfCaptureTurns + 1 sec)
4) Acquisition (B1+B2): Event = HX.BPNM-CT (522), Max. delay = 0.5 sec (to be checked)
For the time being you should receive the data for all three properties on these devices:
LHC.BPM.SR8.B1RB LHC.BPM.SR8.B2RA LHC.BPM.SR8.B2RB LHC.BPM.SX4.B1RA LHC.BPM.SX4.B1RB
Lars
To get further on with things, I've created a FESA class 'BPMLHC' that sends simulated closed-orbit data at a fixed frequency over UDP to the machine that Ralph proposed just now (abcopl4).
The structure that was developed for the RT tests on the SPS has been modified to cover the up to 36 channels (18 DAB modules) and the definition can be seen attached to this email .. Maybe we could think of a good place to put it so that we can share the structure definition .. As you will see, the dynamic channel information (closed-orbit position and spread) is put in the 'values' part of the structure and the static information (channel names and plane information) is in the 'info' part ..
Presently only 'cfv-sr8-bpmb1la' is used to transmit the data but I could eventually transmit data from 8 BPM systems and 3 BLM systems in SR8 ..
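For illustration, a minimal sketch of how such a 36-channel structure could be organised. The type and field names below are my own assumptions, not the actual definition attached to Lars's email:

  // Hypothetical layout only - the real FESA structure may differ.
  #include <cstdint>
  #include <cstdio>

  struct BPMChannelInfo {          // static part ('info'): rarely changing description
      char    name[32];            // channel name (placeholder length)
      uint8_t plane;               // e.g. 0 = horizontal, 1 = vertical
  };

  struct BPMOrbitData {            // one message per BPM system per closed-orbit trigger
      uint32_t       systemId;            // which of the 8 BPM systems of the octant
      uint32_t       acquisitionStamp;    // BST trigger / turn counter
      float          position[36];        // dynamic part ('values'): closed-orbit position
      float          spread[36];          // dynamic part ('values'): spread over the N turns
      BPMChannelInfo info[36];            // static part ('info'): channel names and planes
  };

  int main() {
      std::printf("UDP message size: %zu bytes\n", sizeof(BPMOrbitData));
      return 0;
  }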
As a reminder:
The nominal situation is that we have 8 VME systems in each octant for the BPM
system:
SRX-BPMB1LA = BPMs for beam#1 in Left part #1 of IP X
SRX-BPMB1LB = BPMs for beam#1 in Left part #2 of IP X
SRX-BPMB1RA = BPMs for beam#1 in Right part #1 of IP X
SRX-BPMB1RB = BPMs for beam#1 in Right part #2 of IP X
and
SRX-BPMB2LA = BPMs for beam#2 in Left part #1 of IP X
SRX-BPMB2LB = BPMs for beam#2 in Left part #2 of IP X
SRX-BPMB2RA = BPMs for beam#2 in Right part #1 of IP X
SRX-BPMB2RB = BPMs for beam#2 in Right part #2 of IP X
(I won't mention here that this situation will most likely change due to radiation worries in points 3 and 7.)
For the time being we have these 8 systems installed in SR8 but more will come
during 2006 ...
Another reminder:
We will transmit the closed-orbit triggers from the LHC BST (in principle one for each beam) with a fixed frequency (nominally 10 Hz) and when the acquisition has finished after integrating over N turns (nominally 20 msec) we calibrate the beam positions and send the results for that system (= a total of 36 measurement planes) to wherever over either UDP or CMW, and that's where AB/BI responsibility stops .. The real-time feedback, closed-orbit application, fixed-displays etc. could all take the latest data (possibly over two distinct channels) fed from the same concentrator, where CMW etc. would have to be incorporated ..
A similar approach could be taken for the multi-turn and the post-mortem GBytes ..
Independent of the actual technology choice, the retrieval and concentration of BLM and BPM type data are very similar and it would be favourable to have a common solution that is scalable to LHC dimensions.
Both systems, BLM and BPM, can concurrently acquire 'periodic' and 'on-demand' (snapshot) data of the BPM and BLM properties, both of which should be concentrated prior to sending/publishing them to the users (GUIs, data logging etc.).
Since the user will likely request an acquisition of all BLMs/BPMs, the latter type of acquisition may block the periodic acquisition for several seconds (and vice versa) if the requested snapshot of the history buffers of one crate is processed in a traditional, e.g. 'FIFO', manner.
Blocking the periodic acquisition is undesirable since it may cause a pause of feedback operation during potentially critical machine phases and complicates receiving and logging of the data (larger receive buffers, complications due to higher packet bursts -> packet/data loss etc.).
The blocking is caused by the transfer of the rather large BLM/BPM history buffer over the limited bandwidth of the frontend's VME bus and Ethernet card, and further by potential 'hammering' of the BI systems by multiple users (both for periodic and on-demand data) that may slow down or even crash the individual systems.
In order to guarantee reliable functioning of both the periodic (logging) and the on-demand acquisition, the data retrievals have to be scheduled with respect to each other and the frontends have to be screened by the concentrators. This of course implies that the concentrators have to be aware of this scheduling and further implement certain BI access functionality that would traditionally (LEP/SPS) have been implemented in the BI frontends.
Some details on the requirements:
The on-demand acquisition consists, in the case of the BLMs, of the 40 us beam loss history buffer and, in the case of the BPMs, of the 100k turn/bunch acquisition. Apart from the physical context and meaning, both buffers are very similar from the controls point of view. The readout of these buffers can be requested through the 'post-mortem' trigger (intrinsically causing a beam abort) as well as on demand during a physics run for beam diagnostic/study purposes, without the necessity of dumping the beam. The crux of the matter: these on-demand acquisitions will most likely be requested when the availability of the periodic data is most critical for feedback and logging (start of ramp, squeeze etc.).
A short summary of frontend characteristics that are relevant for controls:
The typical BLM and BPM BI frontend consists of about 16-18 data acquisition boards (for simplicity referred to as DABs; BPM numbers are given in brackets):
- up to 16 (18) DABs per crate, LHC: 25 (64++) crates total
- periodic acquisition: ~33.2 kByte/sample/crate (~1.2 kByte/sample/crate)
- sampling frequency (logging):
  * BLM: ~1 Hz (logging), 50-100 Hz (collimation: Stefano/RA)
  * BPM: ~10 Hz (nominal) -> 25 Hz (as an option), ultimate: 50 Hz
- acquisition of history buffer:
  * BLM: < 6 MB per DAB, < 96 MB/crate, ~2.5 GB for LHC
  * BPM: ~0.95 MB per DAB, ~17 MB/crate, ~1.1 GB for LHC
- frontend bottlenecks:
  * VMEbus (even with VME64x, the DABs are 32-bit cards!): 30-35 MB/s, which can effectively shrink to 5-10 MB/s (source: VITA [1])
  * Ethernet card: 100 Mbit/s, effectively 5-6 MB/s (burst)
[1] John Rynearson (technical director, VITA), "VMEbus Speed/Bandwidth", VITA, 1999
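As a quick cross-check of the numbers above, a throwaway sketch that recomputes the per-crate and LHC-wide volumes from the per-DAB sizes in the list; the aggregate periodic rate at the end is my own derived estimate, not a number from the specifications:

  #include <cstdio>

  int main() {
      // On-demand history buffers, per-DAB sizes and crate counts taken from the list above
      const double blmCrateMB = 16 * 6.0;     // MB per BLM crate (16 DABs x ~6 MB)
      const double bpmCrateMB = 18 * 0.95;    // MB per BPM crate (18 DABs x ~0.95 MB)
      std::printf("BLM: %.0f MB/crate, %.1f GB for 25 crates\n", blmCrateMB, blmCrateMB * 25 / 1024);
      std::printf("BPM: %.1f MB/crate, %.1f GB for 64 crates\n", bpmCrateMB, bpmCrateMB * 64 / 1024);

      // Aggregate periodic load if all crates publish at the nominal frequency
      std::printf("periodic: BLM ~%.0f kB/s, BPM ~%.0f kB/s (LHC wide)\n",
                  33.2 * 1.0 * 25, 1.2 * 10.0 * 64);
      return 0;
  }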
Some more detailed comments:
The data transfer of the history buffer through the VMEbus/Ethernet is clearly the bottleneck. Experience with the LHC BPM test system in the SPS showed that the transfer of about 800 kByte through the VMEbus, minimal processing and the Ethernet card took about 1.2 seconds, corresponding to an effective VMEbus speed of about 800 kByte/s. Hypothetically, the readout of one BPM crate would thus require about 21 seconds (BLM: ~120 seconds). This number is rather frightening but is hopefully not final. I think/hope that the 5-10 MB/s quoted by VITA will apply for the final server implementations... In any case, the real numbers can easily be verified. (At least for the BPM we will do some tests this year with a fully equipped BPM front-end.)
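The 21 s / 120 s estimates can be reproduced directly from the buffer sizes and the effective transfer rate. A small sketch, using the ~0.8 MB/s effective rate seen in the SPS test and the 5-10 MB/s VITA range as the hoped-for case:

  #include <cstdio>

  int main() {
      const double bpmCrateMB = 17.0, blmCrateMB = 96.0;    // history buffer per crate (see above)
      const double ratesMBps[] = {0.8, 5.0, 10.0};          // measured (SPS test) and quoted (VITA) rates
      for (double r : ratesMBps)
          std::printf("at %4.1f MB/s: BPM crate ~%5.1f s, BLM crate ~%5.1f s\n",
                      r, bpmCrateMB / r, blmCrateMB / r);
      return 0;
  }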
In case of a FIFO readout (e.g. using DMA), the on-demand data will use the full capacity of the VMEbus, Ethernet and potentially the CPU, and will in any case block several samples of the periodic BLM and BPM acquisition until the on-demand data is transmitted.
A relatively easy solution, which seems to be OK with Lars and Stephen (BI), would be:
To divide the readout and transmission of the on-demand buffer (40 us BLM / 100k turn BPM data) and to send it as chunks interleaved with the periodic data, at the periodic data acquisition frequency. There are several implementation possibilities - left 'ad lib' to the frontend experts.
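One purely illustrative frontend-side variant, in which every periodic trigger first sends the fresh sample and then a bounded number of chunks of the pending on-demand buffer. The names, the 1400 byte chunk payload and the 'chunks per cycle' knob are assumptions, not the agreed implementation:

  #include <cstddef>
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  struct Chunk { uint32_t requestId, seq, total; std::vector<uint8_t> payload; };

  void sendPeriodicSample() { std::puts("periodic sample sent"); }   // stub for the real UDP/CMW send
  void sendChunk(const Chunk& c) {                                   // stub for the real UDP/CMW send
      std::printf("chunk %u/%u of request %u sent\n", c.seq + 1, c.total, c.requestId);
  }

  int main() {
      const std::size_t kChunkPayload = 1400;          // below the Ethernet frame size (no fragmentation)
      const int kChunksPerCycle = 4;                   // tuning knob: bandwidth granted to on-demand data
      std::vector<uint8_t> onDemandBuffer(17 * 1024);  // tiny stand-in for the ~17 MB history buffer

      // Cut the buffer into chunks once an acquisition request has been accepted.
      const uint32_t total = (onDemandBuffer.size() + kChunkPayload - 1) / kChunkPayload;
      std::vector<Chunk> pending;
      for (uint32_t i = 0; i < total; ++i)
          pending.push_back({1u, i, total, {}});       // payload copy omitted in this sketch

      std::size_t next = 0;
      while (next < pending.size()) {                  // one loop iteration = one periodic trigger
          sendPeriodicSample();                        // the periodic data always goes out first
          for (int k = 0; k < kChunksPerCycle && next < pending.size(); ++k)
              sendChunk(pending[next++]);
      }
      return 0;
  }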
In order to guarantee this interleaving, the concentrators have to be aware of the chunk data format and be able to re-assemble it. In order to guarantee that no other 100k turn acquisition (BPM) or readout of the 40 us data is requested while the data is transmitted to the concentrator, there has to be a mediator. This mediator, ideally the concentrator itself, would control the acquisition requests for the BI frontends and block potential further user requests. As a consequence the BPM/BLM concentrator would have to implement certain acquisition functions that traditionally would be implemented in the BI frontend itself.
As an example: the BPM concentrator would receive the acquisition request and its configuration (number of bunches, bunch pattern, turns), transmit it to the BPM frontends, trigger the BST for an LHC-wide acquisition, retrieve the interleaved 'on demand' data, publish it for example via CMW, and block all further acquisition requests until the previous one is finished. The interface functionality is well documented and can be found in the BLM and BPM specifications.
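A rough sketch of what such a mediator could look like on the concentrator side. Class and method names are invented for illustration; the real interface is the one defined in the BLM and BPM specifications:

  #include <cstdint>
  #include <cstdio>
  #include <map>
  #include <utility>
  #include <vector>

  // Accepts at most one on-demand acquisition at a time and re-assembles the
  // interleaved chunks coming back from the frontends.
  class AcquisitionMediator {
  public:
      bool request(uint32_t requestId, uint32_t expectedChunks) {
          if (busy_) return false;              // screen the frontends: reject clashing requests
          busy_ = true; id_ = requestId; expected_ = expectedChunks; chunks_.clear();
          // (in the real system: forward the configuration to the frontends and arm the BST trigger)
          return true;
      }
      void onChunk(uint32_t requestId, uint32_t seq, std::vector<uint8_t> payload) {
          if (!busy_ || requestId != id_) return;    // stale or foreign chunk: ignore
          chunks_[seq] = std::move(payload);
          if (chunks_.size() == expected_) {         // complete: re-assemble and publish
              std::vector<uint8_t> buffer;
              for (auto& kv : chunks_)
                  buffer.insert(buffer.end(), kv.second.begin(), kv.second.end());
              std::printf("request %u complete, %zu bytes re-assembled (publish e.g. via CMW)\n",
                          id_, buffer.size());
              busy_ = false;                         // only now accept the next request
          }
      }
  private:
      bool busy_ = false;
      uint32_t id_ = 0, expected_ = 0;
      std::map<uint32_t, std::vector<uint8_t>> chunks_;   // ordered by sequence number
  };

  int main() {
      AcquisitionMediator m;
      m.request(1, 3);
      std::printf("clashing request accepted while busy: %s\n", m.request(2, 3) ? "yes" : "no");
      m.onChunk(1, 0, {1, 2}); m.onChunk(1, 2, {5, 6}); m.onChunk(1, 1, {3, 4});
      return 0;
  }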
The concentration of the periodic and on-demand data could of course be run on the same machine. However, for reliability reasons it is advisable to distribute and mirror the functionality to several servers. In the end those concentrators will have to swallow (receive, process and publish) about 1-2.5 GB for one acquisition. In case the one and only concentrator fails, we will be blind with respect to diagnostics! Handling of this amount of data is highly non-trivial and requires IMHO great care!
Warning: technical details following:
************************************************************************
* Another more technical issue is related to the Ethernet implementation of LynxOS, since it cannot intrinsically guarantee real-time transmission and controlled timing of when packets that are put into the TCP/IP stack actually leave the send buffer of the Ethernet card. In other words: the on-demand data may compete with the sending of the periodic data and with other processes (namely NFS - one of the killers!). Showcase: somebody copies a file through NFS or the frontend creates an image of the process memory.
(Another - hopefully not so common - failure mode: a reboot/crash of the central NFS fileserver would stall all diskless and most other LynxOS machines.)
Thus the sending server has to actively schedule the transmission of the data using a real-time capable protocol in order to avoid collisions, congestion and packet loss due to small send/receive buffers. The choice of such a protocol that allows transmission control and that can be used within our IP based technical (Ethernet) network is quite limited. The technical network provides scheduling of packets only based on the information provided up to protocol level 4 (UDP or TCP).
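To illustrate the simplest form of 'actively scheduling the transmission', a sketch of user-space pacing of UDP datagrams. The address, port, chunk size and rate are placeholders, and pacing in user space does not by itself make the LynxOS stack real-time; it merely avoids handing the whole buffer to the stack as one burst:

  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>
  #include <algorithm>
  #include <chrono>
  #include <cstddef>
  #include <thread>
  #include <vector>

  int main() {
      int sock = socket(AF_INET, SOCK_DGRAM, 0);
      sockaddr_in dst{};
      dst.sin_family = AF_INET;
      dst.sin_port = htons(5000);                          // placeholder port
      inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);      // placeholder concentrator address

      const std::size_t kPayload = 1400;                   // below the Ethernet frame size
      const double kRateMBps = 2.0;                        // pacing target, leaves headroom for periodic data
      const auto gap = std::chrono::microseconds(
          static_cast<long>(kPayload / (kRateMBps * 1e6) * 1e6));

      std::vector<char> buffer(1024 * 1024, 0);            // stand-in for a chunked history buffer
      for (std::size_t off = 0; off < buffer.size(); off += kPayload) {
          const std::size_t n = std::min(kPayload, buffer.size() - off);
          sendto(sock, buffer.data() + off, n, 0,          // return value ignored in this sketch
                 reinterpret_cast<sockaddr*>(&dst), sizeof(dst));
          std::this_thread::sleep_for(gap);                // spread the transfer over time
      }
      close(sock);
      return 0;
  }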
* Another network issue is the fragmentation of packets. If the data put into the level 4 protocol exceeds the Ethernet frame size, the packet is automatically split into several new packets that are normally re-assembled on the receiving end (client). However, the fragments lack the header of the first packet and cannot be easily scheduled. Hence, for security reasons, many networks block fragmented packets since they can potentially be used for a denial of service and other nasty things. Thus, and also for efficiency reasons, the data sizes to be sent should be adjusted to stay below the maximum Ethernet frame size. (E.g. the periodic BPM data packet is, at ~1.2 kByte, below the technical network Ethernet frame size of ~1496 bytes/frame.)
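A small sanity check of the payload sizes, treating the ~1496 bytes quoted above as the usable per-frame size at the IP level (an assumption on my part) and subtracting the standard IPv4 and UDP headers:

  #include <cstdio>

  int main() {
      const int frame = 1496;       // bytes per frame on the technical network (figure quoted above)
      const int ipHeader = 20;      // IPv4 header without options
      const int udpHeader = 8;      // UDP header
      const int maxPayload = frame - ipHeader - udpHeader;
      std::printf("max unfragmented UDP payload: %d bytes\n", maxPayload);
      std::printf("periodic BPM sample (~1200 bytes) fits: %s\n", 1200 <= maxPayload ? "yes" : "no");
      return 0;
  }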
-> Essentially there is only one level 4 protocol that can fulfil real-time requirements and can control congestion with a fine granularity!
In case a temporary total block of the periodic data stream is not an issue, one may of course at one's discretion choose another protocol and nest higher level protocols (such as CMW) in it. However, I would advise performing some simple stress tests with full load and quantifying the probability of a failure case.
************************************************************************
Important question:
Who is going to implement the data concentrator for BLM and BPM on-demand buffers
(40us data and 100k turn acquisition)?
To my knowledge:
- Stephen has/is implementing the LHC BLM frontends.
- Greg has implemented a CMW-based prototype to retrieve the periodic BLM data acquisition (1 Hz).
- Lars has/is implementing the LHC BPM frontends using UDP for the delivery of the periodic data (tested).
- I provided a simple logging server to capture those data. However, it is foreseen to replace this by a concentrator that is essentially the same as the LHC orbit feedback controller.
- Data concentration and re-publishing of the
* BLM 40us history buffer acquisition: ?????
* BPM 100k turn history buffer acquisition: ?????
Maybe the wrong (political) question: Who is responsible?
Some general questions that come to my mind and that we should maybe address:
From the data processing point of view, how will the 2.5 GB data be analysed?
Online?? Smart data storage and formats may be required.
How long does it take to parse a 2.5 GB SDDS file?
Will there be dedicated (specialised) BLM frontends that will deliver the loss data at a rate of 50-100 Hz to meet the collimation team's wish list?