Discussion:
[Insight-developers] DICOM UID generation
Miller, James V (Research)
2005-01-05 15:30:37 UTC
With the addition of GDCM to ITK, we can now write DICOM files. However, we
have no mechanism for generating the UIDs that are needed within a DICOM
file.
David Clunie
2005-01-05 21:46:29 UTC
Hi Jim

Sending money to ANSI is a complete waste of money, since there are
free UID roots available, and nobody cares where your root comes from
as long as it is unique.

For other alternatives, see:

"http://www.dclunie.com/medical-image-faq/html/part8.html#UIDRegistration"

Your biggest problem though is not getting a root, it is making sure
that every file generated by ITK anywhere no matter where and by whom
it is installed is globally unique.

Typically this is done with something unique to the device on which it
is installed, e.g. serial number, hostid, MAC address or similar, as
well as any process or thread running on that device (e.g. process
number). It is very hard to get this right in a multi-platform toolkit.
The MAC address, process number, a date time stamp with high precision
and a random number might be necessary. If it won't all fit into 64
bytes, consider feeding everything (after the root) into some sort
of cryptographic hash function.
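
As a rough illustration of the hashing idea, here is a minimal C++ sketch. It assumes the device and process details have already been joined into one string, and it uses 64-bit FNV-1a purely as a stand-in so that the example needs no external library. FNV-1a is not cryptographic and 64 bits is too few for real collision resistance; a real implementation would swap in a proper cryptographic hash and keep as many decimal digits as fit. The names Fnv1a64 and MakeHashedUID are hypothetical.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a. NOT cryptographic: a stand-in so the sketch is self-contained.
std::uint64_t Fnv1a64(const std::string& data)
{
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : data)
  {
    h ^= c;
    h *= 1099511628211ULL;                    // FNV prime
  }
  return h;
}

// Hypothetical helper: returns root + "." + decimal hash of `components`.
// std::to_string never emits leading zeroes, so the new component is legal.
// An empty string is returned if the result would exceed 64 characters.
std::string MakeHashedUID(const std::string& root, const std::string& components)
{
  const std::string uid = root + "." + std::to_string(Fnv1a64(components));
  return uid.size() <= 64 ? uid : std::string();
}
```

The hash adds at most 20 digits, so any root of up to 43 characters fits. The input string could be something like "MAC|pid|timestamp|random", with whatever separator is convenient, since only the hash ends up in the UID.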

The question also always arises as to whether it is safer to require the
installer/user of a toolkit to acquire and install their own root rather
than use the same one supplied to all users of the toolkit. In general
it is extremely hard to guarantee that all instances of the toolkit
compiled and installed anywhere on any platform will generate unique IDs.
Conversely, it is hard to get users of toolkits to do the right thing.

Having accounted for that problem, another is to be sure that not only
are all generated images assigned a unique SOP Instance UID, but that if
they are part of the same (new) series, they must have a new unique Series
Instance UID that is the same for all images in that Series. Same goes
for the Study Instance UID, though you can add to an existing study, but
not to an existing series unless you are the equipment that created that
series in the first place. Same goes for Frame of Reference UID, which
obviously needs a lot of attention in a registration toolkit !
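
The bookkeeping described above might be sketched as follows. This is only an illustration: NextUID is a hypothetical stand-in that appends a process-local counter to the root (so it is not globally unique), and SliceUIDs and AssignSeriesUIDs are made-up names. A Frame of Reference UID would be handled the same way as the Series Instance UID.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a real UID generator: root + "." + local counter.
// Enough to show the bookkeeping, NOT globally unique.
std::string NextUID(const std::string& root)
{
  static unsigned long counter = 0;
  return root + "." + std::to_string(++counter);
}

struct SliceUIDs
{
  std::string studyInstanceUID;   // may be reused from an existing study
  std::string seriesInstanceUID;  // new, shared by every image of the new series
  std::string sopInstanceUID;     // new, different for every image
};

// Assigns UIDs to `count` images written as one new series. Pass an existing
// Study Instance UID to add the series to that study, or an empty string to
// start a new study.
std::vector<SliceUIDs> AssignSeriesUIDs(const std::string& root,
                                        std::size_t count,
                                        const std::string& existingStudyUID)
{
  const std::string study =
    existingStudyUID.empty() ? NextUID(root) : existingStudyUID;
  const std::string series = NextUID(root);  // generated once per series
  std::vector<SliceUIDs> slices;
  for (std::size_t i = 0; i < count; ++i)
  {
    slices.push_back(SliceUIDs{study, series, NextUID(root)});  // once per image
  }
  return slices;
}
```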

Typical mistakes generating UIDs, by the way, are to exceed 64 bytes total,
and to use leading zeroes in numeric components, both of which are illegal
and cause significant problems downstream.
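
Both mistakes are cheap to check before a file is written. The following is a minimal sketch of such a check, not a complete implementation of the standard's rules; IsValidUID is a hypothetical name, and the function tests only the length limit, the character set, empty components and leading zeroes.

```cpp
#include <cstddef>
#include <string>

// Returns true if `uid` is at most 64 characters, consists only of digits
// and dots, has no empty component, and has no leading zero in any
// multi-digit component.
bool IsValidUID(const std::string& uid)
{
  if (uid.empty() || uid.size() > 64) return false;

  std::size_t start = 0;  // index of the first character of the current component
  for (std::size_t i = 0; i <= uid.size(); ++i)
  {
    if (i == uid.size() || uid[i] == '.')
    {
      const std::size_t len = i - start;
      if (len == 0) return false;                      // "1..2", ".1" or "1."
      if (len > 1 && uid[start] == '0') return false;  // leading zero, e.g. "1.02"
      start = i + 1;
    }
    else if (uid[i] < '0' || uid[i] > '9')
    {
      return false;                                    // only digits and dots allowed
    }
  }
  return true;
}
```

A single "0" component is accepted, since only a zero followed by further digits counts as a leading zero.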

The formal rules are in PS 3.8 Annex F and PS 3.5 Section 6.1 and ISO 8824.

There are a few more comments of mine in the FAQ at:

"http://www.dclunie.com/medical-image-faq/html/part2.html#UID"

David
Mathieu Malaterre
2005-01-06 14:51:01 UTC
David,

Could you please comment on the code below? It is extracted from the
GDCM CVS tree.

The code works as follows:
echo "gdcm" | od -b
0000000 147 144 143 155 012

Then we build the UID with:

radical (if passed in) + 147.144.143.155 + IP + time()

Therefore even organizations without a radical (root) can create DICOM
images that are unique within their organization.

Mathieu

/**
 * \ingroup Util
 * \brief Return the IP address of the machine writing the DICOM image
 */
std::string Util::GetIPAddress()
{
  // Adapted from
  // http://www.codeguru.com/Cpp/I-N/internet/network/article.php/c3445/
#ifndef HOST_NAME_MAX
  // SUSv2 guarantees that `Host names are limited to 255 bytes'.
  // POSIX 1003.1-2001 guarantees that `Host names (not including the
  // terminating NUL) are limited to HOST_NAME_MAX bytes'.
#  define HOST_NAME_MAX 255
  // In this case we should maybe check the string was not truncated.
  // But I don't know how to check that...
#endif //HOST_NAME_MAX

#if defined(_MSC_VER) || defined(__BORLANDC__)
  // With the WinSock DLL we need to initialise WinSock before using
  // gethostname
  WORD wVersionRequested = MAKEWORD(1,0);
  WSADATA WSAData;
  int err = WSAStartup(wVersionRequested,&WSAData);
  if (err != 0) {
    // We could not find a usable WinSock DLL.
    WSACleanup();
    return "127.0.0.1";
  }
#endif

  std::string str;
  char szHostName[HOST_NAME_MAX+1];
  int r = gethostname(szHostName, HOST_NAME_MAX);
  szHostName[HOST_NAME_MAX] = '\0'; // a truncated name may lack the NUL

  if( r == 0 )
  {
    // Get host addresses
    struct hostent * pHost = gethostbyname(szHostName);

    // Use the first address only; appending every address would run
    // them together into a single string.
    if( pHost != NULL && pHost->h_addr_list[0] != NULL )
    {
      for( int j = 0; j < pHost->h_length; j++ )
      {
        if( j > 0 ) str += ".";

        str += Util::Format("%u",
          (unsigned int)((unsigned char*)pHost->h_addr_list[0])[j]);
      }
      // str now contains one local IP address
    }
  }

#if defined(_MSC_VER) || defined(__BORLANDC__)
  WSACleanup();
#endif

  // If an error occurs, r == -1 and str is empty.
  // Most of the time it will return 127.0.0.1...
  return str;
}

/**
 * \ingroup Util
 * \brief Creates a new UID. As stipulated in the DICOM reference,
 *        each time a DICOM image is created it should have
 *        a unique identifier (UID)
 */
std::string Util::CreateUniqueUID(const std::string& root)
{
  // The code works as follows:
  //   echo "gdcm" | od -b
  //   0000000 147 144 143 155 012
  // Therefore we return
  //   radical + 147.144.143.155 + IP + date + time
  std::string radical = root;
  if( root.empty() ) // anything better?
  {
    radical = "0.0."; // Is this really useful?
  }
  // else
  // A root was specified; use it to forge our new UID.
  // Nothing is inserted between the root and the next component,
  // so the root has to end with a dot, like the default above.
  radical += "147.144.143.155"; // gdcm
  radical += ".";
  radical += Util::GetIPAddress();
  radical += ".";
  radical += Util::GetCurrentDate();
  radical += ".";
  radical += Util::GetCurrentTime();

  return radical;
}
David Clunie
2005-01-06 16:19:43 UTC
Hi Mathieu

The use of IP address is very problematic.

Most of us work using private IP addresses behind routers that do
network address translation, therefore the chances of collisions
based on IP address are extremely high. E.g. there are thousands
of 10.1.1.1 and 192.168.1.1 hosts in the world.
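
To make the point concrete, a generator could at least detect addresses that cannot distinguish one host from another. The sketch below is a hypothetical helper (IsPrivateOrLoopbackIPv4 is a made-up name) that recognises the RFC 1918 private ranges and the 127.0.0.0/8 loopback range in a dotted-quad string. It handles IPv4 only and is meant as an illustration rather than a full address parser.

```cpp
#include <cstdio>
#include <string>

// Returns true if `ip` is a dotted-quad IPv4 address in 10.0.0.0/8,
// 172.16.0.0/12, 192.168.0.0/16 or 127.0.0.0/8. Strings that do not parse
// as exactly four numbers in the range 0..255 yield false.
bool IsPrivateOrLoopbackIPv4(const std::string& ip)
{
  unsigned int a = 0, b = 0, c = 0, d = 0;
  char extra = 0;
  // The trailing %c catches garbage after the fourth number.
  if (std::sscanf(ip.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
  {
    return false;
  }
  if (a > 255 || b > 255 || c > 255 || d > 255) return false;

  if (a == 10) return true;                         // 10.0.0.0/8
  if (a == 172 && b >= 16 && b <= 31) return true;  // 172.16.0.0/12
  if (a == 192 && b == 168) return true;            // 192.168.0.0/16
  if (a == 127) return true;                        // loopback
  return false;
}
```

When this returns true, the address contributes essentially nothing towards global uniqueness and some other device identifier is needed.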

MAC address is much better, but harder to get in a platform
neutral way. Some platforms have a hostid.

For a timestamp, one needs precision well down into the millisecond
range, and something like the Unix milliseconds since epoch might do.

I would always try and include a thread and/or process number as
well, since it is quite likely that separate threads or processes
on the same machine might generate UIDs at the same time with a
granularity of milliseconds, given the speed of today's servers.

I try and throw in a random number if at all possible, just in case.
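
Put together, the timestamp, process number and random number might be combined along these lines. This is a sketch under several assumptions: MakeUIDSuffix and CurrentProcessId are hypothetical names, an in-process atomic counter stands in for a thread number, std::system_clock is assumed to count from the Unix epoch (typical, but only guaranteed since C++20), and std::random_device is not guaranteed to be a cryptographic source.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <random>
#include <string>

#if defined(_WIN32)
#  include <process.h>
static long CurrentProcessId() { return static_cast<long>(_getpid()); }
#else
#  include <unistd.h>
static long CurrentProcessId() { return static_cast<long>(getpid()); }
#endif

// Hypothetical helper: returns "<ms since epoch>.<pid>.<counter>.<random>".
// Every component comes from std::to_string, so none has a leading zero.
std::string MakeUIDSuffix()
{
  using namespace std::chrono;
  // Separates UIDs generated within the same millisecond by this process,
  // whichever thread asks for them.
  static std::atomic<std::uint32_t> counter(0);
  static thread_local std::mt19937 rng(std::random_device{}());

  const long long ms =
    duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
  const std::uint32_t n = ++counter;
  const std::uint32_t r = static_cast<std::uint32_t>(rng() % 100000u);  // at most 5 digits

  return std::to_string(ms) + "." + std::to_string(CurrentProcessId()) + "." +
         std::to_string(n) + "." + std::to_string(r);
}
```

The suffix can approach 40 characters, so the combined length with the root still has to be checked against the 64 character limit.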

Another possibility is to use some form of cryptographic random number
generator that creates a number that is very, very likely to be unique.

You might want to take a look at the Javadoc describing java.rmi.server.UID
and java.security.SecureRandom for some ideas, even though you use a
different platform.

It also depends on how deterministic one wants the generation process
to be, if at all.

More questions than answers perhaps, but the simplistic approach
suggested below is very likely not robust enough for generating
UIDs that will escape into a clinical PACS environment, where uniqueness
is extremely important, because of a) the use of IP address and b) the
use of perhaps insufficiently precise time. The inclusion of the "gdcm"
as octal may not help since the root that precedes it should be
unique enough to distinguish the toolkit, and all users of gdcm will be
adding this extra unhelpful component.

Discussing this makes me cringe when I think how inadequate some of
the techniques I use in my own toolkits are under various circumstances.
Time to revisit them again I suppose.

David
Mathieu Malaterre
2005-01-06 16:29:27 UTC
Thanks a lot. I'll try to come up with an implementation ASAP.

Getting the MAC address seems a lot better than IP. I'll also try to
look into the problem of time precision, but this doesn't seem too hard.

Thanks for the feedback.
Mathieu
Miller, James V (Research)
2005-01-05 22:07:08 UTC
Thanks David.

Perhaps Stephen or Will should apply for a root id at
http://www.medicalconnections.co.uk/html/free_uid.html
on behalf of the Insight Software Consortium.

Jim



David Clunie
2005-01-05 22:14:35 UTC
Hi Jim

The problem with Dave Harvey's UID roots is that they are
quite long and waste quite a few of the precious 64 characters.

The IANA SNMP UIDs are much shorter.

Also, be sure that when you get a root, you subdivide the space
and assign someone to manage it. E.g. add a ".1" for your current
needs, anticipating that for a completely different project you
can use the same root plus ".2" or whatever later on.

E.g., you can actually use it for SNMP devices as intended should
it be necessary !

David
Stephen R. Aylward
2005-01-05 22:40:44 UTC
Done. Attached is the completed form.

I will let you know when I get the email containing the UID.

One thing - this is a "default UID" for ITK. We need to make it easy
for people to change the default UID for ITK. Can we make the UID the
value of an advanced option in CMake? Is that sufficiently permanent?
If they clear the cache, they'll have to remember to re-type their
UID. Alternatively, we could store it in a separate file in the binary
directory: a file that contains CMake variables that persist beyond the
"cache".

Stephen

PS> A copy of ISC's UID request form is attached.
I will add this to the InsightDocuments directory or any other place
y'all suggest. It has a short blurb about us being allowed to
sub-delegate for others to use...
Miller, James V (Research)
2005-01-06 15:08:21 UTC
Mathieu,

My main concern with this UID generation algorithm is that
it takes up 15 characters (16 with the trailing dot) to
add the encoding of "gdcm" into the uid. This seems like
a lot of characters and all it does is ensure that gdcm
uids are REALLY REALLY unique from the other uids that
may be generated from that root. But the ip address and
datestamp/timestamp should make the uid's unique enough.

You also want to transform the IP address somehow so that
the UID cannot be easily tracked back to a particular
machine. Having an IP address in the UID could be
an issue with ensuring anonymization of data.
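
One possible transformation, sketched here purely as an illustration, is to replace the address with a salted hash rendered as a single decimal component. ObscureHostComponent is a hypothetical name, and 64-bit FNV-1a is used only to keep the example free of external libraries. It is not cryptographic, and because there are only about four billion IPv4 addresses, an unsalted hash could be reversed by simply trying them all, so the salt would have to stay private to the installation.

```cpp
#include <cstdint>
#include <string>

// Maps an address string to one decimal UID component using salted 64-bit
// FNV-1a. Illustration only: FNV-1a is NOT a cryptographic hash.
std::string ObscureHostComponent(const std::string& address, const std::string& salt)
{
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  const std::string input = salt + "|" + address;
  for (unsigned char c : input)
  {
    h ^= c;
    h *= 1099511628211ULL;                    // FNV prime
  }
  return std::to_string(h);                   // at most 20 digits, no leading zero
}
```

The result is shorter than or comparable to a dotted IPv4 address and contains no dots, so it occupies one UID component instead of four.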

Jim



Mathieu Malaterre
2005-01-06 15:43:18 UTC
Jim,
Post by Miller, James V (Research)
My main concern with this UID generation algorithm is that
it takes up 15 characters (16 with the trailing dot) to
add the encoding of "gdcm" into the uid. This seems like
a lot of characters and all it does is ensure that gdcm
uids are REALLY REALLY unique from the other uids that
may be generated from that root. But the ip address and
datestamp/timestamp should make the uid's unique enough.
Makes sense.
Post by Miller, James V (Research)
You also want to transform the ip address somehow so that
the UID cannot be easily tracked back to a particular
machine. Having an IP address in the UID could be
an issue with ensuring anonymization of data.
Anything particular in mind? Otherwise I can just use the GDCM encoding to
scramble the IP address:

(IP + 147.144.143.155) modulo 255.255.255.255


Mathieu