C++ to ASM linkage in GCC

A small note before we begin: GCC (the GNU Compiler Collection), a very versatile compiler that has been around for a very long time, is pretty much the standard for OS development; it is even used (as you probably already know) to compile the Linux kernel. In fact, Linux is meant to be compiled with GCC. It has lots of useful extensions (such as <tt>__attribute__((...))</tt>) which ease development by leaps and bounds, should you take the time to read about them.


Also, this article by itself is not sufficient for a full understanding of linking to C++ methods from C or assembly. C++ methods take a hidden 'this' pointer, which will be discussed in another article proposed for a later date.

We will assume the use of GCC for your HLL development, and the use of C++ (C wouldn't be that different) for your little HLL-to-assembly linkage escapade.


== C++ Name Mangling ==
The compiler uses the generated symbol name to encode information about the symbol. Read from left to right, the name says: 'this is a mangled symbol' (<tt>_Z</tt>); 'the identifier that follows is 8 characters long' (<tt>8</tt>); and those characters are <tt>getObjId</tt>. The 'E' marks the end of a nested (namespace- or class-qualified) name, and after that GCC places letters and namespace/object names that describe the arguments (v=void, i=int, j=unsigned int...).
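
The scheme can also be checked in the other direction. What follows is only a small sketch, assuming GCC (or Clang) with the <tt>abi::__cxa_demangle</tt> function from <tt>cxxabi.h</tt>; the helper names <tt>demangle</tt> and <tt>demangleExamples</tt> are made up for illustration, and <tt>_Z8getObjIdv</tt> assumes a free function <tt>getObjId</tt> that takes no arguments, which may not be the exact symbol discussed above.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <string>
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC and Clang)

// Hypothetical helper: turns a mangled symbol name back into readable form.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && text != nullptr) ? text : "";
    std::free(text);  // the buffer was allocated with malloc
    return result;
}

// Usage: both names follow the scheme described in the text.
void demangleExamples() {
    // _Z = mangled, 8 = identifier length, v = takes void
    assert(demangle("_Z8getObjIdv") == "getObjId()");
    // N ... E wraps the nested name foo::bar
    assert(demangle("_ZN3foo3barEv") == "foo::bar()");
}
```

The same decoding is available on the command line through tools that ship with binutils, so this helper is mostly useful for seeing how the pieces of a name fit together.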


After seeing this, you can tell that, in order to call a C++ function from assembly, '''if''' the compiler has mangled its name (in many cases there is no need for mangling and the compiler leaves the symbol alone, e.g. global variables), you need a symbol-listing tool such as <tt>nm</tt> or <tt>objdump</tt> to see the name GCC used in the generated object file.


Note well that different compilers do ''NOT'' necessarily use the same mangling scheme. The C++ standard does not define one, so each compiler is free to use its own mangling scheme as it sees fit.

In C and C++, you make a symbol global by defining it ''outside'' of any function. In C++, you may still hide the symbol by placing it inside the 'private' section of a class. But this is irrelevant to this article, seeing as anyone who is reading this article should be attempting to develop an OS. We assume you already understand your language.


In assembly (NASM, specifically; the author does not use GAS, and anyone who knows how GAS works may add to this article as they see fit), all symbols are automatically local to the assembly file in which they appear. To make a symbol global, you must use the <tt>global SYMBOL_NAME</tt> directive. In GAS, the equivalent is <tt>.globl SYMBOL_NAME</tt>.


<source lang="ASM">
</source>


To clarify, technically, the linker can 'see' all symbols. It just refuses to resolve references between files for symbols that were not explicitly declared global. If you were to write a second file:


<source lang="ASM">
</source>




So now we get back to the main question presented at the top of this section of the article: What is external linkage? External linkage is linking to a global symbol which is not defined in the file you're working in. The linker therefore places the referenced symbol's address wherever it is referenced in the referencing file.




In other words, when you tell the compiler that you are linking to an external symbol with "C" style linkage, you are telling the compiler: "I am linking to a symbol named XYZ. This is EXACTLY how the symbol looks, and there is no name mangling." That is all 'extern "C"' means. It explicitly tells the compiler that there is nothing special about the symbol's name, and that it is to be taken exactly as you type it.


With this in mind, we now understand why in C there is no need to specify a linkage style: C only understands symbols as you type them. C does not have any kind of name mangling, and expects plain, absolute symbol names. So to link to assembly from C, you just say 'extern SYMBOL_NAME', and not 'extern "C" SYMBOL_NAME'.


So declaring <tt>getObjId</tt> as <tt>extern "C"</tt> tells the compiler to insert references to a symbol of the exact name 'getObjId' into your output object file. The linker will see your references and look for a global symbol of that exact name. If it is found, and there are no duplicates, the linker simply places the address of that symbol wherever the compiler placed a reference to it.
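
As a rough illustration of what such declarations look like in a C++ source file, here is a minimal sketch. Only the names <tt>getObjId</tt> and <tt>foo::bar</tt> come from this article; <tt>getObjCount</tt>, <tt>checkLinkage</tt>, the return types and the returned values are made up, and the stand-in definitions exist only so that the sketch links on its own.

```cpp
#include <cassert>

// "C" linkage: the object file refers to the exact symbol name getObjId.
extern "C" int getObjId(void);

// Several declarations can share one linkage block.
extern "C" {
    int getObjCount(void);   // hypothetical second symbol
}

// Default "C++" linkage: GCC emits the mangled symbol _ZN3foo3barEv.
namespace foo {
    int bar();
}

// Stand-in definitions (assumed values). In a kernel, the extern "C"
// symbols would typically be defined in a separate assembly or C file.
extern "C" int getObjId(void)    { return 42; }
extern "C" int getObjCount(void) { return 1; }
int foo::bar() { return getObjId() + getObjCount(); }

// Usage: the calls look identical; only the emitted symbol names differ.
void checkLinkage() {
    assert(getObjId() == 42);
    assert(foo::bar() == 43);
}
```

Running <tt>nm</tt> on the resulting object file should list <tt>getObjId</tt> and <tt>getObjCount</tt> exactly as typed, next to the mangled <tt>_ZN3foo3barEv</tt>.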


So linking to 'extern "C"' symbols is the same as telling the compiler to just trust you, and to place references to symbol name XYZ as is, even though it may never see the definition of that symbol within the ''current file''. The symbol is defined elsewhere. It may even be defined in a shared library and be linked in by the OS at runtime (dynamic linking), although this usually requires a little more work, which is handled by the compiler and linker as set up by the host OS (this is where OS libraries come in).



''' "C++" Linkage '''

Then GCC is told to link to that symbol. It places a reference to a symbol name, all mangled in ''its'' own way. The linker is called on both object files. The second one references a symbol which the linker sees nowhere, so you are told that the symbol referenced by the second object file does not exist.


== C++ - Assembly linkage ==




I'm sorry I took this long to get to this part of the article, but the facts given above are pertinent.


To link to a C/C++ symbol from an assembly file, you simply tell the assembler that the symbol is external. Assemblers ''always'' insert references as they see them, so they are always "C" style, if you please. The assembler takes you at your word. This is why you can look up the mangled symbol name of a function or variable, place the mangled name into your assembly file (<tt>extern _ZN3foo3barEv</tt>), and it will work (depending on whether or not a 'this' pointer is involved).




To link to an assembly symbol from C/C++, you must know the absolute symbol name used in the assembly file, ensure that it is global, and then link to that absolute symbol name "C" style (<tt>extern "C" SYMBOL_NAME</tt>).
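
The C++ side of this can be sketched as follows. This is an illustration under several assumptions rather than a recipe: the symbol <tt>asm_add</tt> and the helper <tt>checkAsmAdd</tt> are invented, and to keep the sketch in one file the assembly is embedded with GCC's file-scope <tt>asm</tt>, which is passed to GAS (AT&T syntax) and here targets x86-64 Linux only. In a real project the routine would live in its own NASM file, starting with <tt>global asm_add</tt>.

```cpp
#include <cassert>

#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
// File-scope asm stands in for a separate assembly file.
asm(
    ".pushsection .text\n"
    ".globl asm_add\n"               // make the symbol visible to the linker
    ".type asm_add, @function\n"
    "asm_add:\n"
    "    leal (%rdi,%rsi), %eax\n"   // System V x86-64: a in edi, b in esi, result in eax
    "    ret\n"
    ".size asm_add, .-asm_add\n"
    ".popsection\n"
);
#else
// Fallback for other targets, so that the sketch still builds there.
extern "C" int asm_add(int a, int b) { return a + b; }
#endif

// The declaration C++ code needs: exact symbol name, no mangling.
extern "C" int asm_add(int a, int b);

// Usage: the call site cannot tell that the callee was written in assembly.
void checkAsmAdd() {
    assert(asm_add(2, 3) == 5);
}
```

Without the <tt>extern "C"</tt>, GCC would look for the mangled name <tt>_Z7asm_addii</tt> and the link would fail, because the assembly only defines <tt>asm_add</tt>.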


The 'this' pointer issue is another thing altogether, and is actually a very serious consideration you should take into account when designing your kernel, or when choosing whether or not to use C++ at all. C makes library generation and linking easier. C++ makes design and restructuring (you will restructure your design many times, so this is a big plus) much easier.
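
One common way to sidestep the 'this' pointer when calling into C++ from C or assembly is a small "C" linkage wrapper that takes the object pointer as an ordinary argument. The sketch below only illustrates that idea and is not a full treatment of the issue; the class <tt>Object</tt>, the wrapper <tt>Object_getObjId</tt> and the helper <tt>checkWrapper</tt> are hypothetical names.

```cpp
#include <cassert>

class Object {
public:
    explicit Object(int id) : id_(id) {}
    // As a member function this is mangled as _ZNK6Object8getObjIdEv and
    // expects the object's address as a hidden first argument.
    int getObjId() const { return id_; }
private:
    int id_;
};

// Unmangled entry point: the pointer that C++ would pass as the hidden
// 'this' becomes a plain, visible first argument.
extern "C" int Object_getObjId(const Object* self) {
    return self->getObjId();
}

// Usage: a C or assembly caller only needs the object's address.
void checkWrapper() {
    Object obj(7);
    assert(Object_getObjId(&obj) == 7);
}
```

The wrapper costs one extra call, but it gives assembly code a stable symbol name and an explicit argument list to work with.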