C++ to ASM linkage in GCC: Difference between revisions

Prefer writing assembly instead of ASM and assembly files are not "scripts"
(Replace dead link with the local calling conventions article)
Line 1:
A small note before we begin: GCC (the GNU Compiler Collection), a very versatile compiler that has been around for a very long time, is pretty much the standard for OS development; it is even used (as you probably already know) to compile the Linux kernel. In fact, Linux is meant to be compiled with GCC. It has lots of useful extensions (such as <tt>__attribute__((...))</tt>) which ease development considerably, should you take the time to read about them.
 
Also, this article by itself is not sufficient for a full understanding of linking to C++ methods from C or assembly. C++ member functions take a hidden 'this' pointer, which will be discussed in another article proposed for a later date.
 
We will assume the use of GCC for your HLL development, and the use of C++ (C wouldn't be that different) for your little HLL - Assembly linkage escapade.
 
== C++ Name Mangling ==
Line 23 ⟶ 22:
The compiler uses the generated symbol name to encode information about the symbol. The prefix says 'this is a mangled symbol' (_Z), the number says 'the identifier that follows is 8 characters long' (8), and those characters are the identifier itself [getObjId]. The 'E' marks the END of a nested (namespace- or class-qualified) name, and after that GCC appends codes describing the argument types (v=void, i=int, j=unsigned int...).
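The decoding described above can be checked mechanically. The following is a minimal sketch, assuming a GCC (or other Itanium-ABI) toolchain that provides <tt>abi::__cxa_demangle</tt> in <tt>&lt;cxxabi.h&gt;</tt>; the helper name <tt>demangle</tt> is hypothetical and not part of the original article.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: turns a GCC-mangled symbol name back into a
// readable C++ signature. Returns an empty string if the runtime
// rejects the name.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc(); the caller must free() it.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return std::string();
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

For example, <tt>demangle("_ZN3foo3barEv")</tt> yields <tt>foo::bar()</tt>: 'N' opens the nested name, '3foo' and '3bar' are the length-prefixed components, 'E' closes the nested name, and 'v' says the function takes no arguments.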
 
After seeing this, you can tell that in order to call a C++ function from assembly, '''if''' the compiler has mangled it (there are many cases in which there is no need for mangling, and the compiler may leave a symbol alone, e.g. global variables), you would need to use a symbol inspection tool such as <tt>nm</tt> or <tt>objdump</tt> to see the name GCC used in the generated object file.
 
Note well that different compilers DO ''NOT'' use the same mangling scheme. The C++ standard does not specify one, and compiler vendors are free to use their own mangling schemes as they see fit.
Line 38 ⟶ 37:
In C and C++, you make a symbol global by defining it ''outside'' of any function. In C++, you may still hide the symbol by having it inside a 'private' section of a global symbol. But this is irrelevant to this article, seeing as anyone who is reading this article should be attempting to develop an OS. We assume you already understand your language.
 
In assembly (NASM, specifically; the author does not use GAS. Anyone who knows how GAS works may add to this article as they see fit.), all symbols are automatically local to the particular assembly file in which they appear. To make a symbol global, you must use the 'global _SYMBOL_NAME_' directive. In GAS, the equivalent is '.globl _SYMBOL_NAME_' ('.global' is accepted as well).
 
<source lang="ASM">
Line 63 ⟶ 62:
</source>
 
To clarify, technically, the linker can 'see' all symbols. It just chooses to ignore linkage between files for symbols not explicitly declared global. If you were to write a second file:
 
<source lang="ASM">
Line 84 ⟶ 83:
 
 
So now we get back to the main question presented at the top of this section of the article: What is external linkage? External linkage is the linking to a global symbol which is not defined in the file you're working in. The linker will therefore place the referred variable's address wherever it is referenced in the referencing file.
 
 
Line 99 ⟶ 98:
In other words, when you tell the compiler that you are linking to an external variable with "C" style linkage, you are telling the compiler: "I am linking to a symbol of name XYZ. This is EXACTLY how the symbol looks, and there is no name mangling." That is all 'extern "C"' means. It explicitly tells the compiler that there is nothing special about the symbol's name, and that it is to be taken exactly as you type it.
 
With this in mind, we now understand why in C, there is no need to specify a linkage style, since C only understands symbols as you type them. C does not have any kind of name mangling, and expects plain, absolute symbol names. So to link to assembly in C, you just say 'extern SYMBOL NAME', and not 'extern "C" SYMBOL NAME'.
 
So <tt>extern "C" getObjId</tt> tells the compiler to insert references to a symbol of the exact name 'getObjId' within your output object file. The linker will see your references and look for a global symbol of that exact name. If it is found, and there are no duplicates, it simply places the address of that symbol wherever the compiler placed a reference to it.
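As a rough illustration of the difference, the sketch below puts a "C" linkage function next to a C++ linkage function of the same base name. The <tt>kernel</tt> namespace and the return values are made up for this example, and the "C" function is defined in C++ here only so that the snippet links on its own; in practice it would typically live in an assembly or C file.

```cpp
// "C" linkage: the object file refers to the symbol exactly as typed,
// i.e. 'getObjId'.
extern "C" int getObjId();

// C++ linkage: GCC mangles this one to _ZN6kernel8getObjIdEv.
namespace kernel {
int getObjId();
}

// Stand-in definition for the routine that would normally be written
// in assembly (assumption made for this sketch only).
extern "C" int getObjId() { return 1; }

// The namespaced function is a different symbol altogether, so both
// can coexist in one program.
int kernel::getObjId() { return 2; }
```

Running <tt>nm</tt> on the resulting object file should list both <tt>getObjId</tt> and <tt>_ZN6kernel8getObjIdEv</tt> as separate global symbols.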
Line 114 ⟶ 113:
 
So linking to 'extern "C"' symbols is the same as telling the compiler to just trust you, and place references to XYZ symbol name as is, even though it may never see the definition of that symbol within the ''current file''. The symbol is defined elsewhere. It may even be defined in a shared library, and be expected to be linked in by the OS at runtime (Dynamic Linking), although this usually requires a little more, and is usually handled by the compiler and linker as set up by the host OS (this is where OS libraries come in).
 
 
''' "C++" Linkage '''
Line 142 ⟶ 140:
Then GCC is told to link to that symbol. It places a reference to a symbol name, all mangled in ''its'' own way. The linker is called on both object files. The second one is referencing a symbol which the linker sees nowhere. You are told the symbol referenced by the second object file does not exist.
 
== C++ - Assembly linkage ==
 
 
I'm sorry I took this long to get to this part of the article, but the facts given above are pertinent.
 
To link to a C/C++ symbol from an assembly file, you simply tell the assembler that the symbol is external. Assemblers ''always'' insert references as they see them, so they are always "C" style, if you please. The assembler takes you at your word. This is why you can look up the mangled symbol name of a function or variable, and then actually place the mangled name into your assembly file (extern _ZN3foo3barEv) and it will work (depending on whether or not a 'this' pointer is involved).
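The same "take me at my word" behaviour can be imitated from within C++ using GCC's asm label extension, which binds a declaration to a literal symbol name. This is only a sketch: it assumes GCC or Clang on an ELF target where symbols carry no extra leading underscore (e.g. Linux), and the name <tt>callByMangledName</tt> is invented for the example.

```cpp
namespace foo {
// Ordinary C++ function with no 'this' pointer involved.
// GCC emits it under the mangled symbol _ZN3foo3barEv.
int bar() { return 7; }
}

// GCC extension (asm label): references to this declaration are
// emitted against the literal symbol name given in the string, much
// like 'extern _ZN3foo3barEv' in an assembly file.
int callByMangledName() asm("_ZN3foo3barEv");

// Calls foo::bar() purely through its mangled symbol name.
int viaMangledSymbol() { return callByMangledName(); }
```

The compiler never connects <tt>callByMangledName</tt> to <tt>foo::bar</tt>; the match is made by symbol name alone, which is exactly what happens when an assembly file references the mangled name.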
 
 
To link to an assembly symbol from C/C++, you must know what the absolute symbol name in the assembly file is, and ensure it's global, then link to the absolute symbol name "C" style. (extern "C" SYMBOL_NAME).
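A compact way to try this without a separate file is a file-scope <tt>asm</tt> block, sketched below. It assumes GCC or Clang with a GAS-compatible assembler on an ELF target (so it uses GAS syntax with '.globl' rather than NASM's 'global'); the symbol name <tt>asm_magic</tt> and its value are made up for the example.

```cpp
// Assembly side: define a 4-byte global variable named 'asm_magic'.
// .pushsection/.popsection keep the compiler's own section state intact.
asm(
    ".pushsection .data\n"
    ".globl asm_magic\n"      // make the symbol visible to the linker
    ".balign 4\n"
    "asm_magic:\n"
    "    .long 42\n"
    ".popsection\n"
);

// C++ side: refer to the absolute symbol name, "C" style, so that no
// mangling is applied to the reference.
extern "C" unsigned int asm_magic;

// Reads the variable that was defined in assembly.
unsigned int readAsmMagic() { return asm_magic; }
```

Without the '.globl' line the symbol would stay local to the assembly, and a reference from another file would fail to link; without 'extern "C"' the declaration would still work for a plain global variable under GCC, but functions would be looked up under their mangled names.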
 
The 'this' pointer issue is another thing altogether, and is actually a very serious consideration you should take into account when designing your kernel, or when choosing whether or not to use C++ at all. C makes library generation and linking easier. C++ makes design and restructuring (you will restructure your design many times, so this is a big plus) much easier.