C++ to ASM linkage in GCC: Difference between revisions

GCC follows the [http://www.codesourcery.com/public/cxx-abi/abi.html Itanium C++ ABI], which is described in a public collection of documents. One of the main obstacles to portability of compiled C++ libraries is that different compilers use different ''name mangling'' schemes.
 
=== Why Mangle Symbol Names? ===
 
In C++, a name you define does not necessarily identify a single symbol. For example, the function <tt>getObjId()</tt> in C would simply be encoded as <tt>getObjId</tt> in the object file output. In C++, since this function may be overloaded, its symbol name must carry extra information about its arguments (the function signature) so that the compiler and linker know which function a given call should be linked to.
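
As a rough illustration of the point above, the sketch below defines two overloads of <tt>getObjId</tt> and a small helper that asks the C++ runtime to demangle a symbol name. The parameter names, the string overload and the helper name <tt>demangleSymbol</tt> are assumptions made for this example only; <tt>abi::__cxa_demangle</tt> is the demangling function that GCC and Clang expose in <tt>cxxabi.h</tt>.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Two hypothetical overloads. Under the Itanium C++ ABI, GCC emits them as
// _Z8getObjIdi and _Z8getObjIdPKc respectively.
int getObjId(int objectIndex) { return objectIndex; }
int getObjId(const char* objectName) { return objectName ? objectName[0] : 0; }

// Hypothetical helper: returns the demangled form of a mangled symbol name,
// or the input unchanged if the runtime cannot demangle it.
std::string demangleSymbol(const char* mangledName) {
    int status = 0;
    char* demangled = abi::__cxa_demangle(mangledName, nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        return mangledName;
    }
    std::string result(demangled);
    std::free(demangled);  // the buffer was allocated with malloc
    return result;
}
```

Calling <tt>demangleSymbol("_Z8getObjIdi")</tt> gives <tt>getObjId(int)</tt>, which shows that the argument types are what the extra characters in the symbol name encode.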
 
Variables in C++, C, and many other languages have several ''scope'' levels. These are the levels of visibility of a symbol in one object file to the code in another object file. Within a single object file, unless you take steps to ensure otherwise, all static local and global symbols are available to the code in that file.
 
=== Global Symbols ===
Global symbols are those which are visible to all object code in the entire program at the linking phase. Technically, during execution the whole program is just bytes and every symbol is just an address, so every function can reach every symbol. It is nevertheless useful to programmers to be able to ''abstract'' access to symbols.
 
In assembly (NASM specifically; the author does not use GAS, and anyone who knows how GAS works may add to this article as they see fit), all symbols are automatically local to the particular assembly file in which they appear. To make a symbol global, you must use the <tt>global SYMBOL_NAME</tt> directive. In GAS, the equivalent is <tt>.globl SYMBOL_NAME</tt> (<tt>.global</tt> is accepted as well).
 
<syntaxhighlight lang="ASM">
;-------------------------------------
; I usually like to place all my directives in a section above the code:
; ...
ret

</syntaxhighlight>
 
To clarify: technically, the linker can 'see' all symbols. It simply refuses to link across files to symbols that were not explicitly declared global. If you were to write a second file:
 
<syntaxhighlight lang="ASM">
;--------------------------------------
; Extern directives here
; ...
extern mylocalsymbol
;--------------------------------------
</syntaxhighlight>
 
This file will assemble, but when it is passed through the linker you will receive a message to the effect that 'mylocalsymbol' is undefined.
 
=== Local Symbols ===

These are either function local (on the stack, and therefore definitely not available to any other code, since the storage only persists while the function is running and can be popped off at any time) or file local (like mylocalsymbol above).
 
In C/C++, we know that a static variable defined outside of a function is file local. It may not be linked to outside of that particular file.
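
The distinction can be sketched in a single C++ file as follows. The variable and function names are hypothetical and chosen only for this illustration; the point is which of them the linker would let another file refer to.

```cpp
// Global symbol: has external linkage, so another file could refer to it
// with "extern int globalCounter;".
int globalCounter = 0;

// File-local symbol: "static" at file scope gives it internal linkage, so
// the linker will not resolve references to it from any other file.
static int fileLocalCounter = 0;

// Global function that touches both. The local variable "sum" lives on the
// stack and has no symbol a linker could use at all.
int bumpCounters() {
    ++globalCounter;
    ++fileLocalCounter;
    int sum = globalCounter + fileLocalCounter;
    return sum;
}
```

A second file that declares <tt>extern int fileLocalCounter;</tt> would compile, but the link would fail in the same way as the 'mylocalsymbol' assembly example.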
 
 
So now we get back to the main question presented at the top of this section of the article: What is external linkage? External linkage is linking to a global symbol which is defined in a different file from the one you're working in. The linker resolves the reference by placing the symbol's address wherever it is referenced in the referencing file.
 
 
== Linking to External Symbols in C++ ==
Technically, you can link to any kind of symbol using the "C" linkage style. I'll explain that now.
 
=== "C" Linkage ===
 
Well, let's think this through. In C, all symbols simply represent themselves. If I declare a function, getObjId, it will be called 'getObjId' in the output file. C uses no name mangling, since you cannot have different symbols with the same name occurring in the same namespace in C.
So linking to 'extern "C"' symbols is the same as telling the compiler to just trust you and emit references to the symbol name exactly as written, even though it may never see the definition of that symbol within the ''current file''. The symbol is defined elsewhere. It may even be defined in a shared library and be linked in by the OS at runtime (dynamic linking), although this usually requires a little more work, which is normally handled by the compiler and linker as set up by the host OS (this is where OS libraries come in).
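
A minimal sketch of this, assuming GCC's Itanium-style mangling: the first declaration below asks for "C" linkage, the second is an ordinary C++ overload of the same name. The overload and the helper <tt>sumOfBothIds</tt> are hypothetical, and the definitions at the bottom are stand-ins so that the example links by itself; in practice the "C" symbol would usually be defined in another file.

```cpp
// "C" linkage: the compiler emits references to the plain name getObjId.
extern "C" int getObjId(void);

// Default C++ linkage: this overload is emitted as _Z8getObjIdi.
int getObjId(int objectIndex);

// The compiler trusts both declarations; resolving them is the linker's job.
int sumOfBothIds() {
    return getObjId() + getObjId(7);
}

// Stand-in definitions (assumption: normally located in some other file).
extern "C" int getObjId(void) { return 1; }
int getObjId(int objectIndex) { return objectIndex; }
```

Only one function of a given name may have "C" linkage, since without mangling a second one would produce the same symbol name.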
 
=== "C++" Linkage ===
 
Surprised? Yes, it does exist. However, a C++ compiler uses this linkage style by default, so stating it explicitly won't actually change anything. I know for a fact that GCC/G++ will not complain if you tell it to link to a symbol in C++ style.
If you do the following:
 
<syntaxhighlight lang="cpp">
class foo {
public:
// ...
extern foo fubar;
extern "C++" foo fubar;
</syntaxhighlight>
 
The compiler will take them both to mean the same thing: 'Link to an external symbol which has the ''demangled'' name fubar, and is of type foo'.
 
== C++ - Assembly linkage ==
 
 
I'm sorry I took this long to get to this part of the article, but the facts given above are pertinent.
 
To link to a C/C++ symbol from an assembly file, you simply tell the assembler that the symbol is external. Assemblers ''always'' insert references as they see them, so they are always "C" style, if you please. The assembler takes you at your word. This is why you can look up the mangled symbol name of a function or variable and then place the mangled name directly into your assembly file (<tt>extern _ZN3foo3barEv</tt>), and it will work (depending on whether or not a 'this' pointer is involved).
 
 
To link to an assembly symbol from C/C++, you must know the exact symbol name used in the assembly file, ensure that it is global, and then link to that exact symbol name "C" style (<tt>extern "C" SYMBOL_NAME</tt>).
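
The C++ side of such a link might look like the sketch below. The routine name <tt>asm_add_u32</tt>, its signature and the <tt>kernel</tt> namespace are all hypothetical. In a real build the routine would be written in the assembly file and exported with <tt>global asm_add_u32</tt>; here a C++ stand-in with the same unmangled name takes its place so the example can be built without an assembler.

```cpp
#include <cstdint>

// Declaration of the (hypothetical) assembly routine. extern "C" makes the
// compiler reference the plain symbol name asm_add_u32.
extern "C" std::uint32_t asm_add_u32(std::uint32_t left, std::uint32_t right);

namespace kernel {
// Thin C++ wrapper, so the rest of the code need not know the raw symbol.
inline std::uint32_t add(std::uint32_t left, std::uint32_t right) {
    return asm_add_u32(left, right);
}
}  // namespace kernel

// Stand-in for the assembly implementation (assumption: replaced by the
// real routine from the assembly file in an actual project).
extern "C" std::uint32_t asm_add_u32(std::uint32_t left, std::uint32_t right) {
    return left + right;
}
```

The assembly routine must also follow the calling convention the compiler uses for the declared signature, which the declaration alone cannot check.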
 
The 'this' pointer issue is another thing altogether, and is a very serious consideration you should take into account when designing your kernel, or when deciding whether to use C++ at all. C makes library generation and linking easier. C++ makes design and restructuring (you will restructure your design many times, so this is a big plus) much easier.
 
 
== See Also ==
 
[[Category:C++]]
[[Category:Assembly]]