Tuesday, February 7, 2012

Linux: Find Symbolic Links Full Paths

How to find all the symbolic links (soft links and hard links) along with their full paths on Linux/Unix command line.





find / -type l | xargs -r ls -l > symlinks.log




Explanation:



First, find all symbolically linked files/folders by using "-type l" which restricts the results to symbolic links. This example starts from the root of the server, but I would specify as much of the path as possible to speed up the search.



Second, list out all the details of each result by piping what was found to the "ls -l" command. To do this, "xargs" is used which passes the results from the first part as arguments to the "ls -l" command of the second part. The xargs option "-r" is used which tells it to not run the command if no results were returned.



Third, save the results from the second part to a file in the current directory named "symlinks.log".

Linux: Count Unique Error Log Entries

How to get the unique error log entries and a count of each entry on Linux/Unix command line. This is very useful for programmers and developers so they can look through website error log files and quickly fix any problems.

sed 's^\[.*\]^^g' error_log | sort | uniq -c > error_log-unique

Explanation:

In error log files, generally speaking, most of the errors which repeat will differ by the IP and create date stored along side them. In my example, I am assuming these differing parts are stored in brackets which I remove so I can get unique values and a count of those unique values. How the error logs are stored can be modified in your Apache config file assuming you are using Apache.

First, open the error log file using "sed" which is a stream editor and replace all bracketed values with an empty string using a regular expression. The "^" carrot characters are where the find and replace values start and end. \[.*\] basically says match anything inside of brackets. This regular expression will remove the IP and create date as well as other values that aren't really needed leaving the actual error messages.

Second, sort the results. If you don't sort, unique won't work as it works by comparing consecutive lines.

Third, get all the unique values by comparing consecutive lines. Also show a count of those values by using the "-c" option. You can delete the option if you don't need a count.

Forth, save the results to a file in the current directory named "error_log-unique".

Customization:

If you find that the log file is too large to open, sort, and return a count of the unique lines, then try using the "tail" command to take a specific number of lines off the end of the file. Below I set the "tail" command to give me the last 100,000 lines of the file.

tail -100000 error_log | sed 's^\[.*\]^^g' error_log | sort | uniq -c > error_log-unique


Here is another variation to only keep errors that contain the string "PHP ". By using the "-n" option in conjunction with the "p" argument, sed will not print out any thing except matching lines. In addition, I've added a sed command to remove the ", referer: ..." that can appear at the end of each line which can vary. By removing the referer, the error logs can be grouped together more precisely.

sed -n '/PHP /p' error_log | sed 's^\[.*\]^^g' | sed 's^\, referer: [^\n]*^^g' | sort | uniq -c > error_log-unique


Adding to the customization, I wanted to loop though all my log files ([filename]) and create unique log files for each ([filename]-unique). In the for loop, I look for all files in the current directory with a syntax of "*error_log". I perform the same sed command previously discussed except with the "$filename" variable set in the for loop. I use an if/else statement with the "-s" flag to test if the file exists and is not empty. If it is not empty, display the filename, else remove the file.

for filename in *error_log; do
 sed -n '/PHP /p' $filename | sed 's^\[.*\]^^g' | sed 's^\, referer: [^\n]*^^g' | sort | uniq -c > $filename-unique;
 if [ -s $filename-unique ]; then echo "$filename-unique"; else rm -f $filename-unique; fi;
done

Saturday, May 1, 2010

PHP: Convert Word Characters to UTF-8 HTML Numbered Entities

Overview



Have you ever tried to paste text from a Word document straight into a Web page and you get all these question marks "?", boxes or weird characters? What about when you store the same text to a database and you loss all those nice curly single quotes and curly double quotes? This will show you how to fix that and convert Microsoft Word ASCII characters [ISO-8859-1] to UTF-8 compliant HTML numbered entity codes (decimal).



The ISO 8859-1 Character Set



First you need to understand a few things about the ISO 8859-1 character set. There are 256 possible ranging from 0 to 255 and of these, some have not been assigned a value while others will display correctly on an ISO 8859-1 or UTF-8 Web page. You can display a character within PHP using the chr() function. Example chr(255). When displaying the characters in PHP, these characters repeat when greater then 255 such that chr(255) will be the same as chr(511) and chr(767).



So to jump straight to the point, the characters that need to be converted to a numbered entity range from 128 to 255, inclusively. There are probably a few of these characters that still display properly in Internet Explorer and not other browsers, but we will keep those characters included just to be thorough. Within this range, there are a few characters that need excluding which are 129, 141, 143, 144 and 157. They either are not assigned a value or the value doesn't display correctly; such would be the case with a backspace. Here is a nice visual ISO-8859-1 and Numbered Entities.



HTML Numbered Entities



Numbered entities are displayed on Web pages using the format of ÿ where the numerical value varies. The ISO 8859-1 character set range described above will match up with to the numbered entity equivalent so chr(255) is the same as ÿ. Keep in mind that &#255 is not the same as ǿ or ˿, so the desired range is restricted to that of the ISO-8859-1 range set in the above paragraph.



Converting the ISO-8859-1 Characters to HTML Numbered Entities



First a list will need to be made of all the characters that need converting. This can be accomplished best with a for loop and an array where the numbered entity is the key and the ASCII character is the value. Once that is done, you need to assign variables according to the desired conversion such as going from ASCII characters to numbered entities or vice-versa. A simple switch statement does the trick. Finally, you convert all the applicable characters within the string and return the converted value.




PHP Code:
function ConvertCharacters($String, $ConvertTo='entity') {
// Build character map list
$exclude = array(129, 141, 143, 144, 157);
for($i=128; $i<=255; $i++)
$characterMap['&#'.$i.';'] = chr($i);
foreach($exclude as $i)
unset($characterMap['&#'.$i.';']);

// Assign find and replace variables
switch($ConvertTo)
{
case 'ascii': // To ascii characters
$find = array_keys($characterMap);
$replace = array_values($characterMap);
break;
case 'entity': // To numbered entities
default:
$find = array_values($characterMap);
$replace = array_keys($characterMap);
break;
}

// Convert characters within string and return results
return str_replace($find, $replace, $String);
} // ConvertCharacters()



You can call the PHP function using one of three methods:
$String = 'string of text';
$ConvertedString = ConvertCharacters($String); // Convert to entity
$ConvertedString = ConvertCharacters($String, 'entity'); // Convert to entity
$ConvertedString = ConvertCharacters($String, 'ascii');// Convert to ascii


Resources



Saturday, August 1, 2009

Query For An IP In A Database

Overview



Below are several ways to query for an IP within a database and the shortcomings of each one. I'll assume that you have a PHP Web page where you would like to get the visitor's IP and look for that IP within your MySQL database. Also, I'll assume that you are able to query your database.



What Is An IP?



IP Addresses are 32-bit values which are typically represented as 8-bit quadruplets or "dotted quads" (IPv4). It consists of four unsigned integers separated by three dots. Each integer is less than or equal to 255. For example, one of the IPs that Google has registered would be 216.239.32.0 as seen on ARIN WHOIS



The format of an IP is [0-255].[0-255].[0-255].[0-255] and there are 4,294,967,296 (2^32) possible combinations (256*256*256*256), but some are reserved for private networks or multicast networks. Any user may use the reserved blocks within their network. For simplicity sake, any IP that is not in the table below should be considered external to your network.









Private Network IPs
IP address rangelargest CIDR block (subnet mask)
10.0.0.0 - 10.255.255.25510.0.0.0/8
172.16.0.0 - 172.31.255.255 172.16.0.0/12
192.168.0.0 - 192.168.255.255192.168.0.0/16


Getting A Visitor's IP Using PHP



If you don't already know how to get a visitor's IP, then here's the PHP code to do that. Someone else came up with the PHP code, so I can't take credit for it. Keep in mind that many computers within the visitor's internal network can share this external IP. Note, the IP could also be a computer from within your internal network connecting to the Web server which is also within your internal network given that your network configurations permit that computer to connect to the Web server using an internal IP. This type of configuration is common as it reduces the external traffic. See the Private Network IPs above to see if it is an internal IP.




PHP Code:
function getRealIpAddr() {
if(!empty($_SERVER['HTTP_CLIENT_IP'])) {
$ip=$_SERVER['HTTP_CLIENT_IP']; // share internet
} elseif(!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
$ip=$_SERVER['HTTP_X_FORWARDED_FOR']; // pass from proxy
} else {
$ip=$_SERVER['REMOTE_ADDR'];
}
return $ip;
}
$ip = getRealIpAddr(); // Get the visitor's IP

$ip = $_SERVER['REMOTE_ADDR']; // Simple version


IP Lookup Using An IP Address



The simplest way to lookup an IP in a database is to store the IP as it is. If you are looking for "216.239.32.0", then store that IP in the database.




Database Table "ips":
ip [char(15)]
216.239.32.0 <-returned
216.239.32.1
216.239.32.255

SQL Statement:
SELECT * FROM ips WHERE ip = '216.239.32.0'
ORDER BY ip DESC
LIMIT 1;


Unfortunately, if you want to lookup multiple IPs, you will need a record for each IP. That could be a lot considering that there are a few billion IPs out there.



IP Lookup Using A Reverse LIKE On An IP Range



If you want to lookup a range of IPs, you can perform a reverse LIKE comparison where a "%" character (wildcard) is stored in the database as part of the IP representing a range. The following searches for an IP between 64.233.160.0 and 64.233.169.255 using the IP with the third octave of "16%". The wildcard, in this case, means that the remainder of the third octave and the entire forth octave can be anything.




Database Table "ips":
ip [char(15)]
216.239.32.%
70.89.39.1%
64.233.16% <-returned

SQL Statement:
SELECT * FROM ips
WHERE '64.233.160.0' LIKE ip
ORDER BY ip DESC
LIMIT 1;


Unfortunately, IPs aren't always registered in octaves and so one company may own a portion of the range while other companies may own the rest. Such is the case for one of Google's blocks. ARIN WHOIS shows that Google owns 64.124.112.[24 - 31], but Google doesn't own 64.124.112.[0 - 23] or 64.124.112.[32 - 255]. You would have to use "64.124.112.2%" and "64.124.112.3%" to better cover that IP range which still includes some unwanted IPs in the range.



What Is CIDR Notation?



A range of IPs can be written out using CIDR notation so it looks like IP/CIDR. I showed a few examples in the "Private Network IPs" table above. To understand CIDR notation, you first need to understand how a computer sees an IP which is in bits or a series of 1s and 0s (on and off bits). There are 32 bits total in an IP and if you divide that by four, you get four sets of eight or four octaves. The CIDR number basically says that all the IPs in the CIDR block share a certain number of bits from the left, so a CIDR of 24 means that the last 8 bits of the IP can be anything which is the entire last octave (8 bits). If you want to learn more, then read about CIDR notation.



To use CIDR notation in our database example, you need to get the "Subnet ID" of the IP range which you can quickly do using a CIDR Calculator.



Use the calculator and entier in an IP of 192.168.0.1 and set the "Mask Bits" to 24. The "Subnet ID" and "Mask Bits" are what need to be stored in the database which can be found in the "Net: CIDR Notation" field which are 192.168.0.0 and 24. Notice if you change the IP to 192.168.0.255, the "Subnet ID" doesn't change. That's because IPs 192.168.0.[0-255] belong to the 24 CIDR block. Remember how I kept referring to the last octave, well it is a series of eight bits where 0 is "00000000" and 255 is "11111111" in binary format. An equation to remember is 2^(32-m) where m = the "Mask Bits", so 2^(32-24) = 256 possible IPs or Hosts in this CIDR block. Check out the CIDR table in the CIDR link I provided above.



I seem to be getting too deep into this, so let's just assume you will be using the calculator and move on. The last thing you need to remember is that the CIDR number is an unsigned integer between 0 and 32 where 0 would be all IPs and 32 would be one IP. CIDR blocks of 32 and 31 aren't generally used on networks as those blocks only have one and two IPs respectively and defeat the purpose behind using a CIDR block. Also, the starting and ending IPs in a CIDR block cannot be used by a host as they are reserved for other reasons. However, these two IPs still belong to that CIDR block and so this doesn't change anything in our database lookup. We will use the 31 and 32 CIDR blocks in our database in those cases where we only need one or two IPs.



IP Lookup Using An IP Subnet ID And A CIDR Value



The key to using the CIDR notation in the database query is that you have to convert the IP (Subnet ID) to a base 10 number which is the counting system that we use in our every day lives. Also you have to calculate the last IP in the range using the CIDR value. There are two functions you can use in SQL to convert an IP to a base 10 number and back again. Those functions are inet_aton() and inet_ntoa() where "aton" means address to number and visa versa. In addition, the pow() function will be used to calculate the number of IPs in that CIDR block. Here's a quick breakup of the SQL equations:




Start IP = subnet
End IP = inet_ntoa(inet_aton(subnet) + pow(2,32-cidr) - 1)



Database Table "ips":
subnet [char(15)] cidr [tinyint(4)]
216.239.32.8 29
70.89.39.0 24
64.233.16.4 30 <-returned

SQL Statement:
SELECT * FROM ips
WHERE '64.233.16.6' BETWEEN
subnet AND INET_NTOA(INET_ATON(subnet) + POW(2,32-cidr) - 1)
ORDER BY subnet DESC
LIMIT 1;

SQL Statement (Optimized):
SELECT * FROM ips
WHERE '64.233.16.6' >= subnet
AND '64.233.16.6' <= INET_NTOA(INET_ATON(subnet) + POW(2,32-cidr) - 1)
ORDER BY subnet DESC
LIMIT 1;


How does this work? The SQL statement returned the record where 64.233.16.6 (visitor's IP) was between 64.233.16.4 and 64.233.16.7. Since the subnet and CIDR values were known, we could calculate the last IP in the CIDR block. The equation pow(2,32-cidr) gives the number of IPs in this CIDR block which in this case was 2^(32-30) or 4. Given that the subnet is the starting IP, we needed to go up three more IPs to get a total of four IPs which is why the "minus 1" is there. Converting from IP to a numeric value and back again makes this arithmetic easy.



Unfortunately, this method has the same kind of flaws that the previous "reverse LIKE" lookup method had. There may be cases where you want the results we obtained from the above query as well as a few more IPs above it such as 64.233.16.4 to 64.233.10. Since the total number of IPs in the CIDR block is calculated by 2 to the power of something, having 7 IPs would only work if you had multiple records in the database such as:



64.233.16.4/30 (4 IPs - 64.233.16.4 to 64.233.16.7)
64.233.16.8/31 (2 IPs - 64.233.16.8 to 64.233.16.9)
64.233.16.10/32 (1 IP - 64.233.16.10 to 64.233.16.10)


IP Lookup Using A Starting And Ending IP Address



Just get the visitor's IP, the starting IP, and the ending IP and convert everything to a base 10 integer and do your query.




Database Table "ips":
ip_start [char(15)] ip_end [char(15)]
216.239.32.8 216.239.32.15
70.89.39.0 70.89.39.255
64.233.16.4 64.233.16.10 <-returned

SQL Statement:
SELECT * FROM ips
WHERE INET_ATON('64.233.16.6') BETWEEN
INET_ATON(ip_start) AND INET_ATON(ip_end)
ORDER BY ip_start DESC, ip_end DESC
LIMIT 1;


SQL Statement (Optimized):
SELECT * FROM ips
WHERE INET_ATON('64.233.16.6') >= INET_ATON(ip_start)
AND INET_ATON('64.233.16.6') <= INET_ATON(ip_end)
ORDER BY ip_start DESC, ip_end DESC
LIMIT 1;


This seems to better cover the issues that the preceding methods had with IPs outside of a range or CIDR block. The only problem is that there are a lot of IP conversions being done. I think the query would be simpler if the IPs were converted to their numeric values before the lookup is done.



Converting A Visitor's IP To A Numeric Value Using PHP



To convert an IP to a numeric value in PHP, you need to use the ip2long() function with some extra code to convert it to an unsigned integer. I found two methods to doing this. Someone else came up with the PHP code, so I can't take credit for it.




PHP Code:
// Converts 64.233.16.6 to 1089015814
$ip_aton = sprintf("%u", ip2long($ip));
$ip_aton = (substr($ip,0,3)>127) ? ((ip2long($ip) & 0x7FFFFFFF) + 0x80000000) : ip2long($ip);


IP Lookup Using A Starting And Ending IP Numeric Value



Using the previous method, let's try converting the IPs before the lookup is done. To store the numeric value in the database, simply do an INSERT query using the inet_aton() function. Keep in mind that there are 4,294,967,296 possible IPs, so set the type to an unsigned INT which has a range of 0 to 4,294,967,295 according to the MySQL Reference Manual. Perfect for what we need. You can also store the readable IPs in the database for reference.




Database Table "ips":
ip_start_aton [int(10)] ip_end_aton [int(10)]
3639549960 3639549967
1180247808 1180248063
1089015812 1089015818 <-returned

SQL Statement:
SELECT * FROM ips
WHERE 1089015814 BETWEEN ip_start_aton AND ip_end_aton
ORDER BY ip_start_aton DESC, ip_end_aton DESC
LIMIT 1;

SQL Statement (Optimized):
SELECT * FROM ips
WHERE 1089015814 >= ip_start_aton
AND 1089015814 <= ip_end_aton
ORDER BY ip_start_aton DESC, ip_end_aton DESC
LIMIT 1;


Doing it this way, you can have a starting IP of 0.0.0.0 and an ending IP of 255.255.255.255 covering all 4,294,967,296 IPs. This is overkill as networks aren't that large.



IP Lookup Using A Starting IP Numeric Value And A Range Number



Using the previous method, let's remove the oversized ending IP numeric value and replace it with a more convenient value like an "add range" value to calculate the ending IP. For my purposes, I am going to assume the IP ranges I am looking for won't be more then 255 IPs from the starting IP and so I will set the type to an unsigned TINYINT which has a range of 0 to 255 according to the MySQL Reference Manual. Feel free to change the type if you need a larger range. Remember that the range is added to the starting IP numeric value and so setting the range to 0 means that there is no range and so there is only one IP.




Database Table "ips":
ip_start_aton [int(10)] ip_add_range [tinyint(3)]
3639549960 7
1180247808 255
1089015812 6 <-returned

SQL Statement:
SELECT * FROM ips
WHERE 1089015814 BETWEEN
ip_start_aton AND ip_start_aton + ip_add_range
ORDER BY ip_start_aton DESC
LIMIT 1;

SQL Statement (Optimized):
SELECT * FROM ips
WHERE 1089015814 >= ip_start_aton
AND 1089015814 <= ip_start_aton + ip_add_range
ORDER BY ip_start_aton DESC
LIMIT 1;



This seems to be the most optimized solution that covers all the scenarios mentioned above plus some.



Resources



Tuesday, July 28, 2009

Create Your Own Web Page

Overview



You need to understand that there is a language used to create a Web page so that your Browser (Firefox, Internet Explorer, Opera, etc.) can display the content in that Web page properly. That language is called HTML (Hypertext Markup Language) and it consists of a set of tags and rules which decide on how the content will be displayed. XML would be another more complicated language, but the majority of the people use HTML. A reformulated and better to use version of HTML is called XHTML (Extensible Hypertext Markup Language) which conforms to XML.



Setting Up A Website Folder



I'll assume you are using Windows since most of the population uses that. Create a folder on the Desktop of your computer (Right click desktop->New->Folder) and give it a name to represent your Website. If you cannot think of a name, then call it "website". Now open that folder and create a new file within that folder (Right click in folder->New->Text Document) and name it "index.html". Upon hitting "Enter", you may be prompted with a message about changing the file name. Click "Yes" to change it. Files with a ".html" or ".htm" extension are considered to be Web pages and you probably noticed that the file icon (if visible) changed to an "Internet Explorer" icon. Double-click the file and it should open up in your default browser which would be "Internet Explorer" for most people. Notice that the page is blank.



Displaying Content On Your Web Page



Content that displays on a Webpage can range from text to images to embedded video etc. We'll stick with the basics. Open the newly created "index.html" file in a text editor (Right click on file->Open With->Notepad). There is a better text editor I like to use called "Notepad++" which colorized the text depending on the language used.



In order to display something on that blank Web page that opened up in your browser, there are a few tags or elements that are needed that will give you the basic framework for the page. Type or copy and past the following HTML into the "index.html" file and save the file:




<html>
<head>
<title>Web Page Title</title>
</head>
<body>
Hello World
</body>
</html>


As you can see, there is a starting and ending "html" tag "<html>" and "</html>". This is a kind of declaration saying that anything between these two tags or within this element is considered to be "HTML". In addition, there are starting and ending "head" and "body" tags within the "html" element. For this example, you just need to remember that the "head" element will never be displayed on the Web page, only the "body" element will be displayed. In addition, there is a "title" tag within the "head" element which is displayed at the top of your browser window. If you open the Web page again (refresh or reload it if already open), you should see the title displayed at the top of the browser window. In addition, you should see "Hello World" displayed within the browser. Try and change the text within the title and body elements and refresh the Web page.



Creating Headers



Between the opening and closing "body" tags, let's change it so it looks like the following code:




<html>
<head>
<title>Web Page Title</title>
</head>
<body>
<h1>Header Size 1</h1>
<h2>Header Size 2</h2>
<h3>Header Size 3</h3>
<h4>Header Size 4</h4>
<h5>Header Size 5</h5>
<h6>Header Size 6</h6>
</body>
</html>


If you refresh the page in your browser, you will see a very large "Header Size 1" and a very small "Header Size 6". These are the header elements which go from 1 to 6 in size. If you have ever outlined a research paper for your writing course, these headers would represent all the points or bullets of that outline. You can repeat these elements as much as you like and you don't have to use all of them, but remember to use the opening and closing tags around the content such as "<h2>" and "</h2>". The forward slash indicates a closing tag.



Creating Paragraphs



To create a paragraph, we use the "<p>" and "</p>" paragraph tags. Make your code look like mine to see paragraphs:




<html>
<head>
<title>Topic Outline</title>
</head>
<body>
<h1>Main Topic</h1>
<p>The main topic paragraph would go here.</p>

<h2>Sub Topic 1</h2>
<p>The first sub-topic paragraph would go here.</p>

<h2>Sub Topic 2</h2>
<p>The second sub-topic paragraph would go here.</p>

</body>
</html>


Creating Bold, Italics, and Underline Text



To create bold, italics, or underline text, you would use one of the following tags. In addition, you can combine them to create bold-underline or italic-bold text etc. Just remember to close the last opened tag first such as "<b><i>content</i></b>". See below examples.




<b>Bold Text</b>
<strong>Strong Text</strong>
<i>Italics Text</i>
<em>Emphasis Text</em>
<u>Underline Text</u>
<b><u>Bold-Underline Text</u></b>
<i><b>Italic-Bold Text</b></i>


Let's update our current Web page code:




<html>
<head>
<title>Topic Outline</title>
</head>
<body>
<h1>Main Topic</h1>
<p>The <strong>main topic</strong> paragraph would go here.</p>

<h2><em><u>Sub Topic 1</u></em></h2>
<p>The first <strong>sub-topic</strong> paragraph would go here.</p>

<h2><em><u>Sub Topic 2</u></em></h2>
<p>The second <strong>sub-topic</strong> paragraph would go here.</p>

</body>
</html>


Resources



That should be enough of the basics. Here are some links for more information. Once you get past the basics, be sure to check our CSS (Cascading Style Sheets) for styling multiple Web pages so they look consistent and are easy to maintain.