C# Project – Scraping Files for Data

I was receiving literally thousands of emails in Outlook about a specific network service error, many of them duplicates. They quickly surpassed my ability to copy, paste and manually process them, so I got fed up and looked for a more creative way to handle it.

At first I thought about creating a macro in Outlook to programmatically scrape the emails and save the information they contained, but macros are a security hazard and were locked down by a network policy anyway. Instead I opted to make a utility in C#.

Creating a utility to access Office COM objects, namely Outlook, did nothing for me but try my patience. Then I thought: why not have the emails saved as text files to a folder as they come in? After I explained my intentions to the network admin, he set up a script on the mail server to save specific emails to a share as text files. Excellent.

So now, with the emails being saved as plain text files, I could create a text scraper to pull out the data I needed and process it. However, that data was sandwiched inside an error whose text ran contiguous (ex: errorfound:D2A1AB9C83=|0X0C0MID:3:3), so I had to process the files by searching for a string and grabbing the text between beginning and ending markers. The value I needed always sat between the colon and the pipe character, and the text on either side of it never changed. Below are the parser function and the button handler, which also saves the list to a file.

//String parser: returns the text between Start and End, or "" when nothing matches
//(needs: using System.Text.RegularExpressions;)
public string ParseBetween(string Subject, string Start, string End)
{
    //Regex.Escape handles the regex metacharacters in the markers, e.g. the pipe
    string start = Regex.Escape(Start);
    string end = Regex.Escape(End);
    //Group 1 holds the text between the markers, so no case-sensitive Replace is needed afterwards
    Match m = Regex.Match(Subject, start + @"\s*(((?!" + start + "|" + end + @").)+)\s*" + end, RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value : "";
}

//Parse between two strings and grab the contents as a new string
private void button1_Click(object sender, EventArgs e)
{
    textBox1.Clear();
    tsNotify.Text = "";
    string s2 = "errorfound:";  //beginning string
    string s3 = "|0X0C0MID:3:3";    //end string
    //files to parse
    foreach (string file in Directory.EnumerateFiles(@"\\server\SomeErrors\", "*.txt"))
    {
        string contents = File.ReadAllText(file);
        string strParsed = ParseBetween(contents, s2, s3);
        //Clean up the string
        string clean = Regex.Replace(strParsed, "[^A-Za-z0-9 ]", "");
        textBox1.AppendText(clean + "\r\n");
    }
    using (StreamWriter objWriter = new StreamWriter(@"C:\ServerErrors.txt"))
    {
        objWriter.Write(textBox1.Text);
        objWriter.Flush();
        tsNotify.Text = "Parsed and saved to C:\\ServerErrors.txt";
    }
}
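To see the between-two-markers idea in isolation, here is a minimal console sketch run against the sample error string from above. It is only a sketch: the class name ScrapeDemo and the Clean helper are made up for illustration and are not part of my tool, and it leans on Regex.Escape plus a capture group to pull out the middle part.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical standalone demo; ScrapeDemo and Clean are illustrative names.
public static class ScrapeDemo
{
    // Returns the text between start and end, or "" when no match is found.
    public static string ParseBetween(string subject, string start, string end)
    {
        // Regex.Escape neutralizes metacharacters such as the pipe in the end marker.
        string s = Regex.Escape(start);
        string e = Regex.Escape(end);
        // Group 1 collects characters one at a time until a start or end marker begins.
        Match m = Regex.Match(subject, s + @"\s*((?:(?!" + s + "|" + e + @").)+)\s*" + e,
                              RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    // Same cleanup the button handler applies: keep only letters, digits and spaces.
    public static string Clean(string value)
    {
        return Regex.Replace(value, "[^A-Za-z0-9 ]", "");
    }

    public static void Main()
    {
        string sample = "errorfound:D2A1AB9C83=|0X0C0MID:3:3";
        string raw = ParseBetween(sample, "errorfound:", "|0X0C0MID:3:3");
        Console.WriteLine(raw);        // D2A1AB9C83=
        Console.WriteLine(Clean(raw)); // D2A1AB9C83
    }
}
```

A file that does not contain the markers comes back as an empty string, which is where the blank entries further down come from.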

As the data got pulled from the text files, I ran into some string anomalies such as blank lines (sometimes several in a row) and stray white space. Not all of the emails being saved under that title related to the error, so those emails were showing up as blank entries in the list. Trimming the strings and skipping the empty lines when loading the file helped with that.

private void button5_Click(object sender, EventArgs e)
{
    string filePath = "C:\\ServerErrors.txt";
    //remove any empty lines
    string[] lines = File.ReadAllLines(filePath).Where(s => s.Trim() != string.Empty).ToArray();
    listBox1.Items.AddRange(lines);
    tsNotify.Text = "Errors added for processing";
}

Once processed, I removed any duplicate data using the code below:

private void button6_Click(object sender, EventArgs e)
{
    string[] arr = new string[listBox1.Items.Count];
    listBox1.Items.CopyTo(arr, 0);
    var arr2 = arr.Distinct();
    listBox1.Items.Clear();
    foreach (string s in arr2)
    {
        string clean = Regex.Replace(s, "[^A-Za-z0-9 ]", "");
        listBox1.Items.Add(clean);
    }
    tsNotify.Text = "Duplicates removed";
}
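The trim, skip-blanks and de-duplicate steps can also be tried without any form controls. This is a rough sketch on an in-memory list; the class name ErrorListDemo, the method CleanAndDedupe and the second sample code FFEE00AA11 are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical standalone demo of the clean-up steps; names and sample data are illustrative.
public static class ErrorListDemo
{
    // Trims every line, drops the ones that end up empty, then removes duplicates.
    public static List<string> CleanAndDedupe(IEnumerable<string> lines)
    {
        return lines
            .Select(l => l.Trim())
            .Where(l => l.Length > 0)
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        // Blank and whitespace-only lines stand in for emails that did not contain the error.
        var raw = new[] { "D2A1AB9C83", "", "   ", "D2A1AB9C83 ", "FFEE00AA11" };
        foreach (string line in CleanAndDedupe(raw))
            Console.WriteLine(line);
        // Two entries remain: D2A1AB9C83 and FFEE00AA11
    }
}
```

Trimming before calling Distinct matters here, otherwise "D2A1AB9C83" and "D2A1AB9C83 " would count as two different entries.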

Once finished, I output the final list of errors, minus the white space, blank lines and duplicates.

private void button7_Click(object sender, EventArgs e)
{
    foreach (object liItem in listBox1.Items)
        textBox2.Text += liItem.ToString() + "\r\n";
    tsNotify.Text = "Final list ready to copy";
}

After a few iterations I was successfully scraping and processing the data I needed. Since I wanted each run to process only newer text files, I deleted the files on the server share at the end of each run.

private void button2_Click(object sender, EventArgs e)
{
    System.IO.DirectoryInfo di = new DirectoryInfo(@"\\server\SomeErrors\");
    foreach (FileInfo file in di.GetFiles())
    {
        file.Delete();
    }
    tsNotify.Text = "Remote files at \\\\server\\SomeErrors deleted";
    textBox1.Text = "";
    listBox1.Items.Clear();
}

As with all of my projects, this one was born out of necessity and is not pretty code in any way. It’s functional for my needs and serves its purpose though. The code in my project is freely found around the internet by performing simple Google searches or hitting Microsoft’s programming help sites. If the code benefits anyone then awesome. I take credit for nothing more than the tool I have created to accomplish a task.

C# Programming out of need – Getting IP and MAC Addresses

This is my C# project for getting IP and MAC addresses. Code kudos go out to the respective developers and websites, such as MSDN, Stack Overflow, C# Corner and others I have left out; all of the code is in the public domain and modified by me to fit my needs.

As far as I am concerned my projects are as-is and there is nothing code efficient in my projects so please don’t beat me up too bad over any of it.

I am not a professional programmer in any regard but do consider myself a coder; this is just my slapped-together get ‘er done tool befitting my personal need. If you want to comment on my project offline, send me an email at stevegossett (AT) outlook.com

NOTE
You will need .NET 4.5 installed for this C# project and you will need to right-click on the
project name from within the IDE and add references to:

  • System.Management
  • System.Management.Instrumentation
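
For anyone who only wants the general idea without downloading the project, here is a rough sketch of listing IP and MAC addresses. Note that it is not the code from my project: the project goes through WMI (hence the System.Management references above), while this sketch swaps in the System.Net.NetworkInformation classes, which need no extra references. The class name IpMacDemo and the output format are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.NetworkInformation;
using System.Net.Sockets;

// Hypothetical sketch; uses NetworkInterface instead of the WMI approach in the project.
public static class IpMacDemo
{
    // Builds one "name | MAC | IPv4 addresses" row per adapter that is currently up.
    public static List<string> Describe()
    {
        var rows = new List<string>();
        foreach (NetworkInterface nic in NetworkInterface.GetAllNetworkInterfaces())
        {
            if (nic.OperationalStatus != OperationalStatus.Up)
                continue;

            // PhysicalAddress.ToString() gives the MAC as hex digits without separators.
            string mac = nic.GetPhysicalAddress().ToString();

            // Keep only the IPv4 addresses bound to this adapter.
            IEnumerable<string> ips = nic.GetIPProperties().UnicastAddresses
                .Where(a => a.Address.AddressFamily == AddressFamily.InterNetwork)
                .Select(a => a.Address.ToString());

            rows.Add(nic.Name + " | " + mac + " | " + string.Join(", ", ips));
        }
        return rows;
    }

    public static void Main()
    {
        foreach (string row in Describe())
            Console.WriteLine(row);
    }
}
```

The output depends entirely on the machine it runs on, and adapters such as the loopback interface show up with an empty MAC.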


Download Project

 

Encode Decode URL Tool

URLEncodeDecode

I was testing something that in turn generated encoded URL strings, so I wanted a quick way to decode them back into their regular URL format with properly formatted characters. As an example, instead of seeing a %2F in the URL I wanted to see the actual forward slash ‘/’ that it represented. From C# I recalled the HttpUtility.UrlEncode and HttpUtility.UrlDecode methods, so I made this app to do what I needed.

Example, take this URL:

http%3a%2f%2fwww.mysite.com%2fgohere%2fmyresources%2fmyarticles%2fhow-to-build-a-better-mousetrap%2f%3futm_blahblah%3dmyarticles%26utm_medium%3dsomethingelse

It decodes to this URL:
http://www.mysite.com/gohere/myresources/myarticles/how-to-build-a-better-mousetrap/?utm_blahblah=myarticles&utm_medium=somethingelse
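
The core of the tool boils down to two calls. Here is a minimal console sketch using the encoded URL from the example; the class name UrlDemo is made up, and on .NET Framework the project needs a reference to System.Web for HttpUtility to resolve.

```csharp
using System;
using System.Web;

// Hypothetical console sketch; the real tool is a small Windows Forms app.
public static class UrlDemo
{
    // Turns percent-encoded sequences such as %2f back into their characters.
    public static string Decode(string encoded)
    {
        return HttpUtility.UrlDecode(encoded);
    }

    // Percent-encodes reserved characters such as ':' '/' '?' '=' and '&'.
    public static string Encode(string plain)
    {
        return HttpUtility.UrlEncode(plain);
    }

    public static void Main()
    {
        string encoded = "http%3a%2f%2fwww.mysite.com%2fgohere%2fmyresources%2fmyarticles%2f"
                       + "how-to-build-a-better-mousetrap%2f%3futm_blahblah%3dmyarticles"
                       + "%26utm_medium%3dsomethingelse";

        string decoded = Decode(encoded);
        Console.WriteLine(decoded);

        // Encoding the result again should round-trip back through Decode unchanged.
        Console.WriteLine(Decode(Encode(decoded)) == decoded);
    }
}
```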

I hope someone else finds it as useful a tool. The .cs file is also included in the ZIP file below.

Programming out of need – HOSTS File Editor

I work in QA testing and constantly have to edit my Windows hosts file to redirect to other servers over the course of an average day. Although I had a few batch files to run, I wished I had a tool that would let me do it quicker. I looked around the net and, surprisingly, didn’t find anything useful.

I thought about it for a few weeks and then decided to dive in and take a stab at it. At home on my own PC, because I don’t want any conflicts doing this on a work machine, I started learning C# in Visual Studio. All I was really armed with was a plain text file of redirects that I could copy and paste into the hosts file depending on what I needed to test, and a couple of batch files I had previously used to do it.

Example:

##=========WORDPRESS========
## redirect to wp0 on domain
#111.111.111.111 www.domain.com secure.domain.com

##===========VARN===========
## redirect to v0 on domain
#111.111.111.111 v0.ourdomain.com

## redirect to vs on domain
#111.111.111.111 www.domain.com secure.domain.com

##======== TEST SERVER ========
## redirect to test on domain
#111.111.11.11 test.domain.com

At first I thought of doing an app where I could select each server from a combobox and then launch Notepad as Admin to copy and save the info. Then it became a combobox with form button events and a lengthy if / else if statement to write the server info to the hosts file. That code was certainly the worse for wear.

I took a step back from that to reanalyze my need and, through several iterations, came up with a hosts file manager that worked for me. In a nutshell, it reads my hosts data file (hostsdata.txt) into a checked list box component, allowing me to check what I want and then write those items to my system’s hosts file.

I made some additions to the tool, such as starting with elevated admin privileges, which are needed in order to write to the hosts file; the ability to reset the hosts file to default commented text; a way to perform functions similar to ipconfig /release, /renew and /flushdns using WMI; and a small editor in the tool that lets me edit the hostsdata.txt file for new or changed redirect information.

Main Application

As you can see in the screenshots, my data file entries start with the hash mark (#), which the hosts file treats as a comment. The block of code that writes the information to the hosts file strips that leading hash so that the entry is written to the file already un-commented.
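
The hash-stripping step could look something like the sketch below. It is not lifted from the tool: the class name HostsDemo and the Uncomment helper are made up, the entries come from the example data file above, and it writes to a file in the temp folder rather than the real hosts file so it can be run without admin rights.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sketch of the un-commenting step; names are illustrative only.
public static class HostsDemo
{
    // Removes a single leading '#', so "#ip host" becomes an active "ip host" entry.
    // A "##" heading line loses only one hash and therefore stays a comment.
    public static string Uncomment(string entry)
    {
        string trimmed = entry.TrimStart();
        return trimmed.StartsWith("#", StringComparison.Ordinal) ? trimmed.Substring(1) : trimmed;
    }

    public static void Main()
    {
        // Stand-ins for the items checked in the list.
        var checkedItems = new[]
        {
            "## redirect to v0 on domain",
            "#111.111.111.111 v0.ourdomain.com"
        };

        // Demo target in the temp folder instead of the real hosts file.
        string target = Path.Combine(Path.GetTempPath(), "hosts-demo.txt");
        File.WriteAllLines(target, checkedItems.Select(Uncomment));
        Console.WriteLine(File.ReadAllText(target));
    }
}
```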

However, as you can tell in the screenshot of the main application, there are some checkboxes that do not contain a display value and I haven’t added any code for processing that so I just catch the exception to get on with my day – I just don’t click on the blank entries.

In any regard, I am not a professional programmer and essentially just dove in on this because I had a need and hated the way I currently had to deal with modifying the hosts file. I didn’t find any tools online so I created this app to use. I still tweak and refine this tool a little but take it as a tool that was just something whipped up to get ‘er done. Any comments are welcomed here or to my email or if you have trouble downloading the file for some reason – email at stevegossett[[at]]outlook.com.

It requires at a minimum .NET Framework version 4.5 to be installed. If you get an error when trying to write changes to the hosts file this will be why.

Host Editor Tool Download

So in contrast to my hosts editor app, below are the contents of a batch file for selecting from a list of IPs. I was using this before creating the hosts manager tool. Every once in a while I still use it, but only when I am on a system that barks at starting apps from a thumb drive. Don’t forget to right-click and Run as… Admin.

@echo off
TITLE Modifying your HOSTS file
COLOR F0
ECHO.
:LOOP
SET Choice=
ECHO a = server1
ECHO b = server3
ECHO c = server4
ECHO d = stage
ECHO e = devserver
SET /P Choice="Point HOSTS to? Enter number or R to reset. (0-7,a-e,R)"
IF NOT '%Choice%'=='' SET Choice=%Choice:~0,1%

ECHO.
IF /I '%Choice%'=='0' GOTO 0
IF /I '%Choice%'=='1' GOTO 1
IF /I '%Choice%'=='2' GOTO 2
IF /I '%Choice%'=='3' GOTO 3
IF /I '%Choice%'=='5' GOTO 5
IF /I '%Choice%'=='7' GOTO 7
IF /I '%Choice%'=='a' GOTO 8
IF /I '%Choice%'=='b' GOTO 9
IF /I '%Choice%'=='c' GOTO 10
IF /I '%Choice%'=='d' GOTO 11
IF /I '%Choice%'=='e' GOTO 12
IF /I '%Choice%'=='R' GOTO RESET
ECHO.
GOTO Loop

:RESET
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO #Empty hosts file>>%hosts%
GOTO END

:0
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 serv.domain.com www.domain.com domain.com secure.domain.com>>%hosts%
ECHO Finished
GOTO END

:1
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 serv.domain.com www.domain.com domain.com secure.domain.com>>%hosts%
ECHO Finished
GOTO END

:2
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 serv.domain.com www.domain.com domain.com secure.domain.com>>%hosts%
ECHO Finished
GOTO END

:3
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 serv.domain.com www.domain.com domain.com secure.domain.com>>%hosts%
ECHO Finished
GOTO END

:5
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 serv.domain.com www.domain.com domain.com secure.domain.com>>%hosts%
ECHO 222.222.22.22 dbcluster3>>%hosts%
ECHO Finished
GOTO END

:7
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 serv.domain.com www.domain.com domain.com secure.domain.com>>%hosts%
ECHO Finished
GOTO END

:8
REM server1.domain.com
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 domain.com www.domain.com>>%hosts%
ECHO Finished
GOTO END

:9
REM server3.domain.com
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 domain.com www.domain.com>>%hosts%
ECHO Finished
GOTO END

:10
REM server4.domain.com
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 domain.com www.domain.com>>%hosts%
ECHO Finished
GOTO END

:11
REM stage.domain.com
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 domain.com www.domain.com>>%hosts%
ECHO Finished
GOTO END

:12
REM devserver.domain.com
set hosts=%windir%\system32\drivers\etc\hosts
If exist %hosts% (
del /q %hosts%)
ECHO Carrying out requested modifications to your HOSTS file
ECHO 111.111.11.11 domain.com www.domain.com devserver.domain.com>>%hosts%
ECHO Finished
GOTO END
:END
ECHO.
EXIT